Ziti HA cluster - SDK-enabled apps falling off

Hiya

I have a 3-node HA controller cluster with 3 edge routers.
This works brilliantly with tunnelers (ZDE etc.).

When using SDK-enabled/aware apps that host a service, they eventually fall off.

For testing, if I run 1 controller (in HA), the app works fine.
The issue only occurs when I introduce two or more controllers into the cluster.

The app complains about connecting to the edge routers; however, between the two tests (one HA controller vs. two or more HA controllers), nothing changes regarding the edge routers.

I've tested running my "01" controller in an HA configuration as a single controller, as well as "02", so I know base network communication between controllers and edge routers is ok.

The controllers are all within the same LAN/VLAN, with no host-level firewalls, so communication between them is direct and unrestricted.

I've tried binaries 1.5.4 as well as 1.6.5, with no material difference.
I'm currently using go-sdk 1.2.0, but I also tried an earlier 1.1.x version.

INFO[0000] new service session                           session token=642a526e-fb52-xxxx-xxxx-d201755e8457


ERRO[0056] unable to unbind session for conn             connId=1 error="channel closed" sessionId=642a526e-fb52-xxxx-xxxx-d201755e8457
ERRO[0056] failed to close listener                      connId=1 error="channel closed" marker= serviceName=web-service
ERRO[0059] unable to unbind session for conn             connId=1 error="timeout waiting for message to be written to wire: context deadline exceeded" sessionId=642a526e-fb52-xxxx-xxxx-d201755e8457
ERRO[0059] failed to close listener                      connId=1 error="timeout waiting for message to be written to wire: context deadline exceeded" marker= serviceName=web-service
ERRO[0059] failed to bind                                _context="ch{ziti-sdk[router=tls:01.edge.domain:3022]}->u{classic}->i{m6AYZQI4SJ/QjJQ}" connId=1 error="timeout waiting for message reply: context deadline exceeded" serviceName=web-service sessionId=cmdi56gwj019wscn3orgevjsx
ERRO[0059] failed to establish listener                  connId=1 error="timeout waiting for message reply: context deadline exceeded" router=01.edge.domain serviceId=qp9K4gnL2rrewFIUMAEgw serviceName=web-service
ERRO[0059] creating listener failed after 5001ms: timeout waiting for message reply: context deadline exceeded  router=01.edge.domain serviceName=web-service
INFO[0059] notify error handler of error: timeout waiting for message reply: context deadline exceeded 

I want to emphasise that the edge router configuration remains unchanged between tests. The issues occur only when multiple controllers are in play, and they only affect SDK-based apps. Tunnelling identities with ZDE or similar work flawlessly in both cases.

I'd be happy to share configs and run any other diagnostics with the developers if that would be useful. I'd just prefer to do that in a slightly less public fashion for commercial reasons.

Or, what glaringly obvious thing am I overlooking? =D

tia <3

Hi @strongwazz

The Go SDK still requires that the Enable HA flag is set, see: sdk-golang/ziti/config.go at v1.2.0 · openziti/sdk-golang · GitHub

We can hopefully soon make that flag obsolete. Let us know if that doesn't fix things and/or if you already had that flag set :slight_smile:

Paul

Haha, such a wonderful oversight on my part. Thanks, I'll give that a crack.

Nailed it :slight_smile: Didn't realise it was a flag that needed setting.

Excellent :slight_smile:

I added an issue to make sure we don’t lose track of removing the need for the EnableHA flag: Remove need to EnableHA flag in Go SDK · Issue #779 · openziti/sdk-golang · GitHub