Hiya
I have a 3-node HA controller cluster with 3 edge routers.
This works brilliantly with tunnelers (ZDE etc.).
SDK-enabled apps that host a service, however, eventually fall off: the listener drops and fails to rebind (log below).
For testing, if I run a single controller (still in HA mode), the app works fine.
The issue only occurs when I introduce two or more controllers into the cluster.
The app complains about connecting to the edge routers; however, between the two tests (one HA controller vs. two or more HA controllers), nothing changes regarding the edge routers.
I've tested running my "01" controller in an HA configuration as a single controller, as well as "02", so I know base network communication between controllers and edge routers is ok.
The controllers are all within the same LAN/VLAN, with no host-level firewalls, so communication between them is direct and unrestricted.
I've tried the 1.5.4 and 1.6.5 binaries, with no material difference.
I'm currently using go-sdk 1.2.0, but also tried an earlier 1.1.x version.
This is what the app logs:
INFO[0000] new service session session token=642a526e-fb52-xxxx-xxxx-d201755e8457
ERRO[0056] unable to unbind session for conn connId=1 error="channel closed" sessionId=642a526e-fb52-xxxx-xxxx-d201755e8457
ERRO[0056] failed to close listener connId=1 error="channel closed" marker= serviceName=web-service
ERRO[0059] unable to unbind session for conn connId=1 error="timeout waiting for message to be written to wire: context deadline exceeded" sessionId=642a526e-fb52-xxxx-xxxx-d201755e8457
ERRO[0059] failed to close listener connId=1 error="timeout waiting for message to be written to wire: context deadline exceeded" marker= serviceName=web-service
ERRO[0059] failed to bind _context="ch{ziti-sdk[router=tls:01.edge.domain:3022]}->u{classic}->i{m6AYZQI4SJ/QjJQ}" connId=1 error="timeout waiting for message reply: context deadline exceeded" serviceName=web-service sessionId=cmdi56gwj019wscn3orgevjsx
ERRO[0059] failed to establish listener connId=1 error="timeout waiting for message reply: context deadline exceeded" router=01.edge.domain serviceId=qp9K4gnL2rrewFIUMAEgw serviceName=web-service
ERRO[0059] creating listener failed after 5001ms: timeout waiting for message reply: context deadline exceeded router=01.edge.domain serviceName=web-service
INFO[0059] notify error handler of error: timeout waiting for message reply: context deadline exceeded
I want to emphasise that the edge router configuration remains unchanged between tests. The issues only occur when multiple controllers are in play, and they only affect SDK-based apps; tunnelling identities with ZDE or similar work flawlessly in both cases.
I'd be happy to share configs and run any other diagnostics with developers if it's deemed useful. I'd just prefer to do that in a slightly less public fashion, for commercial reasons.
Or, what glaringly obvious thing am I overlooking? =D
tia <3