Ziti-edge-tunnel requires periodic manual restarts

I installed ziti-edge-tunnel in my Kubernetes cluster with Helm (OpenZiti Helm Charts | helm-charts) and found that it needs to be restarted periodically because it loses its connection to the controller:

 WARN ziti-sdk:ziti_ctrl.c:180 ctrl_resp_cb() ctrl[some.domain.com:1280] request failed: -110(connection timed out)
ERROR ziti-sdk:ziti.c:1290 edge_routers_cb() ztx[0] failed to get current edge routers: code[0] CONTROLLER_UNAVAILABLE/connection timed out
 WARN ziti-sdk:ziti_ctrl.c:180 ctrl_resp_cb() ctrl[some.domain.com:1280] request failed: -110(connection timed out)
 INFO ziti-sdk:ziti_ctrl.c:183 ctrl_resp_cb() ctrl[some.domain.com:1280] attempting to switch endpoint
 WARN ziti-sdk:ziti_ctrl.c:566 ctrl_next_ep() ctrl[some.domain.com:1280] no controllers are online
 WARN ziti-sdk:ziti.c:1238 check_service_update() ztx[0] failed to poll service updates: code[0] err[-16/connection timed out]

Moreover, if you open a shell in the openziti-ziti-edge-tunnel-29qz5 container and install netcat, you can see that the controller, the router, and the service it needs to connect to are all reachable from it:

[root@projects /]# nc -vz some.domain.com 24
Ncat: Version 7.92 ( https://nmap.org/ncat )
Ncat: Connected to 123.123.123.123:24.
Ncat: 0 bytes sent, 0 bytes received in 0.26 seconds.
[root@projects /]# nc -vz some.domain.com 1280
Ncat: Version 7.92 ( https://nmap.org/ncat )
Ncat: Connected to 123.123.123.123:1280.
Ncat: 0 bytes sent, 0 bytes received in 0.07 seconds.
[root@projects /]# nc -vz postgresql.some-project.svc.cluster.local 5432
Ncat: Version 7.92 ( https://nmap.org/ncat )
Ncat: Connected to 10.43.213.218:5432.
Ncat: 0 bytes sent, 0 bytes received in 0.03 seconds.

After a manual restart, it works normally for several days, then the problem recurs.

ziti-edge-tunnel version = v1.2.5

This happens not only in the Kubernetes cluster but also with ziti-edge-tunnel installed directly on hosts, without Docker. As the number of installations grows, I increasingly have to run ssh some.host.com -- killall -9 ziti-edge-tunnel whenever services lose connectivity. I am looking for a way to automate this: detect that the connection is lost and forcibly kill the process in that case.

For WireGuard, I have set up a cron job:

*/1 * * * * (/usr/bin/ping -q -c1 -W4 -I opcode 10.1.0.1 > /dev/null || (/usr/sbin/ifdown opcode; /usr/sbin/ifup opcode)) 2>&1 | logger -t cron

I haven't yet come up with an equivalent for Ziti.

That's really unexpected. Would you be willing to send us the logs from a tunneler that has this situation? Probably at debug level for starters?

You shouldn't have to automate this -- we should find and fix this issue somehow... We appreciate you working with us to figure this out. Sounds like it might take a bit of work.

As for an automated check, if it were me I would probably put up some kind of relay server/HTTP server and just issue a curl/port check. You can't use ping, because ping won't work with OpenZiti, but you could stand up a netcat server and make sure you can connect to it, or make a simple HTTP request...
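
A rough sketch of that kind of check, in the same spirit as the WireGuard cron job above. Everything named here is an assumption rather than something from this thread: the probe URL is a hypothetical HTTP service that is only reachable through a Ziti intercept, and the default restart command assumes the tunneler runs as a systemd unit called ziti-edge-tunnel. Set RESTART_CMD to something else (for example the killall from the original post) if that does not match your install.

```shell
#!/bin/sh
# Watchdog sketch: probe a service that is reachable only through Ziti
# and restart the tunneler when the probe fails.
# Usage: ziti-watchdog.sh <probe-url>

# Assumption: the tunneler runs as a systemd unit with this name.
RESTART_CMD="${RESTART_CMD:-systemctl restart ziti-edge-tunnel}"

# Returns 0 if the URL answers with a non-error HTTP status within 5 seconds.
probe() {
    curl --silent --fail --max-time 5 --output /dev/null "$1"
}

# Probes the URL once; runs the restart command only if the probe fails.
check_and_restart() {
    if probe "$1"; then
        echo "ziti probe ok"
    else
        echo "ziti probe failed, restarting tunneler"
        $RESTART_CMD
    fi
}

# Do nothing unless a probe URL was given on the command line.
if [ "$#" -ge 1 ]; then
    check_and_restart "$1"
fi
```

It could then be scheduled from cron, with a hypothetical install path and intercept address:

*/1 * * * * /usr/local/bin/ziti-watchdog.sh http://healthcheck.ziti.internal:8080/ 2>&1 | logger -t ziti-watchdog

A single failed probe triggers a restart here, so a flaky health endpoint would cause unnecessary restarts; requiring a few consecutive failures first would be a reasonable refinement.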

Please update ziti-edge-tunnel to 1.2.9 or whatever the latest is. There were some connection handling improvements and fixes.

Are you following these instructions to install ziti-edge-tunnel as a privileged daemonset for the worker nodes?

If so, you can override the tunneler version with Helm input values like this.

image:
    tag: 1.2.9