Hey @TheLumberjack, thanks for the warm welcome!
Yeah, that sounds very much like the service-stopping problem we're experiencing. A pre-release that we could roll out to some users to test whether it fixes things would be great, I think.
The bigger problems here are the ones related to starting up after a shutdown or after sleep.
Unfortunately, the problems are not reproducible on our ziti admin-side clients. But we have two bigger customers, each with quite a large number of users, who seem to hit these problems on a daily basis. Some even said it happens a few times a day; we're still gathering precise data on these "silent crashes".
Every identity seems to behave randomly. We can provide you with an identity that has a generic test service to a website or something similar, but I would bet it won't show the same behaviour as on our customers' devices (because we ziti admins can't reproduce it either).
It works as it normally should, but then the problems start with this error:
[2024-06-26T05:12:40.924Z] INFO tunnel-cbs:ziti_dns.c:509 format_resp() found record[100.64.0.12] for query[1:xxxxxxxxxx.windows.net]
[2024-06-26T05:12:42.917Z] ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:49763 err=-14, terminating connection
After that, we get a flood of error messages that never stops and is only resolved by (forcefully) restarting the ziti services.
[2024-06-26T05:12:42.917Z] ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:49763 err=-14, terminating connection
[2024-06-26T05:12:52.121Z] ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:49799 err=-14, terminating connection
[2024-06-26T05:15:18.799Z] INFO tunnel-cbs:ziti_dns.c:509 format_resp() found record[100.64.0.18] for query[1:xxx.xxxx.de]
[2024-06-26T05:15:49.100Z] ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:49740 err=-14, terminating connection
[2024-06-26T05:15:49.100Z] ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:49732 err=-14, terminating connection
[2024-06-26T05:16:49.111Z] ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:50102 err=-14, terminating connection
[2024-06-26T05:16:49.111Z] ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:50099 err=-14, terminating connection
[2024-06-26T05:16:49.111Z] ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:50094 err=-14, terminating connection
[2024-06-26T05:17:02.628Z] INFO tunnel-cbs:ziti_dns.c:509 format_resp() found record[100.64.0.18] for query[1:xxxx.xxxx.de]
[2024-06-26T05:17:19.106Z] ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:49741 err=-14, terminating connection
[2024-06-26T05:17:19.106Z] ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:50131 err=-14, terminating connection
[2024-06-26T05:17:43.795Z] ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:50110 err=-14, terminating connection
[2024-06-26T05:17:43.795Z] ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:50108 err=-14, terminating connection
These are the recurring error messages we get until we restart it:
[2024-06-26T06:43:04.714Z] ERROR ziti-sdk:ziti_ctrl.c:164 ctrl_resp_cb() ctrl[ctrl.ctrl.de] request failed: -4079(software caused connection abort)
[2024-06-26T06:43:04.714Z] ERROR ziti-sdk:ziti.c:1318 edge_routers_cb() ztx[0] failed to get current edge routers: code[0] CONTROLLER_UNAVAILABLE/software caused connection abort
[2024-06-26T06:43:04.714Z] ERROR ziti-sdk:ziti_ctrl.c:164 ctrl_resp_cb() ctrl[ctrl.ctrl.de] request failed: -4079(software caused connection abort)
[2024-06-26T06:43:04.714Z] WARN ziti-sdk:ziti.c:1260 check_service_update() ztx[0] failed to poll service updates: code[0] err[-16/software caused connection abort]
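To put numbers on the "a few times a day" reports, a rough helper like the one below could tally the ERROR/WARN lines per function in the collected logs and show the time span they cover. It's only a sketch: the line format is inferred from the excerpts above, and `summarize` and `SAMPLE` are hypothetical names, not part of any ziti tooling.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line shape, inferred only from the excerpts in this thread:
# [timestamp] LEVEL component:file.c:line function() message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z]+)\s+"
    r"(?P<component>[\w-]+):(?P<src>[\w.]+:\d+)\s+(?P<func>\w+)\(\)"
)

# Hypothetical sample: three lines copied from the excerpts above.
SAMPLE = [
    "[2024-06-26T05:12:40.924Z] INFO tunnel-cbs:ziti_dns.c:509 format_resp() found record[100.64.0.12] for query[1:xxxxxxxxxx.windows.net]",
    "[2024-06-26T05:12:42.917Z] ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:49763 err=-14, terminating connection",
    "[2024-06-26T06:43:04.714Z] WARN ziti-sdk:ziti.c:1260 check_service_update() ztx[0] failed to poll service updates: code[0] err[-16/software caused connection abort]",
]


def summarize(lines):
    """Count ERROR/WARN lines per (level, function) and track the first/last timestamp seen."""
    counts = Counter()
    first = last = None
    for line in lines:
        m = LOG_LINE.match(line.strip())
        # Skip lines that don't match the assumed format and anything below WARN.
        if not m or m["level"] not in ("ERROR", "WARN"):
            continue
        counts[(m["level"], m["func"])] += 1
        # The trailing 'Z' means UTC; swap it for an offset so fromisoformat accepts it.
        ts = datetime.fromisoformat(m["ts"].replace("Z", "+00:00"))
        first = ts if first is None else min(first, ts)
        last = ts if last is None else max(last, ts)
    return counts, first, last


if __name__ == "__main__":
    counts, first, last = summarize(SAMPLE)
    for (level, func), n in counts.most_common():
        print(level, func, n)
    print("span:", first, "->", last)
```

Run against a full feedback log (e.g. `summarize(open(path, encoding="utf-8"))`), this would show how often `on_tcp_client_err()` and the controller errors fire and for how long the flood lasts before a restart.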
Most of the locations we're at have good and stable internet connections; some are unstable or underpowered, like 40 Mb/s for 40 users. But we see the problem at every site.
We've already asked for the feedback files to be collected from devices that just had these problems. We can also provide logs from the controllers, routers, etc.
Since we can't reproduce the problems either, even with identities from our customers' networks, we don't think that would help much. We're already gathering and redacting our customers' log files and will send them via email as soon as we can.
Thank you very much!