Ziti Desktop Edge crashes on Win11 Startup/Sleep

Hey there,
We've been having major problems with the Ziti Desktop Edge client since the newest update:

  1. After a shutdown and power-on (not a reboot), the Ziti client seems to crash silently: the tunnel and routes are still there but no longer work.
  2. If we try to restart the ZDEW client by clicking the green checkmark, "stopping data service" takes forever and never completes.
  3. The only fix is a real reboot; a shutdown followed by a power-on just reproduces the problem.
  4. The logs look a little weird: they state that the controller is not available, then that it is, and after that the client writes to closed connections and seems to be caught in a spiral, but we can't really confirm that.
  5. We had the Win11 "hiberboot" feature deactivated, but the problems persist. They occur only with the newest stable version.

We're happy to discuss the problems and share logs in a private session.

Hi @Misc welcome to the community and to OpenZiti!

I'm sorry to hear you're having issues with the Ziti Desktop Edge for Windows (ZDEW). This is a problem I have not personally experienced, but recently some other users have also had problems restarting the ZDEW. I just pushed out a PR that I think contains a "fix" for the service stop/start issue. If you like, I can make a build available that you could try out to at least confirm the stop/start issue is resolved.

This is consistent with the issue reported by one other user and is consistent with what I've observed from the user's logs I looked at. Out of curiosity, before upgrading, can you attempt to follow the instructions laid out in that issue as a workaround? I suspect they won't work, but it'd be great if you could confirm. When this situation happens, instead of the big green button can you run these commands as admin?

net stop ziti-monitor
net stop ziti
net start ziti-monitor
net start ziti

I expect the net stop ziti command will fail -- but I can't replicate this issue to test it. If that fails, can you terminate the ziti-edge-tunnel process either through Task Manager or whatever mechanism you like? For example


Terminating ziti-edge-tunnel.exe Using PowerShell

Get-Process ziti-edge-tunnel | Stop-Process -Force

Terminating ziti-edge-tunnel.exe Using CMD.exe

taskkill /F /IM ziti-edge-tunnel.exe
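
If it helps when collecting results from many machines, the manual sequence above could be scripted. The following is only a rough Python sketch, not something shipped with ZDEW: the `restart_zdew` and `run` names, the 60-second timeout, and the choice to fall back to `taskkill` only when `net stop ziti` fails are all assumptions made for illustration. It has to run from an elevated prompt, just like the commands themselves.

```python
import subprocess

# Service names and image name are taken from the commands quoted above.
SERVICES = ["ziti-monitor", "ziti"]
TUNNEL_IMAGE = "ziti-edge-tunnel.exe"


def run(cmd, timeout=60):
    """Run a command; True only if it exits with code 0 before the timeout.

    The 60 second timeout is an arbitrary assumption so that a hung
    'net stop' does not block the script forever.
    """
    try:
        return subprocess.run(cmd, capture_output=True, timeout=timeout).returncode == 0
    except subprocess.TimeoutExpired:
        return False


def restart_zdew(runner=run):
    """Stop both services, force-kill the tunnel if 'net stop ziti' fails,
    then start both services again. Returns a list of (command, ok) pairs."""
    steps = []

    def step(cmd):
        ok = runner(cmd)
        steps.append((" ".join(cmd), ok))
        return ok

    for service in SERVICES:
        stopped = step(["net", "stop", service])
        if service == "ziti" and not stopped:
            # Fallback from the post: terminate the tunnel process directly.
            step(["taskkill", "/F", "/IM", TUNNEL_IMAGE])

    for service in SERVICES:
        step(["net", "start", service])

    return steps
```

Calling `restart_zdew()` and printing the returned pairs shows which step failed on a given device. One caveat: `net stop` also returns a non-zero exit code when the service is already stopped, so the `taskkill` fallback may fire in that case too, where it should simply find no process to terminate.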


While analyzing that particular issue with @scareything, he started to wonder if this was similar to another issue people have experienced which we still can't quite track down. I have "numerous" questions that would help us possibly...

  • How reproducible is this particular issue for you?
  • Does it happen consistently or intermittently (it's obviously often enough that you had to reach out, again apologies for that)?
  • Would you be willing to give me an identity that would trigger this behavior?
  • Can you describe 'what' the identity is doing? How many services does it have access to, is it hosting any services, are you using wildcard intercepts, or anything else that might be relevant so that we can figure out what situation causes this issue?
  • The CONTROLLER_UNAVAILABLE issue is often (rightly or wrongly) associated with "that's just networks being networking" (sometimes things are not routable). Are you in a location that has particularly good/stable -- or particularly bad -- internet?

Sorry for so many questions. The best thing for us is an identity that emulates the problem since we still can't quite trigger it ourselves.

I don't know if logs would help, but they surely don't hurt. It's best to have the level at least at DEBUG (if not VERBOSE) and for the ZDEW it's best to collect a "feedback" zip file. Can you send that to me at clint at openziti.org?

Thanks

Hey @TheLumberjack thanks for the warm welcome!

Yeah, that sounds very much like the service-stopping problem we're experiencing. Having a pre-release to roll out to some users and test whether it works would be great, I think.
The bigger problems here are the ones related to starting after shutdown or after sleep.

Unfortunately, the problems are not reproducible on our own Ziti admin clients. But we have two larger customers, each with a fairly large number of users, who seem to hit these problems on a daily basis. Some even said a few times a day; we're still gathering precise data on these "silent crashes".

It seems like every identity acts randomly. We can provide you an identity with a generic test service to a website or something like that, but I would bet it won't show the same behaviour as on our customers' devices (because we Ziti admins can't reproduce it either).

It's working as it usually should but then the problems start with this error:

 [2024-06-26T05:12:40.924Z]    INFO tunnel-cbs:ziti_dns.c:509 format_resp() found record[100.64.0.12] for query[1:xxxxxxxxxx.windows.net]
 [2024-06-26T05:12:42.917Z]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:49763 err=-14, terminating connection

After that we get a flood of error messages that doesn't end and is only cleared by (forcefully) restarting the Ziti services.

[2024-06-26T05:12:42.917Z]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:49763 err=-14, terminating connection
[2024-06-26T05:12:52.121Z]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:49799 err=-14, terminating connection
[2024-06-26T05:15:18.799Z]    INFO tunnel-cbs:ziti_dns.c:509 format_resp() found record[100.64.0.18] for query[1:xxx.xxxx.de]
[2024-06-26T05:15:49.100Z]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:49740 err=-14, terminating connection
[2024-06-26T05:15:49.100Z]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:49732 err=-14, terminating connection
[2024-06-26T05:16:49.111Z]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:50102 err=-14, terminating connection
[2024-06-26T05:16:49.111Z]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:50099 err=-14, terminating connection
[2024-06-26T05:16:49.111Z]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:50094 err=-14, terminating connection
[2024-06-26T05:17:02.628Z]    INFO tunnel-cbs:ziti_dns.c:509 format_resp() found record[100.64.0.18] for query[1:xxxx.xxxx.de]
[2024-06-26T05:17:19.106Z]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:49741 err=-14, terminating connection
[2024-06-26T05:17:19.106Z]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:50131 err=-14, terminating connection
[2024-06-26T05:17:43.795Z]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:50110 err=-14, terminating connection
[2024-06-26T05:17:43.795Z]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:50108 err=-14, terminating connection

These are the recurring error messages we get until we restart it:

[2024-06-26T06:43:04.714Z]   ERROR ziti-sdk:ziti_ctrl.c:164 ctrl_resp_cb() ctrl[ctrl.ctrl.de] request failed: -4079(software caused connection abort)
[2024-06-26T06:43:04.714Z]   ERROR ziti-sdk:ziti.c:1318 edge_routers_cb() ztx[0] failed to get current edge routers: code[0] CONTROLLER_UNAVAILABLE/software caused connection abort
[2024-06-26T06:43:04.714Z]   ERROR ziti-sdk:ziti_ctrl.c:164 ctrl_resp_cb() ctrl[ctrl.ctrl.de] request failed: -4079(software caused connection abort)
[2024-06-26T06:43:04.714Z]    WARN ziti-sdk:ziti.c:1260 check_service_update() ztx[0] failed to poll service updates: code[0] err[-16/software caused connection abort]
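
To put numbers on the flood across many devices, the log lines could be tallied with a small script. This is a minimal Python sketch that assumes every line follows the `[timestamp] LEVEL module:file:line function() message` layout seen in the excerpts above; the `summarize_errors` name and the regular expression are our own and not part of any Ziti tooling.

```python
import re
from collections import Counter

# Assumed layout, based only on the excerpts quoted in this thread:
# [timestamp]  LEVEL module:file.c:line function() message
LINE_RE = re.compile(
    r"^\s*\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z]+)\s+"
    r"(?P<module>[\w-]+):(?P<file>[\w.]+):(?P<line>\d+)\s+"
    r"(?P<func>\w+)\(\)\s+(?P<msg>.*)$"
)


def summarize_errors(lines):
    """Count ERROR lines per source location and remember the first timestamp."""
    counts = Counter()
    first_seen = {}
    for raw in lines:
        m = LINE_RE.match(raw)
        if not m or m["level"] != "ERROR":
            continue  # skip INFO/WARN lines and anything that does not parse
        key = f'{m["file"]}:{m["line"]} {m["func"]}'
        counts[key] += 1
        first_seen.setdefault(key, m["ts"])
    return counts, first_seen


# A few lines copied from the excerpts above, used as sample input.
SAMPLE = [
    " [2024-06-26T05:12:40.924Z]    INFO tunnel-cbs:ziti_dns.c:509 format_resp() found record[100.64.0.12] for query[1:xxxxxxxxxx.windows.net]",
    " [2024-06-26T05:12:42.917Z]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:49763 err=-14, terminating connection",
    "[2024-06-26T05:12:52.121Z]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:49799 err=-14, terminating connection",
    "[2024-06-26T06:43:04.714Z]   ERROR ziti-sdk:ziti_ctrl.c:164 ctrl_resp_cb() ctrl[ctrl.ctrl.de] request failed: -4079(software caused connection abort)",
]
```

Running `summarize_errors` over each device's log and comparing the first timestamp per source location with the time the device woke up or powered on might show whether the flood always starts right after resume.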

Most of the locations we're at have good and stable internet connections; some are unstable or underpowered, like 40 Mb/s for 40 users. But we see the problem at every site.

We've already assigned the task of gathering the feedback files from devices that just had these problems. We can also provide logs from controllers, routers, etc.

As we can't reproduce the problems ourselves even with identities from our customers' networks, we don't think that would help much. We're already working on gathering and censoring our customers' log files; we'll provide them via email as fast as we can.
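
For the censoring step, something along these lines could be applied to each log file before sending. It is only a sketch in Python that masks the two places where hostnames appear in the excerpts above (`query[...]` and `ctrl[...]`); the pattern list and the `redact_line` name are assumptions, and real feedback files may well contain other sensitive fields that it does not catch.

```python
import re

# Patterns are derived only from the excerpts in this thread.
HOST_PATTERNS = [
    # DNS lines: query[1:some.host.name] -> query[1:<redacted-host>]
    (re.compile(r"(query\[\d+:)[^\]]+(\])"), r"\1<redacted-host>\2"),
    # Controller lines: ctrl[ctrl.example.de] -> ctrl[<redacted-ctrl>]
    (re.compile(r"(ctrl\[)[^\]]+(\])"), r"\1<redacted-ctrl>\2"),
]


def redact_line(line):
    """Return the line with hostnames in query[...] and ctrl[...] masked."""
    for pattern, replacement in HOST_PATTERNS:
        line = pattern.sub(replacement, line)
    return line
```

Timestamps, source locations and error codes are left untouched, so the redacted logs should still be usable for correlating events.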

Thank you very much!