Ziti Client crashes at SRV record lookup

Hi All,

We are currently in a mass rollout of Ziti where Windows devices either join the domain or are already domain-joined. We have seen similar issues in the past, but this time the whole app crashes. Even the logs stop mid-write as soon as an SRV lookup is made.

[2024-04-11T13:04:29.088Z]   DEBUG tunnel-sdk:tunnel_udp.c:259 recv_udp() intercepted address[udp:100.64.0.2:53] client[udp:100.64.0.1:58072] service[ziti:dns-resolver]
[2024-04-11T13:04:29.088Z]   DEBUG tunnel-cbs:ziti_dns.c:238 on_dns_client() new DNS client
[2024-04-11T13:04:29.088Z]   DEBUG tunnel-sdk:ziti_tunnel.c:221 ziti_tunneler_dial_completed() ziti dial succeeded: client[udp:100.64.0.1:58072] service[ziti:dns-resolver]
[2024-04-11T13:04:29.088Z]    INFO tunnel-cbs:ziti_dns.c:641 proxy_domain_req() writing proxy resolve [{
	"status":0,
	"id":53574,
	"recursive":0,
	"question":[{
		"name":"_ldap._tcp.client.local",
		"type":33
		}]
	}]
[2024-04-11T13:04:29.088Z]    INFO tunnel-cbs:ziti_dns.c:627 on_proxy_write() proxy resolve write: 106
[2024-04-11T13:04:29.125Z]    INFO tunnel-cbs:ziti_dns.c:600 on_proxy_data() proxy resolve: {"id":53574,"status":0,"question":[{"name":"_ldap._tcp.client.local","type":33}],"answer":[{"name":"_ldap._tcp.client.local","type":33,"ttl":86400,"data":"xyzNEUADS1VP.client.local.","port":389,"weight":100},{"name":"_ldap._tcp.client.local","type":33,"ttl":86400,"data":"xyzbreads1vp.client.local.","port":389,"weight":100},{"name":"_ldap._tcp.client.local","type":33,"ttl":86400,"data":"xyzhelads1vp.client.local.","port":389,"weight":100},{"name":"_ldap._tcp.client.local","type":33,"ttl":86400,"data":"xyzberads1vp.client.local.","port":389,"weight":100},{"name":"_ldap._tcp.client.local","type":33,"ttl":86400,"data":"xyzinfads1vp.client.local.","port":389,"weight":100},{"name":"_ldap._tcp.client.local","type":33,"ttl":86400,"data":"xyztrvads1vp.client.local.","port":389,"weight":100},{"name":"_ldap._tcp.client.local","type":33,"ttl":86400,"data":"xyzgopads1vp.client.local.","port":389,"weight":100},{"name":"_ldap._tcp.client.local","type":33,"ttl":86400,"data":"xyzHELADS1VP.client.local.","port":389,"weight":100},{"name":"_ldap._tcp.client.local","type":33,"ttl":86400,"data":"xyzgroads1vp.client.local.","port":389,"weight":100},{"name":"_ldap._tcp.client.local","type":33,"ttl":86400,"data":"xyzHERADS1VP.client.local.","port":389,"weight":100},{"name":"_ldap._tcp.client.local","type":33,"ttl":86400,"data":"xyzpreads1vp.client.local.","port":389,"weight":100},{"name":"_ldap._tcp.client.local","type":33,"ttl":86400,"data":"xyzberads1vp.client.local.","port":389,"weight":100},{"name":"_ldap._tcp.client.local","type":33,"ttl":86400,"data":"xyzandads1vp.client.local.","port":389,"weight":100},{"name":"_ldap._tcp.client.local","type":33,"ttl":86400,"data":"xyzherads1vp.client.local.","port":389,"weight":100},{"name":"_ldap._tcp.client.local","type":33,"ttl":86400,"data":"xyzGOPADS1VP.client.local.","port":389,"weight":100},{"name":"_ldap._tcp.client.local","type":33,"ttl":86400,"data"
:"xyzneuads1vp.client.local.","port":389,"weight":100},{"name":"_ldap._tcp.client.local","type":33,"ttl":86400,"data":"xyzpirads1vp.client.local.","port":389,"weight":100},{"name":"_ldap._tcp.client.local","type":33,"ttl":86400,"data":"xyzsdtads1vp.client.local.","port":389,"weight":100},{"name":"_ldap._tcp.client.local","type":33,"ttl":86400,"data":"xyzhanads1vp.client.local.","port":389,"weight":100},{"name":"_ldap._tcp.client.local","type":33,"ttl":86400,"data":"xyzdelads1vp.client.local.","port":389,"weight":100},{"name":"_ldap._tcp.client.local","type":33,"ttl":86400,"data":"xyzstaads1vp.client.local.","port":389,"weight":100},{"name":"_ldap._tcp.client.local","type":33,"ttl":86400,"data":"azrdc01p.client.local.","port":389,"weight":100},{"name":"_ldap._tcp.client.local","type":33,"ttl":86400,"data":"xyzHANADS1VP.client.local.","port":389,"weight":100},{"name":"_ldap._tcp.client.local","type":33,"ttl":86400,"data":"xyzknaads1vp.client.local.","port":389,"weight":100},{"name":"_ldap._tcp.client.local","type":33,"ttl":86400,"data":"xyzGROADS1VP.client.local.","port":389,"weight":100},{"name":"_ldap._tcp.client.local","type":33,"ttl":86400,"data":"xyzinfads2vp.client.local.","port":389,"weight":100},{"name":"_ldap._tcp.client.local","type":33,"ttl":86400,"data":"xyzleuads1vp.client.local.","port":389,"weight":100},{"name":"_ldap._tcp.client.local","type":33,"ttl":86400,"data":"xyzstvads1vp.client.local.","port":389,"weight":100},{"name":"_ldap._tcp.client.local","type":33,"ttl":86400,"data":"xyzstvads1vp.client.local.","port":389,"weight":100},{"name":"_ldap._tcp.client.local","type":33,"ttl":86400,"data":"azrdc02p.client.local.","port":389,"weight":100}]}
[2024-04-11T13:04:29.125Z]    INFO tunnel-cbs:ziti_dns.c:500 format_resp() found record[xyzNEUADS1VP.client.local.] for query[33:_ldap._tcp.client.local]
[2024-04-11T13:04:29.125Z]    INFO tunnel-cbs:ziti_dns.c:500 format_resp() found record[xyzbreads1vp.client.local.] for query[33:_ldap._tcp.client.local]
[2024-04-11T13:04:29.125Z]    INFO tunnel-cbs:ziti_dns.c:500 format_resp() found record[xyzhelads1vp.client.local.] for query[33:_ldap._tcp.client.local]
[2024-04-11T13:04:29.125Z]    INFO tunnel-cbs:ziti_dns.c:500 format_resp() found record[xyzberads1vp.client.local.] for query[33:_ldap._tcp.client.local]
[2024-04-11T13:04:29.125Z]    INFO tunnel-cbs:ziti_dns.c:500 format_resp() found record[xyzinfads1vp.client.local.] for query[33:_ldap._tcp.client.local]
[2024-04-11T13:04:29.125Z]    INFO tunnel-cbs:ziti_dns.c:500 format_resp() found record[xyztrvads1vp.client.local.] for query[33:_ldap._tcp.client.local]
[2024-04-11T13:04:29.125Z]    INFO tunnel-cbs:ziti_dns.c:500 format_r 

Does anyone know where to begin, maybe by downgrading the Ziti client?

Best Regards,
Frederik

Even the logs stop mid-write as soon as an SRV lookup is made.

Interesting. When this happens, could you look at the CPU usage of this process and perhaps memory usage?

Since you are able to reliably trigger this problem with your current configuration, would you be willing to give us an identity so we can replicate the problem?

Yes, I can trigger it by doing the SRV record lookup via PowerShell; as soon as I hit Enter, the log is created and ziti-edge-tunnel.exe crashes.

From a CPU and RAM standpoint, there is almost no change at or right before the crash.
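For reference, the triggering query can also be reproduced without PowerShell. Below is a minimal Python sketch that builds a raw DNS SRV query (type 33, the type shown in the logs) for `_ldap._tcp.client.local`, assuming the standard DNS wire format; it only constructs the packet, sending it to the tunneler's DNS address is left commented out.

```python
import struct

def build_srv_query(name: str, txn_id: int = 0x1234) -> bytes:
    """Build a raw DNS query packet for an SRV record (type 33, class IN)."""
    # Header: id, flags (RD=1), QDCOUNT=1, ANCOUNT/NSCOUNT/ARCOUNT=0
    header = struct.pack(">HHHHHH", txn_id, 0x0100, 1, 0, 0, 0)
    # Question: name as length-prefixed labels, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.split(".")
    ) + b"\x00"
    # QTYPE=33 (SRV), QCLASS=1 (IN)
    question = qname + struct.pack(">HH", 33, 1)
    return header + question

query = build_srv_query("_ldap._tcp.client.local")
print(len(query))
# To actually send it, e.g. to the intercepted resolver from the logs:
# import socket
# s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# s.sendto(query, ("100.64.0.2", 53))
```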

What you are seeing here is a feature of ziti-edge-tunnel: it tries to answer MX, TXT, and SRV queries for any of the wildcard domains that it is intercepting. The intercepting tunneler can't answer these query types directly, so it sends them over to the hosting tunneler to see if it has an answer.
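As the log excerpt shows, these proxied queries travel as plain JSON. A small illustrative sketch of the round trip, with field names taken directly from the `proxy_domain_req()` and `on_proxy_data()` log lines above (this is an illustration of the message shape, not the tunneler's actual code):

```python
import json

# Request as written by the intercepting side (see proxy_domain_req() above)
request = {
    "status": 0,
    "id": 53574,
    "recursive": 0,
    "question": [{"name": "_ldap._tcp.client.local", "type": 33}],
}

# Reply as received back (see on_proxy_data() above): the same question,
# plus one answer entry per SRV record, each carrying target/port/weight
reply_json = """{
  "id": 53574, "status": 0,
  "question": [{"name": "_ldap._tcp.client.local", "type": 33}],
  "answer": [{"name": "_ldap._tcp.client.local", "type": 33,
              "ttl": 86400, "data": "azrdc01p.client.local.",
              "port": 389, "weight": 100}]
}"""

reply = json.loads(reply_json)
assert reply["id"] == request["id"]  # replies are matched to requests by id
for ans in reply["answer"]:
    print(ans["data"], ans["port"])
```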

We've fixed a number of issues with this DNS query proxying in recent releases. Which version of Ziti Desktop Edge for Windows are you using? (I'm guessing 2.1.16, but just to be sure.) If so, would you be able to run a beta of our upcoming 2.2.2 release?

Yes, 2.1.16, and I also rolled back as far as 2.1.14.

I will try 2.2.2 next.

Thanks

So the good news: no crashing Desktop app anymore.

But now no SRV records can be looked up at all:

[2024-04-11T13:45:47.455Z] DEBUG ziti-sdk:channel.c:246 ziti_channel_add_receiver() ch[1] added receiver[230]
[2024-04-11T13:45:47.455Z] DEBUG tunnel-cbs:ziti_dns.c:663 proxy_domain_req() writing proxy resolve req[8c65]: {"status":0,"id":35941,"recursive":1,"question":[{"name":"_ldap._tcp.AZR._sites.dc._msdcs.client.local","type":33}]}
[2024-04-11T13:45:47.475Z] INFO tunnel-cbs:ziti_dns.c:593 on_proxy_connect() proxy resolve connection established for domain[*.client.local]
[2024-04-11T13:45:47.475Z] DEBUG tunnel-cbs:ziti_dns.c:636 on_proxy_write() proxy resolve write: 108
[2024-04-11T13:45:47.475Z] WARN tunnel-cbs:ziti_dns.c:640 on_proxy_write() proxy resolve write failed: unexpected error/108
[2024-04-11T13:45:47.475Z] DEBUG tunnel-sdk:ziti_tunnel.c:435 ziti_tunneler_close() closing connection: client[udp:100.64.0.1:56073] service[ziti:dns-resolver]
[2024-04-11T13:45:47.475Z] DEBUG tunnel-sdk:tunnel_udp.c:127 tunneler_udp_close() closing src[udp:100.64.0.1:56073] dst[udp:100.64.0.2:53] service[ziti:dns-resolver]

Hmm. The problem you're seeing now should have been fixed in ziti-edge-tunnel 0.22.26, and 0.22.27 is what should be included with the desktop windows installer. Now I wonder where you got 2.2.2 from? Here's the link I should have shared with you before: https://github.com/openziti/desktop-edge-win-beta/releases/download/2.2.1.7/Ziti.Desktop.Edge.Client-2.2.2.exe

I got it from GitHub as a pre-release; it's labeled 2.2.2.

Now, after I downloaded from your link and installed it, it is crashing again on the SRV record lookup, restarting/crashing every 13 seconds. 2.2.2.0 Service: v0.22.27@

So: the same outcome with your beta, but a different one with the beta 2.2.2 from the GitHub release page.

The problem you describe is in App: 2.2.2.0, Service: v0.22.25@, which is the current 2.2.2 pre-release on GitHub.

So you are right that 0.22.25 does have this error, but 0.22.27 still crashes when performing an SRV lookup.

The logs look a lot different with the beta release:

[2024-04-11T15:13:52.998Z] DEBUG tunnel-cbs:ziti_dns.c:643 on_proxy_write() proxy resolve write: 98
[2024-04-11T15:13:53.036Z] DEBUG tunnel-cbs:ziti_dns.c:606 on_proxy_data() proxy resolve: {"id":58536,"status":0,"question":[{"name":"_ldap._tcp.AZR._sites.client.local","type":33}],"answer":[{"name":"_ldap._tcp.AZR._sites.client.local","type":33,"ttl":86400,"data":"azrdc01p.client.local.","port":389,"weight":100},{"name":"_ldap._tcp.AZR._sites.client.local","type":33,"ttl":86400,"data":"azrdc02p.client.local.","port":389,"weight":100}]}
[2024-04-11T15:13:53.036Z] INFO tunnel-cbs:ziti_dns.c:500 format_resp() found record[azrdc01p.client.local.] for query[33:_ldap._tcp.AZR._sites.client.local]
[2024-04-11T15:13:53.036Z] INFO tunnel-cbs:ziti_dns.c:500 format_resp() found record[azrdc02p.client.local.] for query[33:_ldap._tcp.AZR._sites.client.local]
[2024-04-11T15:13:53.036Z] DEBUG tunnel-sdk:ziti_tunnel.c:434 ziti_tunneler_close() closing connection: client[udp:100.64.0.1:53831] service[ziti:dns-resolver]
[2024-04-11T15:13:53.036Z] DEBUG tunnel-sdk:tunnel_udp.c:126 tunneler_udp_close() closing src[udp:100.64.0.1:53831] dst[udp:100.64.0.2:53] service[ziti:dns-resolver]
[2024-04-11T15:13:53.055Z] DEBUG tunnel-sdk:tunnel_tcp.c:116 new_tcp_pcb() snd_wnd: 65535, snd_snd_max: 65535, mss: 32768
[2024-04-11T15:13:53.055Z] DEBUG tunnel-sdk:tunnel_tcp.c:429 recv_tcp() intercepted address[tcp:100.64.0.4:80] client[tcp:100.64.0.1:52360] service[ad]
[2024-04-11T15:13:53.055Z] DEBUG tunnel-cbs:ziti_tunnel_cbs.c:349 ziti_sdk_c_dial() service[ad] app_data_json[165]='{"connType":null,"dst_protocol":"tcp","dst_hostname":"wpad.client.local","dst_ip":"100.64.0.4","dst_port":"80","src_protocol":"tcp","src_ip":"100.64.0.1","src_port":"52360"}'
[2024-04-11T15:13:53.055Z] DEBUG ziti-sdk:connect.c:413 connect_get_service_cb() conn[0.15/nmSVqJp0/Connecting] got service[ad] id[57mDgplKb78EK2GIdgb9ir]
[2024-04-11T15:13:53.055Z] DEBUG ziti-sdk:posture.c:210 ziti_send_posture_data() ztx[0] posture checks must_send set to TRUE, new_session_id[FALSE], must_send_every_time[TRUE], new_controller_instance[FALSE]
[2024-04-11T15:13:53.055Z] DEBUG ziti-sdk:connect.c:534 process_connect() conn[0.15/nmSVqJp0/Connecting] starting Dial connection for service[ad] with session[cluvdqbqr11oi0ao0unvecljz]
[2024-04-11T15:13:53.055Z] DEBUG ziti-sdk:connect.c:390 ziti_connect() conn[0.15/nmSVqJp0/Connecting] selected ch[admin-pub-rt@tls://pubrt1.xyz.canme.network:443] for best latency(17 ms)
[2024-04-11T15:13:53.055Z] DEBUG ziti-sdk:channel.c:246 ziti_channel_add_receiver() ch[1] added receiver[15]
[2024-04-11T15:13:53.089Z] ERROR ziti-sdk:connect.c:963 connect_reply_cb() conn[0.15/nmSVqJp0/Connecting] failed to connect, reason=exceeded maximum [2] retries creating circuit [c/PH5cNzNd6]: error creating route for [s/PH5cNzNd6] on [r/Ah-MAaqQcY] (error creating route for [c/PH5cNzNd6]: dial tcp: lookup wpad.client.local on 10.2.34.69:53: no such host)
[2024-04-11T15:13:53.089Z] ERROR tunnel-cbs:ziti_tunnel_cbs.c:103 on_ziti_connect() ziti dial failed: connection is closed
[2024-04-11T15:13:53.089Z] DEBUG ziti-sdk:connect.c:172 close_conn_internal() conn[0.15/nmSVqJp0/Closed] removing
[2024-04-11T15:13:53.089Z] DEBUG tunnel-sdk:ziti_tunnel.c:434 ziti_tunneler_close() closing connection: client[tcp:100.64.0.1:52360] service[ad]
[2024-04-11T15:13:53.089Z] DEBUG tunnel-sdk:tunnel_tcp.c:251 tunneler_tcp_close() closing src dst state[3/SYN_RCVD] flags[0x100] service
[2024-04-11T15:13:53.089Z] DEBUG tunnel-sdk:tunnel_tcp.c:258 tunneler_tcp_close() closing connection before handshake complete. sending RST to client
[2024-04-11T15:13:53.089Z] DEBUG ziti-sdk:channel.c:253 ziti_channel_rem_receiver() ch[1] removed receiver[15]
[2024-04-11T15:13:53.089Z] DEBUG ziti-sdk:ziti.c:1749 grim_reaper() ztx[0] reaped 1 closed (out of 4 total) connections
[2024-04-11T15:13:53.104Z] DEBUG tunnel-sdk:tunnel_tcp.c:116 new_tcp_pcb() snd_wnd: 65535, snd_snd_max: 65535, mss: 32768
[2024-04-11T15:13:53.104Z] DEBUG tunnel-sdk:tunnel_tcp.c:429 recv_tcp() intercepted address[tcp:100.64.0.6:389] client[tcp:100.64.0.1:52365] service[ad]
[2024-04-11T15:13:53.104Z] DEBUG tunnel-cbs:ziti_tunnel_cbs.c:349 ziti_sdk_c_dial() service[ad] app_data_json[170]='{"connType":null,"dst_protocol":"tcp","dst_hostname":"azrdc02p.client.local","dst_ip":"100.64.0.6","dst_port":"389","src_protocol":"tcp","src_ip":"100.64.0.1","src_port":"52365"}'
[2024-04-11T15:13:53.104Z] DEBUG ziti-sdk:connect.c:413 connect_get_service_cb() conn[0.16/IyFrY0Zx/Connecting] got service[ad] id[57mDgplKb78EK2GIdgb9ir]
[2024-04-11T15:13:53.104Z] DEBUG ziti-sdk:posture.c:210 ziti_send_posture_data() ztx[0] posture checks must_send set to TRUE, new_session_id[FALSE], must_send_every_time[TRUE], new_controller_instance[FALSE]
[2024-04-11T15:13:53.104Z] DEBUG ziti-sdk:connect.c:534 process_connect() conn[0.16/IyFrY0Zx/Connecting] starting Dial connection for service[ad] with session[cluvdqbqr11oi0ao0unvecljz]
[2024-04-11T15:13:53.104Z] DEBUG ziti-sdk:connect.c:390 ziti_connect() conn[0.16/IyFrY0Zx/Connecting] selected ch[admin-pub-rt@tls://pubrt1.xyz.canme.network:443] for best latency(17 ms)
[2024-04-11T15:13:53.104Z] DEBUG ziti-sdk:channel.c:246 ziti_channel_add_receiver() ch[1] added receiver[16]
[2024-04-11T15:13:53.136Z] DEBUG tunnel-sdk:ziti_tunnel.c:221 ziti_tunneler_dial_completed() ziti dial succeeded: client[tcp:100.64.0.1:52365] service[ad]
[2024-04-11T15:13:53.137Z] DEBUG tunnel-sdk:tunnel_tcp.c:66 on_accept() on_accept: 0
[2024-04-11T15:13:53.149Z] DEBUG tunnel-sdk:tunnel_tcp.c:116 new_tcp_pcb() snd_wnd: 65535, snd_snd_max: 65535, mss: 32768
[2024-04-11T15:13:53.149Z] DEBUG tunnel-sdk:tunnel_tcp.c:429 recv_tcp() intercepted address[tcp:100.64.0.4:80] client[tcp:100.64.0.1:52362] service[ad]
[2024-04-11T15:13:53.149Z] DEBUG tunnel-cbs:ziti_tunnel_cbs.c:349 ziti_sdk_c_dial() service[ad] app_data_json[165]='{"connType":null,"dst_protocol":"tcp","dst_hostname":"wpad.client.local","dst_ip":"100.64.0.4","dst_port":"80","src_protocol":"tcp","src_ip":"100.64.0.1","src_port":"52362"}'
[2024-04-11T15:13:53.149Z] DEBUG ziti-sdk:connect.c:413 connect_get_service_cb() conn[0.17/bJn-8TLC/Connecting] got service[ad] id[57mDgplKb78EK2GIdgb9ir]
[2024-04-11T15:13:53.149Z] DEBUG ziti-sdk:posture.c:210 ziti_send_posture_data() ztx[0] posture checks must_send set to TRUE, new_session_id[FALSE], must_send_every_time[TRUE], new_controller_instance[FALSE]
[2024-04-11T15:13:53.149Z] DEBUG ziti-sdk:connect.c:534 process_connect() conn[0.17/bJn-8TLC/Connecting] starting Dial connection for service[ad] with session[cluvdqbqr11oi0ao0unvecljz]
[2024-04-11T15:13:53.149Z] DEBUG ziti-sdk:connect.c:390 ziti_connect() conn[0.17/bJn-8TLC/Connecting] selected ch[admin-pub-rt@tls://pubrt1.xyz.canme.network:443] for best latency(17 ms)
[2024-04-11T15:13:53.149Z] DEBUG ziti-sdk:channel.c:246 ziti_channel_add_receiver() ch[1] added receiver[17]
[2024-04-11T15:13:53.151Z] DEBUG tunnel-sdk:tunnel_udp.c:269 recv_udp() intercepted address[udp:100.64.0.2:53] client[udp:100.64.0.1:53720] service[ziti:dns-resolver]
[2024-04-11T15:13:53.151Z] DEBUG tunnel-cbs:ziti_dns.c:234 on_dns_client() new DNS client
[2024-04-11T15:13:53.151Z] DEBUG tunnel-sdk:ziti_tunnel.c:221 ziti_tunneler_dial_completed() ziti dial succeeded: client[udp:100.64.0.1:53720] service[ziti:dns-resolver]
[2024-04-11T15:13:53.151Z] DEBUG tunnel-cbs:ziti_dns.c:673 proxy_domain_req() writing proxy resolve req[420b]: {"status":0,"id":16907,"recursive":1,"question":[{"name":"_ldap._tcp.client.local","type":33}]}
[2024-04-11T15:13:53.152Z] DEBUG tunnel-cbs:ziti_dns.c:643 on_proxy_write() proxy resolve write: 87

Okay, thanks for the log excerpt. So it looks like the SRV queries are working for a bit - at least I see one successful response at the beginning of the excerpt. Then the next one doesn't complete, and that's where the process crashes?

I'm setting up an environment to test this locally.

Yes, I also see it succeed at the private router in the DC, but the answer just breaks off after a bit, the same as before.

Would you be interested in seeing it live via screen share?

Thanks

I'm not yet sure if this is related to the crash you're seeing (and even if it is, it's obviously a problem that needs to be fixed), but this error suggests to me that your service has forwardAddress enabled, which means the hosting tunneler will use the intercepted address, whatever it is (wpad.client.local in this case), to connect to the application server. Here it looks like wpad.client.local does not resolve on the hosting tunneler.
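For context, forwardAddress is a setting in the service's host.v1 configuration. A sketch of such a config (the values here are illustrative, not taken from this setup):

```json
{
  "protocol": "tcp",
  "forwardAddress": true,
  "allowedAddresses": ["*.client.local"],
  "forwardPort": true,
  "allowedPortRanges": [{"low": 80, "high": 636}]
}
```

With forwardAddress set, the hosting tunneler dials whatever hostname the client originally asked for, so that name must be resolvable on the hosting side.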

Would you expect this hostname to work at the hosting tunneler?

Thanks for this offer. Feel free to DM a link to me when you're able.


The crash is due to the DNS response formatter not being prepared for large responses. This has been fixed in ziti-edge-tunnel 0.22.28, which is also included in the upcoming 2.2.x release of Ziti Desktop Edge for Windows.
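For anyone curious about the failure mode: a DNS-over-UDP response payload is classically capped at 512 bytes, and a formatter that appends answer records without checking the remaining space will overrun its buffer when a query returns dozens of SRV records, as happened here. A hedged sketch of the defensive pattern (illustrative Python, not the actual C fix in ziti-edge-tunnel):

```python
MAX_UDP_DNS = 512  # classic DNS-over-UDP payload limit
TC_FLAG = 0x0200   # truncation (TC) bit in the DNS header flags

def format_response(header: bytearray, answers: list) -> bytes:
    """Append answer records only while they fit; set TC when they don't."""
    buf = bytearray(header)
    truncated = False
    for rec in answers:
        if len(buf) + len(rec) > MAX_UDP_DNS:
            truncated = True  # stop instead of overrunning the buffer
            break
        buf += rec
    if truncated:
        # Flags live in header bytes 2-3; setting TC tells the client
        # it may retry over TCP to get the full answer set.
        flags = int.from_bytes(buf[2:4], "big") | TC_FLAG
        buf[2:4] = flags.to_bytes(2, "big")
    return bytes(buf)

# 30 SRV-sized records (~40 bytes each) clearly exceed 512 bytes
hdr = bytearray(12)
resp = format_response(hdr, [b"x" * 40] * 30)
print(len(resp))
```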


Just as a wrap-up at the end:

There were too many answers coming back from an SRV lookup, which crashed the tunnel SDK and caused it to restart several times. In this case there really were a lot of answers.

Thanks to @scareything, we were able to fix this in less than 2 days. Our people and the customer were blown away by this.

So far everything has gone smoothly and the rollout is working well, so thanks again Shawn!
