I've got an overlay network with a controller and a public listening edge router on one host. Those services are internet-facing and exposed. I've also got an edge router inside a private network with no ports open to the internet. The private network is where the target services are hosted. Overall, this setup has been working for several months and I think I've got the components set up properly. The components are running on Linux, no containers.
I have one problematic MacBook (heavy user) that suddenly lost connectivity a couple days ago. Green lights in the tunneler, rebooted, reconnected, no improvement. The computer works when connected directly to the private network with Ziti enabled.
I have other types of devices and even other Macs that are working fine. This afternoon I got to look through trace logs on the problem computer. It looks like the tunneler tries to reach both edge routers. The exposed one is mentioned once. The private (and hence unreachable) edge router is repeatedly tried. I don't understand why the tunneler isn't trying to connect to the available edge router.
A working MacBook immediately connects to the publicly exposed router. I'll get some quality time with the broken computer tomorrow. What might cause this? What can I test to narrow it down?
I sanitized a log snippet from just after that on_ziti_event() call. It looks like the public router did connect:
(1824)[2024-11-08T18:14:57.855Z] INFO tunnel-cbs:ziti_tunnel_ctrl.c:1043 on_ziti_event() ztx[mac-host.name] router ourziti-edge-router connected
Then there's all the DNS intercept stuff. But after that it tries to connect to the private router and endlessly retries. The log goes on for another 18 attempts of this. I don't see where it actually tried to use the public router. It just pre-selected the other one for some reason.
I don't think there's anything wrong with the overlay configuration/policies. I'm using it right now via BrowZer to get into my email. Basically the same path - home PC to BrowZer to public router to private router to mail.
(1824)[2024-11-08T18:15:04.650Z] DEBUG ziti-sdk:channel.c:765 reconnect_cb() ch[0] connecting to tls://nisztday.private.net:3022
(1824)[2024-11-08T18:15:04.654Z] ERROR ziti-sdk:channel.c:943 on_tls_connect() ch[0] failed to connect to ER[nisztday] [-3008/unknown node or service]
The larger snippet I took those chunks from...
(1824)[2024-11-08T18:14:57.855Z] INFO tunnel-cbs:ziti_tunnel_ctrl.c:1043 on_ziti_event() ztx[mac-host.name] router ourziti-edge-router connected
(1824)[2024-11-08T18:14:57.855Z] TRACE ziti-sdk:channel.c:900 on_channel_data() ch[1] read no data
[2024-11-08T18:14:57:862Z] INFO PacketTunnelProvider:PacketTunnelProvider.swift:381 logNetworkPath() Network Path Update:
Status:satisfied, Expensive:false, Cellular:false, DNS:true
Interfaces:
15: name:en0, type:wifi
22: name:utun4, type:other
[2024-11-08T18:14:57:943Z] WARN PacketTunnelProvider:PacketTunnelProvider.swift:327 getUpstreamDns() No fallback DNS configured. Setting to first resolver: 192.168.0.1
[2024-11-08T18:14:57:943Z] INFO PacketTunnelProvider:PacketTunnelProvider.swift:366 startNetworkMonitor() Setting fallback DNS to 192.168.0.1
[2024-11-08T18:14:57:943Z] DEBUG CZiti:ZitiTunnel.swift:158 setUpstreamDns() upStreamDNS=192.168.0.1, port=53
(1824)[2024-11-08T18:14:57.942Z] INFO tunnel-cbs:ziti_dns.c:273 ziti_dns_set_upstream() DNS upstream[1] is set to 192.168.0.1:53
[2024-11-08T18:14:58:001Z] INFO PacketTunnelProvider:PacketTunnelProvider.swift:381 logNetworkPath() Network Path Update:
Status:satisfied, Expensive:false, Cellular:false, DNS:true
Interfaces:
15: name:en0, type:wifi
22: name:utun4, type:other
[2024-11-08T18:14:58:007Z] TRACE PacketTunnelProvider:PacketTunnelProvider.swift:287 readPacketFlow() read 1 packets
[2024-11-08T18:14:58:007Z] TRACE PacketTunnelProvider:PacketTunnelProvider.swift:287 readPacketFlow() read 1 packets
(1824)[2024-11-08T18:14:58.007Z] TRACE tunnel-sdk:tunnel_udp.c:156 recv_udp() received datagram src[100.64.0.1:63697] dst[100.64.0.2:53]
(1824)[2024-11-08T18:14:58.007Z] VERBOSE tunnel-sdk:intercept.c:130 port_match() matching port 53 to range 53
(1824)[2024-11-08T18:14:58.007Z] VERBOSE tunnel-sdk:intercept.c:133 port_match() port 53 matches range 53 with score 0
(1824)[2024-11-08T18:14:58.007Z] VERBOSE tunnel-sdk:intercept.c:135 port_match() port 53 is best match so far
(1824)[2024-11-08T18:14:58.007Z] DEBUG tunnel-sdk:tunnel_udp.c:231 recv_udp() intercepted address[udp:100.64.0.2:53] client[udp:100.64.0.1:63697] service[ziti:dns-resolver]
(1824)[2024-11-08T18:14:58.007Z] TRACE tunnel-cbs:ziti_dns.c:281 on_dns_client() new DNS client
(1824)[2024-11-08T18:14:58.007Z] DEBUG tunnel-sdk:ziti_tunnel.c:221 ziti_tunneler_dial_completed() ziti dial succeeded: client[udp:100.64.0.1:63697] service[ziti:dns-resolver]
(1824)[2024-11-08T18:14:58.007Z] VERBOSE tunnel-sdk:tunnel_udp.c:84 on_udp_client_data() 39 bytes from 100.64.0.1:63697
(1824)[2024-11-08T18:14:58.007Z] TRACE tunnel-sdk:tunnel_udp.c:54 to_ziti() writing 39 bytes to ziti src[udp:100.64.0.1:63697] dst[udp:100.64.0.2:53] service[ziti:dns-resolver]
(1824)[2024-11-08T18:14:58.007Z] TRACE tunnel-cbs:ziti_dns.c:819 on_dns_req() received DNS query q_len=39 id[efea] recursive[true] type[1] name[mail.ourdomain.com]
(1824)[2024-11-08T18:14:58.007Z] INFO tunnel-cbs:ziti_dns.c:567 format_resp() found record[100.64.0.3] for query[1:mail.ourdomain.com]
(1824)[2024-11-08T18:14:58.007Z] TRACE tunnel-sdk:netif_shim.c:34 netif_shim_output() writing packet UDP[100.64.0.2:53 -> 100.64.0.1:63697] len=94
(1824)[2024-11-08T18:14:58.007Z] TRACE tunnel-cbs:ziti_dns.c:299 on_dns_close() DNS client close
(1824)[2024-11-08T18:14:58.007Z] DEBUG tunnel-sdk:ziti_tunnel.c:435 ziti_tunneler_close() closing connection: client[udp:100.64.0.1:63697] service[ziti:dns-resolver]
(1824)[2024-11-08T18:14:58.007Z] DEBUG tunnel-sdk:tunnel_udp.c:104 tunneler_udp_close() closing src[udp:100.64.0.1:63697] dst[udp:100.64.0.2:53] service[ziti:dns-resolver]
(1824)[2024-11-08T18:14:58.007Z] TRACE tunnel-sdk:tunnel_udp.c:156 recv_udp() received datagram src[100.64.0.1:62357] dst[100.64.0.2:53]
(1824)[2024-11-08T18:14:58.007Z] DEBUG tunnel-sdk:tunnel_udp.c:231 recv_udp() intercepted address[udp:100.64.0.2:53] client[udp:100.64.0.1:62357] service[ziti:dns-resolver]
(1824)[2024-11-08T18:14:58.007Z] TRACE tunnel-cbs:ziti_dns.c:281 on_dns_client() new DNS client
(1824)[2024-11-08T18:14:58.007Z] DEBUG tunnel-sdk:ziti_tunnel.c:221 ziti_tunneler_dial_completed() ziti dial succeeded: client[udp:100.64.0.1:62357] service[ziti:dns-resolver]
(1824)[2024-11-08T18:14:58.007Z] VERBOSE tunnel-sdk:tunnel_udp.c:84 on_udp_client_data() 36 bytes from 100.64.0.1:62357
(1824)[2024-11-08T18:14:58.007Z] TRACE tunnel-sdk:tunnel_udp.c:54 to_ziti() writing 36 bytes to ziti src[udp:100.64.0.1:62357] dst[udp:100.64.0.2:53] service[ziti:dns-resolver]
(1824)[2024-11-08T18:14:58.007Z] TRACE tunnel-cbs:ziti_dns.c:819 on_dns_req() received DNS query q_len=36 id[5dab] recursive[true] type[64] name[_dns.resolver.arpa]
[2024-11-08T18:14:58:011Z] WARN PacketTunnelProvider:PacketTunnelProvider.swift:327 getUpstreamDns() No fallback DNS configured. Setting to first resolver: 192.168.0.1
[2024-11-08T18:14:58:011Z] INFO PacketTunnelProvider:PacketTunnelProvider.swift:366 startNetworkMonitor() Setting fallback DNS to 192.168.0.1
[2024-11-08T18:14:58:011Z] DEBUG CZiti:ZitiTunnel.swift:158 setUpstreamDns() upStreamDNS=192.168.0.1, port=53
(1824)[2024-11-08T18:14:58.011Z] INFO tunnel-cbs:ziti_dns.c:273 ziti_dns_set_upstream() DNS upstream[1] is set to 192.168.0.1:53
(1824)[2024-11-08T18:14:58.042Z] TRACE tunnel-cbs:ziti_dns.c:881 on_upstream_packet() upstream sent response to query[5dab] (rc=112)
(1824)[2024-11-08T18:14:58.042Z] TRACE tunnel-sdk:netif_shim.c:34 netif_shim_output() writing packet UDP[100.64.0.2:53 -> 100.64.0.1:62357] len=140
(1824)[2024-11-08T18:14:58.042Z] TRACE tunnel-cbs:ziti_dns.c:299 on_dns_close() DNS client close
(1824)[2024-11-08T18:14:58.042Z] DEBUG tunnel-sdk:ziti_tunnel.c:435 ziti_tunneler_close() closing connection: client[udp:100.64.0.1:62357] service[ziti:dns-resolver]
(1824)[2024-11-08T18:14:58.042Z] DEBUG tunnel-sdk:tunnel_udp.c:104 tunneler_udp_close() closing src[udp:100.64.0.1:62357] dst[udp:100.64.0.2:53] service[ziti:dns-resolver]
[2024-11-08T18:14:58:075Z] TRACE PacketTunnelProvider:PacketTunnelProvider.swift:287 readPacketFlow() read 1 packets
(1824)[2024-11-08T18:14:58.075Z] TRACE tunnel-sdk:tunnel_udp.c:156 recv_udp() received datagram src[100.64.0.1:61556] dst[100.64.0.2:53]
(1824)[2024-11-08T18:14:58.075Z] DEBUG tunnel-sdk:tunnel_udp.c:231 recv_udp() intercepted address[udp:100.64.0.2:53] client[udp:100.64.0.1:61556] service[ziti:dns-resolver]
(1824)[2024-11-08T18:14:58.075Z] TRACE tunnel-cbs:ziti_dns.c:281 on_dns_client() new DNS client
(1824)[2024-11-08T18:14:58.075Z] DEBUG tunnel-sdk:ziti_tunnel.c:221 ziti_tunneler_dial_completed() ziti dial succeeded: client[udp:100.64.0.1:61556] service[ziti:dns-resolver]
(1824)[2024-11-08T18:14:58.075Z] VERBOSE tunnel-sdk:tunnel_udp.c:84 on_udp_client_data() 39 bytes from 100.64.0.1:61556
(1824)[2024-11-08T18:14:58.075Z] TRACE tunnel-sdk:tunnel_udp.c:54 to_ziti() writing 39 bytes to ziti src[udp:100.64.0.1:61556] dst[udp:100.64.0.2:53] service[ziti:dns-resolver]
(1824)[2024-11-08T18:14:58.075Z] TRACE tunnel-cbs:ziti_dns.c:819 on_dns_req() received DNS query q_len=39 id[7c51] recursive[true] type[65] name[mail.ourdomain.com]
(1824)[2024-11-08T18:14:58.127Z] TRACE tunnel-cbs:ziti_dns.c:881 on_upstream_packet() upstream sent response to query[7c51] (rc=99)
(1824)[2024-11-08T18:14:58.127Z] TRACE tunnel-sdk:netif_shim.c:34 netif_shim_output() writing packet UDP[100.64.0.2:53 -> 100.64.0.1:61556] len=127
(1824)[2024-11-08T18:14:58.127Z] TRACE tunnel-cbs:ziti_dns.c:299 on_dns_close() DNS client close
(1824)[2024-11-08T18:14:58.127Z] DEBUG tunnel-sdk:ziti_tunnel.c:435 ziti_tunneler_close() closing connection: client[udp:100.64.0.1:61556] service[ziti:dns-resolver]
(1824)[2024-11-08T18:14:58.127Z] DEBUG tunnel-sdk:tunnel_udp.c:104 tunneler_udp_close() closing src[udp:100.64.0.1:61556] dst[udp:100.64.0.2:53] service[ziti:dns-resolver]
(1824)[2024-11-08T18:14:58.396Z] VERBOSE ziti-sdk:posture.c:196 ziti_send_posture_data() ztx[0] starting to send posture data
(1824)[2024-11-08T18:14:58.396Z] INFO ziti-sdk:posture.c:206 ziti_send_posture_data() ztx[0] first run or potential controller restart detected
(1824)[2024-11-08T18:14:58.396Z] DEBUG ziti-sdk:posture.c:213 ziti_send_posture_data() ztx[0] posture checks must_send set to TRUE, new_session_id[TRUE], must_send_every_time[TRUE], new_controller_instance[TRUE]
(1824)[2024-11-08T18:14:58.396Z] VERBOSE ziti-sdk:posture.c:238 ziti_send_posture_data() ztx[0] checking posture queries on 1 service(s)
(1824)[2024-11-08T18:14:58.396Z] VERBOSE ziti-sdk:posture.c:536 ziti_pr_send_bulk() ztx[0] no change in posture data, not sending
(1824)[2024-11-08T18:15:04.650Z] DEBUG ziti-sdk:channel.c:765 reconnect_cb() ch[0] connecting to tls://nisztday.private.net:3022
(1824)[2024-11-08T18:15:04.654Z] ERROR ziti-sdk:channel.c:943 on_tls_connect() ch[0] failed to connect to ER[nisztday] [-3008/unknown node or service]
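To see at a glance which routers the tunneler reached and which it kept failing on, the log can be tallied with a short script. This is only a minimal sketch in Python: it assumes the two message formats visible in the snippet above, and the names summarize_router_events and SAMPLE_LINES are made up for illustration. As a side note, the -3008 code with the text "unknown node or service" looks like a name-resolution failure (libuv's UV_EAI_NONAME), which would be expected when a private hostname is looked up from outside the private network.

```python
import re
from collections import Counter

# Patterns modeled on the two log messages shown in this thread.
CONNECTED = re.compile(r"on_ziti_event\(\) ztx\[[^\]]*\] router (\S+) connected")
FAILED = re.compile(
    r"on_tls_connect\(\) ch\[\d+\] failed to connect to ER\[([^\]]+)\] "
    r"\[(-?\d+)/([^\]]+)\]"
)

# Two lines copied from the sanitized log above, used as sample input.
SAMPLE_LINES = [
    "(1824)[2024-11-08T18:14:57.855Z] INFO tunnel-cbs:ziti_tunnel_ctrl.c:1043 "
    "on_ziti_event() ztx[mac-host.name] router ourziti-edge-router connected",
    "(1824)[2024-11-08T18:15:04.654Z] ERROR ziti-sdk:channel.c:943 "
    "on_tls_connect() ch[0] failed to connect to ER[nisztday] "
    "[-3008/unknown node or service]",
]


def summarize_router_events(lines):
    """Count 'connected' and 'failed to connect' events per edge router."""
    connected = Counter()
    failed = Counter()
    for line in lines:
        match = CONNECTED.search(line)
        if match:
            connected[match.group(1)] += 1
            continue
        match = FAILED.search(line)
        if match:
            # Key on (router name, error text) so different failures stay separate.
            failed[(match.group(1), match.group(3))] += 1
    return connected, failed


if __name__ == "__main__":
    connected, failed = summarize_router_events(SAMPLE_LINES)
    print("connected:", dict(connected))
    print("failed:", dict(failed))
```

Run against the full log, a router that shows up only under "failed" with a resolution error is probably just unreachable from the current network, while a router under "connected" should be usable for dials.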
From a user perspective there's no connectivity to the end application. As if Ziti were not running at all.
No logged errors about the service. The only errors are in reference to the private router failing to connect. No dial messages in regard to the service either. The only dial messages are for the dns-resolver. There's a message about "starting intercepting" for our service.
It looks like in the afternoon my user restarted Ziti (probably trying to fix it.) I found a log chunk of the tunneler starting up and shutting down.
Restart did not/does not fix it. It did work off-network at one point yesterday. Something about the transition from LAN-wifi to hotspot-wifi got it working. But when she rebooted and joined the hotspot-wifi it broke again. I'm going to try testing that today and see if I can pin down what series of events causes it to break/work.
Ziti Desktop Edge for Mac does two things to make it possible to intercept these addresses:
It creates a DNS client for the hostname(s) that are to be intercepted. The DNS client(s) can be seen with scutil --dns. You should see that the client's nameserver points to an address that is routed to the tunneler's DNS server - 100.64.0.2 by default. From the log it looks like this is happening:
INFO tunnel-cbs:ziti_dns.c:349 new_ipv4_entry() registered DNS entry mail.ourdomain.com -> 100.64.0.3
INFO tunnel-cbs:ziti_dns.c:349 new_ipv4_entry() registered DNS entry app.ourdomain.com -> 100.64.0.4
DEBUG tunnel-sdk:ziti_tunnel.c:321 ziti_tunneler_intercept() intercepting address[tcp:100.64.0.3/32:443] service[webapp-svc]
DEBUG tunnel-sdk:ziti_tunnel.c:321 ziti_tunneler_intercept() intercepting address[udp:100.64.0.3/32:443] service[webapp-svc]
DEBUG tunnel-sdk:ziti_tunnel.c:321 ziti_tunneler_intercept() intercepting address[tcp:100.64.0.4/32:443] service[webapp-svc]
DEBUG tunnel-sdk:ziti_tunnel.c:321 ziti_tunneler_intercept() intercepting address[udp:100.64.0.4/32:443] service[webapp-svc]
INFO tunnel-cbs:ziti_tunnel_ctrl.c:925 on_service() starting intercepting for service[webapp-svc]
And indeed your log shows that Ziti Desktop Edge for Mac is being consulted for your service hostnames:
TRACE tunnel-cbs:ziti_dns.c:819 on_dns_req() received DNS query q_len=39 id[afc7] recursive[true] type[1] name[mail.ourdomain.com]
INFO tunnel-cbs:ziti_dns.c:567 format_resp() found record[100.64.0.3] for query[1:mail.ourdomain.com]
This message strongly suggests that DNS is working as it should, but just the same let's use a system utility like ping to verify this:
% ping app.mydomain.com
PING app.mydomain.com (100.64.0.4): 56 data bytes
64 bytes from 100.64.0.4: icmp_seq=0 ttl=255 time=0.599 ms
^C
The result of the ping is not important here. We're just interested in the IP address that ping gets from the macOS resolver. I'd avoid using dig for this purpose, because it follows different rules than the system resolver for choosing which DNS servers to query.
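The same check can be scripted. Below is a rough sketch in Python, assuming that socket.getaddrinfo goes through the same system resolver path that ping uses (unlike dig). The helper names in_ziti_range and resolve_ipv4 are hypothetical, and 100.64.0.0/10 is simply the default range mentioned in this thread.

```python
import ipaddress
import socket
import sys

# Default Ziti Desktop Edge intercept range discussed in this thread.
ZITI_RANGE = ipaddress.ip_network("100.64.0.0/10")


def in_ziti_range(address):
    """Return True if the address falls inside 100.64.0.0/10."""
    return ipaddress.ip_address(address) in ZITI_RANGE


def resolve_ipv4(hostname):
    """Resolve a hostname to its IPv4 addresses via the OS resolver."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Usage: python check_resolution.py mail.ourdomain.com app.ourdomain.com
    for name in sys.argv[1:]:
        try:
            addresses = resolve_ipv4(name)
        except socket.gaierror as err:
            print(f"{name}: lookup failed ({err})")
            continue
        for address in addresses:
            verdict = "intercept range" if in_ziti_range(address) else "NOT in 100.64/10"
            print(f"{name} -> {address} ({verdict})")
```

If a hostname that is supposed to be intercepted comes back with a public address here, the lookup is being answered by something other than the tunneler's DNS server.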
Ziti Desktop Edge for Mac also creates routes for any IP addresses that it wants to intercept. Note that any hostnames in the list of intercepted addresses should be resolved to something in the 100.64/10 range, which is routed to the tun interface that Ziti Desktop Edge for Mac created:
If we see all of the DNS and routing setup is happening as above but there are none of these "intercepted" messages in the log for your service then my first guess is a conflict in either the DNS clients or the routing tables. It doesn't look like a DNS conflict though, because we can see DNS queries being handled for your service hostnames.
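One way to confirm that a service never shows up in an "intercepted" message is to compare the services the tunneler started intercepting against the services that actually appear in those messages. The following is a sketch only: it assumes TCP intercepts are logged in the same "intercepted address[...] client[...] service[...]" shape as the UDP lines in this thread, and the function name services_without_traffic is invented for the example.

```python
import re
from collections import Counter

# Formats modeled on the log lines quoted in this thread.
STARTED = re.compile(r"starting intercepting for service\[([^\]]+)\]")
INTERCEPTED = re.compile(
    r"intercepted address\[[^\]]+\] client\[[^\]]+\] service\[([^\]]+)\]"
)

# Sample input built from two lines that appear in the sanitized logs.
SAMPLE_LINES = [
    "INFO tunnel-cbs:ziti_tunnel_ctrl.c:925 on_service() "
    "starting intercepting for service[webapp-svc]",
    "(1824)[2024-11-08T18:14:58.007Z] DEBUG tunnel-sdk:tunnel_udp.c:231 recv_udp() "
    "intercepted address[udp:100.64.0.2:53] client[udp:100.64.0.1:63697] "
    "service[ziti:dns-resolver]",
]


def services_without_traffic(lines):
    """Return (services started but never intercepted, intercept counts)."""
    started = set()
    intercepted = Counter()
    for line in lines:
        match = STARTED.search(line)
        if match:
            started.add(match.group(1))
        match = INTERCEPTED.search(line)
        if match:
            intercepted[match.group(1)] += 1
    silent = sorted(started - set(intercepted))
    return silent, intercepted


if __name__ == "__main__":
    silent, intercepted = services_without_traffic(SAMPLE_LINES)
    print("never intercepted:", silent)
    print("intercept counts:", dict(intercepted))
```

A service listed as never intercepted means no packets for its addresses reached the tunneler, which points at DNS or routing on the client rather than at the overlay.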
Summary
Let's check that your service hostnames are resolving as expected:
% ping mail.ourdomain.com
We should see a 100.64/10 address.
The address returned/shown by ping should be routed to the Ziti Desktop Edge for Mac's tun interface:
% netstat -rn -finet | grep -F 100.64
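If it helps, the route check can also be scripted. This sketch assumes the macOS routing-table layout printed by netstat -rn -f inet (Destination, Gateway, Flags, Netif) and picks the interface by name pattern rather than by column position. The SAMPLE text is invented for illustration and is not output from the affected Mac; the prefix filter mirrors the grep for 100.64 above.

```python
import re

# Interface names such as en0 or utun4: lowercase letters followed by digits.
IFACE = re.compile(r"^[a-z]+\d+$")

# Hypothetical routing-table excerpt, for illustration only.
SAMPLE = """\
Destination        Gateway            Flags               Netif Expire
default            192.168.0.1        UGScg                 en0
100.64/10          100.64.0.1         UGSc                utun4
100.64.0.1         100.64.0.1         UH                  utun4
192.168.0          link#15            UCS                   en0      !
"""


def ziti_routes(netstat_output, prefix="100.64"):
    """Return (destination, interface) pairs for routes starting with prefix."""
    routes = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith(prefix):
            continue
        # The first token that looks like an interface name is taken as Netif.
        iface = next((f for f in fields[1:] if IFACE.match(f)), None)
        routes.append((fields[0], iface))
    return routes


if __name__ == "__main__":
    for destination, iface in ziti_routes(SAMPLE):
        print(f"{destination} via {iface}")
```

Every route returned should point at the utun interface that Ziti Desktop Edge for Mac created; a route in that range bound to another interface would suggest a conflict.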
I'm also interested to know which application you are using with your service. Is it a browser? If so, is there any chance that it's configured with a proxy server that would prevent the connection from being intercepted by Ziti Desktop Edge for Mac? It would be interesting to see if curl https://app.mydomain.com:443 is intercepted.
It does seem to be a DNS issue. I verified Ziti Desktop was disconnected, then moved the Mac to an external hotspot and connected Ziti. I let it go green. The Ziti DNS server was visible in scutil --dns. The route was also visible in netstat with no conflicting ranges. The FQDN mail.ourdomain.com has public resolution and is also a Ziti intercept. The public address is what ping and curl return.
The problem Mac also has Avast Premium on it and I wonder if that's complicating or interfering. I see no difference in scutil --dns between the problem Mac and the healthy Mac. This is just a wild guess; I have no evidence to support the theory. It also doesn't explain why it worked for weeks and suddenly failed.
That's a common problem. A lot of antivirus-type software terminates unexpected connections while trying to be 'helpful'. We have seen this sort of issue in the past.
That's surprising. The macOS resolver should use the DNS client with the best-matching name for a given lookup, which I would think is the one created by ZDE.
It seems that either the ZDE name server isn't answering for the hostname lookups, or the macOS resolver is choosing the answer that the global DNS server produced instead.
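To make the "best-matching name" idea concrete, here is a simplified model in Python. It parses scutil --dns style text and picks the resolver whose domain is the longest suffix of the hostname, falling back to a resolver with no domain. This is an assumption about how the selection works, not Apple's actual algorithm, and the SAMPLE text, parse_scutil_dns and best_resolver are all hypothetical.

```python
import re

DOMAIN = re.compile(r"^\s*domain\s*:\s*(\S+)")
NAMESERVER = re.compile(r"^\s*nameserver\[\d+\]\s*:\s*(\S+)")

# Hypothetical, trimmed scutil --dns output for illustration only.
SAMPLE = """\
resolver #1
  search domain[0] : lan
  nameserver[0] : 192.168.0.1
  if_index : 15 (en0)

resolver #2
  domain   : mail.ourdomain.com
  nameserver[0] : 100.64.0.2
"""


def parse_scutil_dns(text):
    """Collect the domain and nameservers of each 'resolver #N' block."""
    resolvers = []
    current = None
    for line in text.splitlines():
        if line.strip().startswith("resolver #"):
            current = {"domain": None, "nameservers": []}
            resolvers.append(current)
            continue
        if current is None:
            continue
        match = DOMAIN.match(line)
        if match:
            current["domain"] = match.group(1)
            continue
        match = NAMESERVER.match(line)
        if match:
            current["nameservers"].append(match.group(1))
    return resolvers


def best_resolver(hostname, resolvers):
    """Pick the resolver with the longest matching domain suffix (simplified)."""
    host = hostname.rstrip(".").lower()
    best, best_len = None, -1
    for resolver in resolvers:
        domain = (resolver["domain"] or "").rstrip(".").lower()
        # Resolvers with a domain only apply to that name and its subdomains.
        if domain and not (host == domain or host.endswith("." + domain)):
            continue
        if len(domain) > best_len:
            best, best_len = resolver, len(domain)
    return best


if __name__ == "__main__":
    resolvers = parse_scutil_dns(SAMPLE)
    print(best_resolver("mail.ourdomain.com", resolvers))
```

If a model like this predicts the tunneler's nameserver for a hostname but ping still returns the public address, the resolver configuration is probably not the cause, and something that sits in front of the system's DNS lookups would be the more likely suspect.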
Real Site provides an encrypted connection between your web browser and Avast's own DNS server to prevent hijacking. In other words, Real Site ensures that the displayed website is the authentic one.
I saw that too. I'm not a fan of these types of products. IMO questionable value to anyone and potentially harmful in a business environment. They cause problems with sites like ours that use split DNS.
But, I agree, seems to be intercepting the system's DNS lookups before it even gets to ZDE.
Sorry for consuming your time on this non-Ziti issue. But I appreciate the help. My Mac skills improve a little each time I have to troubleshoot one.