Windows Edge Client - DNS not working

I've installed the Windows Edge Client on 3 machines. 2 of them work as expected; the 3rd does not resolve DNS. I'm not entirely sure how to troubleshoot this, or what information to provide.

Here is what I have tried so far:

>nslookup test.domain 100.64.0.2
Server:  UnKnown
Address:  100.64.0.2

Non-authoritative answer:
Name:    test.domain
Address:  100.64.0.3

In the edge tunneler logs, I see this:

[2024-02-16T13:53:14.031Z]   DEBUG ziti-edge-tunnel:windows-scripts.c:172 chunked_add_nrpt_rules() powershell -Command "$Namespaces = @(
@{n='test.domain';})

ForEach ($Namespace in $Namespaces) {
$ns=$Namespace['n']
$Rule = @{Namespace=${ns}; NameServers=@('100.64.0.2'); Comment='Added by ziti-edge-tunnel'; DisplayName='ziti-edge-tunnel:'+${ns}; }
Add-DnsClientNrptRule @Rule
}
"
[2024-02-16T13:53:14.031Z]   ERROR ziti-edge-tunnel:windows-scripts.c:67 exec_process() Could not execute the command due to ENOENT
[2024-02-16T13:53:14.031Z]    WARN ziti-edge-tunnel:windows-scripts.c:176 chunked_add_nrpt_rules() Add domains NRPT script: 0(err=0)

I executed the commands on the command line and they executed without issue, but did not resolve anything. Of course, this isn't the same security context as the service.

Afterwards, the service goes into a "loop", where this is the only thing logged (even on a connection attempt):

[2024-02-16T14:25:35.725Z] VERBOSE ziti-sdk:posture.c:198 ziti_send_posture_data() ztx[0] starting to send posture data
[2024-02-16T14:25:35.725Z]   DEBUG ziti-sdk:posture.c:211 ziti_send_posture_data() ztx[0] posture checks must_send set to TRUE, new_session_id[FALSE], must_send_every_time[TRUE], new_controller_instance[FALSE]
[2024-02-16T14:25:35.725Z] VERBOSE ziti-sdk:posture.c:236 ziti_send_posture_data() ztx[0] checking posture queries on 6 service(s)
[2024-02-16T14:25:35.725Z] VERBOSE ziti-sdk:posture.c:535 ziti_pr_send_bulk() ztx[0] no change in posture data, not sending
[2024-02-16T14:25:44.405Z]   DEBUG ziti-sdk:ziti_ctrl.c:697 ctrl_paging_req() ctrl[host.mydomain.com] starting paging request GET[/current-identity/edge-routers]
[2024-02-16T14:25:44.405Z] VERBOSE ziti-sdk:ziti_ctrl.c:702 ctrl_paging_req() ctrl[host.mydomain.com] requesting /current-identity/edge-routers?limit=25&offset=0
[2024-02-16T14:25:44.405Z] VERBOSE ziti-sdk:ziti_ctrl.c:141 start_request() ctrl[host.mydomain.com] starting GET[/current-identity/edge-routers?limit=25&offset=0]
[2024-02-16T14:25:44.405Z] VERBOSE ziti-sdk:ziti_ctrl.c:141 start_request() ctrl[host.mydomain.com] starting GET[/current-api-session/service-updates]
[2024-02-16T14:25:44.411Z] VERBOSE ziti-sdk:ziti_ctrl.c:176 ctrl_resp_cb() ctrl[host.mydomain.com] received headers GET[/current-identity/edge-routers?limit=25&offset=0]
[2024-02-16T14:25:44.411Z]   DEBUG ziti-sdk:ziti_ctrl.c:329 ctrl_body_cb() ctrl[host.mydomain.com] completed GET[/current-identity/edge-routers?limit=25&offset=0] in 0.005 s
[2024-02-16T14:25:44.411Z]   DEBUG ziti-sdk:ziti_ctrl.c:345 ctrl_body_cb() ctrl[host.mydomain.com] received 1/1 for paging request GET[/current-identity/edge-routers]
[2024-02-16T14:25:44.411Z]   DEBUG ziti-sdk:ziti_ctrl.c:357 ctrl_body_cb() ctrl[host.mydomain.com] completed paging request GET[/current-identity/edge-routers] in 0.005 s
[2024-02-16T14:25:44.412Z] VERBOSE ziti-sdk:ziti_ctrl.c:176 ctrl_resp_cb() ctrl[host.mydomain.com] received headers GET[/current-api-session/service-updates]
[2024-02-16T14:25:44.412Z]   DEBUG ziti-sdk:ziti_ctrl.c:329 ctrl_body_cb() ctrl[host.mydomain.com] completed GET[/current-api-session/service-updates] in 0.007 s
[2024-02-16T14:25:44.412Z] VERBOSE ziti-sdk:ziti.c:1252 check_service_update() ztx[0] not updating: last_update is same previous (2024-02-16T00:00:19.233Z == 2024-02-16T00:00:19.233Z)
[2024-02-16T14:25:44.412Z] VERBOSE ziti-sdk:ziti.c:1293 ziti_services_refresh() ztx[0] scheduling service refresh 10 seconds from now

What else can I dig into to solve this?

First thing: don't use nslookup. nslookup specifically bypasses the Windows NRPT (Name Resolution Policy Table), so it can't be relied on here. Use PowerShell's (powershell/pwsh) Resolve-DnsName instead.

For example, on my local machine:

Resolve-DnsName mattermost.tools.netfoundry.io

Name                                           Type   TTL   Section    IPAddress
----                                           ----   ---   -------    ---------
mattermost.tools.netfoundry.io                 A      60    Answer     100.64.0.18

The error you show, "Could not execute the command due to ENOENT", seems troubling. It makes me think your NRPT isn't getting updated properly... We've seen these sorts of problems occasionally, and historically they have been due to the PowerShell environment on that machine simply being "broken" somehow. Windows repair, upgrades, etc. have solved the problem in the past, but it's an uncommon issue and not repeatable. Every time it's different.

You can generate a "feedback" zip file. It collects a lot of data, including logs, system info, etc., and you can inspect the bundle it generates. Most notable in this case will be the NRPT. Get-DnsClientNrptRule is what that bundle will execute, along with Add-DnsClientNrptRule. We use PowerShell to add the rule because there's still no Win32 API for it... :confused: You could try adding/removing an entry to see if it succeeds. I've seen the NRPT be 'corrupted' before -- again for no known reason. "Repairing Windows" often (not always; sometimes it's PowerShell, etc.) fixes it...

Other questions/comments/thoughts...

Do you see the service listed in the client? As with my 'mattermost' identity, do you have > 0 services?

If you click on that row in the ZDEW, do you see the service you expect, with the correct protocol (TCP/UDP) and port?

If you scan that log, do you see any "starting dial" messages?

[2024-02-16T00:01:14.195Z] DEBUG ziti-sdk:connect.c:497 process_connect() conn[0.548/Connecting] starting Dial connection for service[mattermost.tools.netfoundry.io] with session....
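
If the log is long, a small script can do the scanning. The following is only a rough sketch in Python, assuming the log lines follow the layout shown in this thread ([timestamp] LEVEL source function() message). The names LOG_LINE, scan and SAMPLE are made up for illustration and are not part of ziti-edge-tunnel. It collects ERROR/WARN entries and any "starting Dial" messages:

```python
import re

# Layout assumed from the log lines quoted in this thread:
# [timestamp]  LEVEL  module:file.c:line  function()  message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z]+)\s+(?P<source>\S+)\s+"
    r"(?P<func>\w+)\(\)\s*(?P<msg>.*)$"
)


def scan(lines):
    """Return (problems, dials): ERROR/WARN entries and 'starting Dial' entries."""
    problems, dials = [], []
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if not m:
            # Continuation lines (e.g. the embedded NRPT script) are skipped.
            continue
        if m["level"] in ("ERROR", "WARN"):
            problems.append((m["ts"], m["level"], m["func"], m["msg"]))
        if "starting dial" in m["msg"].lower():
            dials.append((m["ts"], m["msg"]))
    return problems, dials


# Sample lines copied from this thread.
SAMPLE = [
    "[2024-02-16T13:53:14.031Z]   ERROR ziti-edge-tunnel:windows-scripts.c:67 "
    "exec_process() Could not execute the command due to ENOENT",
    "[2024-02-16T13:53:14.031Z]    WARN ziti-edge-tunnel:windows-scripts.c:176 "
    "chunked_add_nrpt_rules() Add domains NRPT script: 0(err=0)",
    "[2024-02-16T00:01:14.195Z] DEBUG ziti-sdk:connect.c:497 process_connect() "
    "conn[0.548/Connecting] starting Dial connection for "
    "service[mattermost.tools.netfoundry.io] with session....",
]

if __name__ == "__main__":
    found_problems, found_dials = scan(SAMPLE)
    for entry in found_problems:
        print("problem:", entry)
    for entry in found_dials:
        print("dial:", entry)
```

To run it against a real log, open the file and pass the file object to scan() instead of SAMPLE. No dial entries at all would suggest the client never intercepted a connection for the service, which fits a DNS/NRPT problem.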

Thank you for the elaborate explanation.

Working from the bottom up because I don't want to ignore those. Yes, I definitely have services registered.

Your powershell tips helped me diagnose further. Once I was connected I could confirm with Get-DnsClientNrptRule that there weren't any addresses.

I launched an (elevated) powershell and added one:
Add-DnsClientNrptRule -Namespace "test.domain" -NameServers "100.64.0.2"

And it worked. So, something on the service level is causing an issue.

STRANGE... This leads me to think it's something to do with the SYSTEM powershell environment... The ZDEW runs the ziti-edge-tunnel as a service on windows. To diagnose a bit more, let's do this...

Turn off the ZDEW using the big green button.

Confirm it's off.

From that elevated prompt run:

"c:\Program Files (x86)\NetFoundry Inc\Ziti Desktop Edge\ziti-edge-tunnel.exe" run -I C:\Windows\System32\config\systemprofile\AppData\Roaming\NetFoundry

This will emulate how the service executes. After you do that, you'll notice the ZDEW is back "on" and you'll have services.

Test things out. If they work -- well you'll know it's somehow related to your SYSTEM profile, etc. Diagnosing/fixing that -- is gonna be a pain :frowning: It's always "something strange" in my experience and "something windowsy" (i expect you know what i mean).... :slight_smile:

Take note of the first few lines when the process starts:

[2024-02-16T15:29:10.097Z]    INFO ziti-edge-tunnel:ziti-edge-tunnel.c:2003 run() ============================ service begins ================================
[2024-02-16T15:29:10.098Z]    INFO ziti-edge-tunnel:ziti-edge-tunnel.c:2004 run() Logger initialization
[2024-02-16T15:29:10.098Z]    INFO ziti-edge-tunnel:ziti-edge-tunnel.c:2005 run()       - initialized at   : Fri Feb 16 2024, 10:29:10 AM (local time), 2024-02-16T15:29:10 (UTC)
[2024-02-16T15:29:10.098Z]    INFO ziti-edge-tunnel:ziti-edge-tunnel.c:2006 run()       - log file location: c:\Program Files (x86)\NetFoundry Inc\Ziti Desktop Edge\/logs/service/ziti-tunneler.log.202402160000.log
[2024-02-16T15:29:10.098Z]    INFO ziti-edge-tunnel:ziti-edge-tunnel.c:2007 run() ============================================================================
[2024-02-16T15:29:10.099Z]    INFO ziti-sdk:utils.c:200 ziti_log_set_level() set log level: root=3/INFO
[2024-02-16T15:29:10.099Z]    INFO ziti-edge-tunnel:tun.c:171 tun_open() Wintun v0.0 loaded
[2024-02-16T15:29:10.099Z]    INFO ziti-edge-tunnel:tun.c:522 cleanup_adapters() Cleaning up orphan wintun adapters
[2024-02-16T15:29:10.101Z]    INFO ziti-edge-tunnel:tun.c:153 flush_dns() DnsFlushResolverCache succeeded
[2024-02-16T15:29:10.589Z]    INFO ziti-edge-tunnel:tun.c:405 if_change_cb() default route is now via if_idx[9]
[2024-02-16T15:29:10.590Z]    INFO ziti-edge-tunnel:tun.c:411 if_change_cb() updating excluded routes
[2024-02-16T15:29:11.645Z]    INFO ziti-edge-tunnel:windows-scripts.c:469 is_nrpt_policies_effective() NRPT policies are effective in this system
[2024-02-16T15:29:12.068Z]    INFO ziti-edge-tunnel:ziti-edge-tunnel.c:1562 run_tunnel() Setting interface metric to 255
[2024-02-16T15:29:12.075Z]    INFO tunnel-sdk:ziti_tunnel.c:60 create_tunneler_ctx() Ziti Tunneler SDK (v0.22.19)
[2024-02-16T15:29:12.078Z]    INFO tunnel-cbs:ziti_dns.c:171 seed_dns() DNS configured with range 100.64.0.0 - 100.127.255.255 (4194302 ips)

You'll see we probe the NRPT and produce this line:

[2024-02-16T15:29:11.645Z]    INFO ziti-edge-tunnel:windows-scripts.c:469 is_nrpt_policies_effective() NRPT policies are effective in this system

You should be seeing that...

Let's see how far along this gets you. Looking forward to your next response...

FYI the command, again, is:

"c:\Program Files (x86)\NetFoundry Inc\Ziti Desktop Edge\ziti-edge-tunnel.exe" run -I C:\Windows\System32\config\systemprofile\AppData\Roaming\NetFoundry

I noticed my copy/paste didn't work (I edited the original reply) but felt compelled to post here in case you saw the -i somehow...

Something windowsy alright.
Powershell wasn't in the path. We won't even begin to guess how something like that happens.
The error on the console made it much clearer:
'powershell' is not recognized as an internal or external command, operable program or batch file.
So, the service error "ENOENT" was literally implying that powershell.exe couldn't be found.
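
For anyone who lands here later: ENOENT is the generic "no such file or directory" error code, and it is what a process launcher reports when the program it was asked to start can't be located. Here is a minimal sketch in Python that reproduces the same class of failure. It is not the tunneler's code; try_spawn and the program name "no-such-program-xyz" are made up for the demonstration:

```python
import errno
import subprocess


def try_spawn(command):
    """Try to start `command`; return None on success or the errno name on failure."""
    try:
        subprocess.run(
            [command],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError as exc:
        # A missing executable surfaces as FileNotFoundError, whose errno is ENOENT.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None


if __name__ == "__main__":
    # "no-such-program-xyz" is a deliberately nonexistent name.
    print(try_spawn("no-such-program-xyz"))
```

The error code alone doesn't say which file was missing, which is why the console message naming 'powershell' was so much easier to act on than the service log.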

Thanks!


Thanks. I've filed an enhancement to verify powershell is executable before trying to run the NRPT commands and make a better log error for the future people who get something like this! :slight_smile:
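
To give an idea of what such a check could look like, here is a hedged sketch in Python. It is not the actual enhancement (the tunneler is written in C and the real change may look quite different). The helper names find_powershell and preflight_message are hypothetical, and the fallback to %SystemRoot%\System32\WindowsPowerShell\v1.0 is an assumption about where Windows PowerShell normally lives:

```python
import os
import shutil


def find_powershell(path=None, system_root=None):
    """Locate powershell: search PATH first, then the usual Windows location."""
    found = shutil.which("powershell", path=path)
    if found:
        return found
    # Assumed default install location of Windows PowerShell.
    root = system_root or os.environ.get("SystemRoot", r"C:\Windows")
    fallback = os.path.join(
        root, "System32", "WindowsPowerShell", "v1.0", "powershell.exe"
    )
    return fallback if os.path.isfile(fallback) else None


def preflight_message(path=None, system_root=None):
    """Return a log-style message describing whether NRPT commands can run."""
    exe = find_powershell(path, system_root)
    if exe is None:
        return (
            "powershell.exe not found on PATH or in the default location; "
            "NRPT rules cannot be added"
        )
    return f"using {exe} for NRPT commands"


if __name__ == "__main__":
    print(preflight_message())
```

Checking before running the NRPT script means the log can name the missing program instead of reporting a bare ENOENT.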

cheers
