Hosting AD-related services via ziti-edge-tunnel on any domain-joined Windows host (DC or member server) breaks Cloud Kerberos Trust ticket issuance for Entra ID–joined dial clients; same configuration on an Edge Router works

Summary

When an OpenZiti service intercepting Active Directory traffic (DNS on 53, Kerberos on 88, LDAP on 389/636, SMB on 445) is bound by ziti-edge-tunnel running on any domain-joined Windows host — whether that host is a Domain Controller or a regular domain member server — Entra ID–joined Windows clients on the dial side fail to obtain an on-prem Kerberos TGT via the Cloud Kerberos Trust flow.

Moving the bind to an Edge Router with tunneling enabled — without changing any service configs, intercept configs, host configs, identities, posture checks, or service policies — resolves the issue immediately. Dial clients then successfully receive a krbtgt/<DOMAIN> TGT and all downstream Kerberos-dependent operations work.

The common factor in the failing configurations is a domain-joined Windows host running ziti-edge-tunnel as the bind. The common factor in the working configurations is a Linux Edge Router as the bind.

Environment

  • OpenZiti controller: 2.0

  • Edge Routers: 2.0, hosted on Azure (Ubuntu), tunneling mode enabled — works as bind

  • ziti-edge-tunnel on Windows hosts: latest version (current beta channel) — fails as bind

  • Ziti Desktop Edge for Windows (dial clients): latest beta version

  • Domain Controllers: Windows Server 2019, 2 DCs, healthy replication, AD-integrated DNS

  • Domain member servers tested as bind hosts: Windows Server 2019/2022, fully patched, no special role

  • Identity setup: Hybrid — on-prem AD synced to Entra ID via Entra Connect; Cloud Kerberos Trust configured (AzureADKerberos computer object present); Windows Hello for Business enabled

  • Dial client posture: Entra ID–joined, Intune-enrolled, signing in with WHFB

Services involved (representative)

  • CLIENTS_AD_Services_DNS: intercept.v1 wildcard *.<domain>:53 TCP/UDP, host.v1 to 127.0.0.1:53 (on DC bind) or DC IP (on member-server / router bind)

  • CLIENTS_AD_Services — Kerberos / LDAP / SMB ports to DC IPs

  • Standard Bind/Dial service policies, no MFA on bind identities, no posture-check blockers

Steps to reproduce

  1. Deploy two DCs (Server 2019) with AD-integrated DNS and Cloud Kerberos Trust enabled.

  2. Configure the AD-service intercepts above.

  3. Test A — bind on DC: install ziti-edge-tunnel on both DCs, enroll identities, assign Bind policy. From an Entra ID–joined client with Dial policy, sign in with WHFB and trigger an operation requiring an on-prem TGT (e.g. klist get krbtgt/<DOMAIN>, SMB access to a member server, RDP to an AD-joined host). → fails

  4. Test B — bind on a domain-joined member server: stop bind on the DCs, install ziti-edge-tunnel on a Server 2019/2022 member server in the same domain, enroll identity, assign Bind policy with host.v1 pointing at the DC IP. Repeat the same client test. → fails

  5. Test C — bind on Edge Router (Linux, not domain-joined): stop bind on the member server, enable tunneling on an Edge Router with the same host.v1 config. Repeat the same client test. → succeeds

Steps 3 → 4 → 5 are performed in sequence on the same Ziti overlay with no other changes between them.

Expected behavior

The bind host's OS, domain membership, and Ziti component type (router vs ziti-edge-tunnel) should be transparent to the Kerberos / DNS protocol payloads traversing the fabric. The dial client should obtain a partial TGT from Entra at logon, exchange it via the routed Kerberos service for a full on-prem TGT, and proceed normally — regardless of where the AD-service terminator is hosted.

Actual behavior

  • Bind on DC (ziti-edge-tunnel): dial client receives 0x51f / STATUS_NO_LOGON_SERVERS. No on-prem TGT is issued. Kerberos operational log on the client shows a smart-card / certificate domain identification error consistent with the partial-TGT exchange failing. DNS resolution in isolated tests appears to succeed but the full Cloud Kerberos Trust flow does not complete.

  • Bind on domain-joined member server (ziti-edge-tunnel): same symptom. Member server has no AD role and no port conflicts on 88/389/445, yet the failure mode is identical. This rules out the obvious "ports already in use by Windows AD services on the DC" theory.

  • Bind on Edge Router (Linux): dial client successfully obtains krbtgt/<DOMAIN> TGT and all Kerberos-dependent operations succeed. Service / intercept / host / policy definitions are identical to the failing cases.
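
To help separate transport problems from Kerberos-level failures across the three tests, a plain TCP connect probe can be run from the dial client against a DC address. The sketch below is illustrative only: the helper names and the port tuple (the TCP ports of the services listed in this report) are assumptions, and a successful connect only shows that something accepted the connection, not that the DC answered the protocol exchange. It does not cover UDP.

```python
import socket

# Assumption: TCP ports taken from the services named in this report.
AD_TCP_PORTS = (53, 88, 389, 445, 636)


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def probe(host: str, ports=AD_TCP_PORTS, timeout: float = 3.0) -> dict:
    """Map each port to whether a TCP connect to it succeeded."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

Calling `probe("dc01.corp.example.com")` (a placeholder name) once per test case gives a quick per-port comparison between the failing and working binds.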

Hi @msbusk, wow that's a very specific problem. I don't think we'll be able to actually be of much help here to be honest. Your setup is just so bespoke, I don't think we could emulate it... I have a couple of thoughts on things to try...

  • You used a Linux edge router for the bind; try the Linux ziti-edge-tunnel. I would bet that it works fine. The hypothesis here is that it's somehow related to the offload point being in the same AD. This tests whether it's "ziti-edge-tunnel" itself (which the Windows hosts use as well). If this works, we know that neither the C SDK nor the tunneler is likely the culprit.
  • What if you use a different Windows host that is NOT joined to the domain (or is in some other domain)? I assume the Linux host is not in the same AD domain? This would confirm that it's not "ziti-edge-tunnel on Windows" but somehow related to being 'on the domain'.

If those both succeed, the only thing I could imagine is some policy on the domain controller is the problem. Endpoint protection software maybe, maybe group policy stuff?

Again, I'm not sure we'll be able to get to the bottom of this one without being able to reproduce it. Is there some way we can reduce the amount of infra needed to test? Deploying two DCs with AD-integrated DNS and Cloud Kerberos Trust is something I have no expertise doing...

Hi Clint,

Good news — I found a working solution. Your two suggestions both worked by the way, which helped confirm it wasn't the C SDK or the tunneler itself. But I managed to find a setup that works on the domain-joined Windows clients too.

Running the latest beta version of the Windows tunneler combined with the service setup below, everything now works (Kerberos, Cloud Kerberos Trust, DFS, the lot).

The setup consists of three service types:

1. One service per domain controller (bind on that specific DC)

For each domain controller, I create a dedicated service. The intercept address is the FQDN of that specific DC, and the bind side is the tunneler/router running on that same DC, offloading to 127.0.0.1 with protocol and port forwarding enabled. So with two DCs you have two of these services — one per DC.

Intercept config:

{
  "addresses": ["dc01.corp.example.com"],
  "protocols": ["tcp", "udp"],
  "portRanges": [
    {"low": 88, "high": 88},
    {"low": 123, "high": 123},
    {"low": 135, "high": 135},
    {"low": 389, "high": 389},
    {"low": 445, "high": 445},
    {"low": 464, "high": 464},
    {"low": 636, "high": 636},
    {"low": 3268, "high": 3268},
    {"low": 3269, "high": 3269},
    {"low": 9389, "high": 9389},
    {"low": 49152, "high": 65535}
  ]
}

Host config (hosted by the identity on that DC):

{
  "address": "127.0.0.1",
  "forwardProtocol": true,
  "forwardPort": true,
  "allowedProtocols": ["tcp", "udp"],
  "allowedPortRanges": [
    {"low": 88, "high": 88},
    {"low": 123, "high": 123},
    {"low": 135, "high": 135},
    {"low": 389, "high": 389},
    {"low": 445, "high": 445},
    {"low": 464, "high": 464},
    {"low": 636, "high": 636},
    {"low": 3268, "high": 3268},
    {"low": 3269, "high": 3269},
    {"low": 9389, "high": 9389},
    {"low": 49152, "high": 65535}
  ]
}

The ports cover the full AD stack: Kerberos (88/464), LDAP/LDAPS (389/636), Global Catalog (3268/3269), SMB (445), RPC endpoint mapper (135) plus the dynamic RPC range (49152–65535), NTP (123) and AD Web Services (9389).
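
Since the intercept and host configs repeat the same port list for every DC, they can be generated rather than hand-edited. The following is a minimal sketch, assuming the JSON shapes shown above; the function names are hypothetical, and loading the result into the controller is left out.

```python
import json

# Single ports and the dynamic RPC range, exactly as listed in the post.
AD_PORT_RANGES = [
    (88, 88), (123, 123), (135, 135), (389, 389), (445, 445), (464, 464),
    (636, 636), (3268, 3268), (3269, 3269), (9389, 9389), (49152, 65535),
]


def port_ranges() -> list:
    """Return a fresh list of {"low", "high"} dicts for the AD port set."""
    return [{"low": low, "high": high} for low, high in AD_PORT_RANGES]


def per_dc_configs(dc_fqdn: str) -> dict:
    """Build the intercept.v1 and host.v1 bodies for one domain controller."""
    intercept = {
        "addresses": [dc_fqdn],
        "protocols": ["tcp", "udp"],
        "portRanges": port_ranges(),
    }
    host = {
        "address": "127.0.0.1",  # offload to the DC itself
        "forwardProtocol": True,
        "forwardPort": True,
        "allowedProtocols": ["tcp", "udp"],
        "allowedPortRanges": port_ranges(),
    }
    return {"intercept.v1": intercept, "host.v1": host}


def build_all(dc_fqdns) -> dict:
    """Map each DC FQDN to its pair of config bodies."""
    return {fqdn: per_dc_configs(fqdn) for fqdn in dc_fqdns}
```

For example, `print(json.dumps(build_all(["dc01.corp.example.com", "dc02.corp.example.com"]), indent=2))` emits one pair of configs per DC; `dc02` is a placeholder for the second controller.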

2. One service for the AD domain name itself, SMB only (bind on ALL domain controllers)

Clients also talk to the bare domain name (e.g. \\corp.example.com\SYSVOL for GPO processing and DFS). This is a single service intercepting the domain FQDN on port 445, hosted by all domain controllers, so any DC can answer:

{
  "addresses": ["corp.example.com"],
  "protocols": ["tcp", "udp"],
  "portRanges": [{"low": 445, "high": 445}]
}

{
  "address": "corp.example.com",
  "port": 445,
  "forwardProtocol": true,
  "allowedProtocols": ["tcp"]
}
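
One detail that may be worth a second look: the intercept above lists both tcp and udp, while the host config allows only tcp. SMB on 445 runs over TCP, so this is probably harmless, but a small check like the sketch below can flag such mismatches before deployment. The function name is hypothetical, and the check only covers the forwardProtocol case.

```python
def uncovered_protocols(intercept: dict, host: dict) -> set:
    """Protocols the intercept accepts that the host config does not list.

    Only the forwardProtocol case is checked in this sketch; configs with a
    fixed "protocol" are skipped and reported as having no mismatch.
    """
    if not host.get("forwardProtocol"):
        return set()
    wanted = set(intercept.get("protocols", []))
    allowed = set(host.get("allowedProtocols", []))
    return wanted - allowed


# The domain-name SMB service exactly as posted above.
smb_intercept = {
    "addresses": ["corp.example.com"],
    "protocols": ["tcp", "udp"],
    "portRanges": [{"low": 445, "high": 445}],
}
smb_host = {
    "address": "corp.example.com",
    "port": 445,
    "forwardProtocol": True,
    "allowedProtocols": ["tcp"],
}

mismatch = uncovered_protocols(smb_intercept, smb_host)
```

Here `mismatch` contains only `"udp"`, so dropping `"udp"` from the intercept would make the two sides agree.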

3. One DNS service with a wildcard intercept (bind on ALL domain controllers)

DNS queries for anything in the AD namespace are intercepted with a wildcard and forwarded to the AD-integrated DNS on the DCs. This one is also hosted by all domain controllers.

Important: the wildcard must be present in BOTH lowercase and UPPERCASE form. Windows (especially Kerberos/SRV lookups) will query the domain in uppercase, and without the uppercase entry those queries are not intercepted — this was one of the key pieces:

{
  "addresses": ["*.corp.example.com", "*.CORP.EXAMPLE.COM"],
  "protocols": ["tcp"],
  "portRanges": [{"low": 53, "high": 53}]
}

{
  "protocol": "tcp",
  "port": 53,
  "forwardAddress": true,
  "allowedAddresses": ["*.corp.example.com", "*.CORP.EXAMPLE.COM"]
}
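
Because the uppercase entry is easy to forget, the address list can be derived from the domain name. This is a minimal sketch under the assumption that only the all-lowercase and all-uppercase forms are needed, as described above; whether mixed-case names need their own entries is not covered by this post. The function names are hypothetical.

```python
def wildcard_addresses(domain: str) -> list:
    """Return the lowercase and uppercase wildcard forms of an AD domain."""
    base = domain.strip().strip(".")
    lower = "*." + base.lower()
    upper = "*." + base.upper()
    # Avoid a duplicate entry if the name has no cased characters.
    return [lower] if lower == upper else [lower, upper]


def dns_configs(domain: str) -> tuple:
    """Build the intercept and host bodies of the wildcard DNS service."""
    addresses = wildcard_addresses(domain)
    intercept = {
        "addresses": addresses,
        "protocols": ["tcp"],
        "portRanges": [{"low": 53, "high": 53}],
    }
    host = {
        "protocol": "tcp",
        "port": 53,
        "forwardAddress": True,
        "allowedAddresses": list(addresses),
    }
    return intercept, host
```

`dns_configs("corp.example.com")` reproduces the two JSON bodies shown above.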

So to summarize: per-DC services bound on each individual DC, plus a domain-name SMB service and a wildcard DNS service both bound on all DCs — all running on the latest Windows beta tunneler. With this in place, Entra-joined clients with Cloud Kerberos Trust authenticate and access everything correctly over Ziti.

Thanks for the suggestions, they definitely helped narrow it down.

That is GREAT to hear! That sounds like a very robust setup you have as well, I'm glad everything ended up working!

If you're so inclined, this sort of thing sounds like it would make a pretty cool blog post if you wanted to write it all up... That sort of publicity helps the OpenZiti project on the whole. (And if not, well, I understand that as well :) )

Glad you got it working!

Hi Clint,

Thanks! And yes, I'd be happy to write this up as a blog post.

What I really like about this design is the flexibility and isolation it gives:

  • You only bind to the domain controllers you actually want to expose — each service is an explicit, policy-controlled path, so it's fully flexible.
  • The routers can be deployed anywhere. The tunnelers hosting the services only need SDK access to a router, so the routers themselves need no access into the infrastructure at all; the hosting side only has to connect out to the router. That keeps OpenZiti completely isolated from the environment it serves.
  • And if you need to reach another environment, you just drop a Ziti client on a server in that domain or directly on the domain controllers if you want host-to-host, end-to-end encryption.