OpenZiti v1.6.12: router fingerprint mismatch after successful enroll (Docker Compose on WSL2, bind-mounted router identity files)

Environment

  • Host: Windows + WSL2 (Ubuntu). Kernel in logs: 6.6.87.2-microsoft-standard-WSL2

  • Docker/Compose running inside WSL.

  • OpenZiti images:

    • openziti/ziti-controller:1.6.12

    • openziti/ziti-router:1.6.12

  • Topology: a lab Docker Compose stack with multiple services, but the issue is specifically between ziti-controller and ziti-router.

  • Controller persistence:

    • named volume ziti_controller_data:/openziti/var (controller DB)

    • named volume ziti_pki:/openziti/pki (PKI)

    • bind mount ./configs/ziti/controller:/openziti/config

  • Router config/identity persistence:

    • bind mount ./configs/ziti/router:/openziti/config

    • router runs as user: "0:0" and uses a custom entrypoint.sh

Project/orchestration code

  • The lab is orchestrated by bash scripts + Docker Compose. Router/controller YAML files are generated by a bash provision script.

  • scripts/provision_ziti.sh generates:

    • configs/ziti/controller/controller.yaml (controller config)

    • configs/ziti/router/router.yaml (router config)

  • Router config generated by the script pins identity files to the bind mount:

    identity:
      cert: /openziti/config/router.cert
      server_cert: /openziti/config/router.cert
      key: /openziti/config/router.key
      ca: /openziti/config/ctrl-ca.cert
    
    ctrl:
      endpoint: tls:ziti-controller:6262
      ca: /openziti/config/ctrl-ca.cert
    

    (from provision_ziti.sh)

  • Router container entrypoint only validates presence of router.cert/router.key/ctrl-ca.cert then runs router; it does not auto-enroll on start.
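For reference, a presence-only check of the kind described above can be sketched as below. This is a minimal illustration, not the actual entrypoint.sh: the function name check_router_identity is hypothetical, and the file names are taken from the router.yaml excerpt.

```shell
# Hypothetical pre-flight check: succeeds only if every identity file
# named in router.yaml exists and is non-empty.
check_router_identity() {
  local dir="${1:-/openziti/config}" f missing=0
  for f in router.cert router.key ctrl-ca.cert; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_router_identity /openziti/config || exit 1
```

A check like this only proves that the files exist. It cannot tell whether they are the cert/key pair produced by the most recent enrollment, so a stale pair passes it unchanged.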

What I’m trying to do (expected behavior)

  1. Start controller (healthy).

  2. ziti controller edge init (if fresh) and login.

  3. Recreate edge-router, export JWT.

  4. Run one-shot enroll via ziti-router enroll ... -j router.jwt which ends with registration complete.

  5. Start router container and it should go ONLINE in controller.

What actually happens (problem)

  • Enrollment frequently reports success:

    • ziti/router/enroll.(*RestEnroller).Enroll: registration complete
  • But router never becomes online, and provisioning script fails:

    • [lab][fatal] Edge-router did not become online
  • Controller logs show repeated fingerprint mismatch / unenrolled router errors on the control plane port 6262:

    • router fingerprint mismatch with routerId and mismatched fingerprints

    • incorrect fingerprint/unenrolled router, routerId: nRwU0xrBIW, given fingerprints: [...]

  • Example (same routerId, stable mismatch):

    • Controller side:
      router fingerprint mismatch, routerId:"nRwU0xrBIW"; field fp:"75ed65...", givenFps:["d82920..."]

    • Router side: starts normally but fails to connect to controller endpoint:

      • routerId":"nRwU0xrBIW" then unable to connect controller ... (EOF)

So it looks like the controller expects one fingerprint for this routerId, but the router presents a different identity certificate (or the controller DB has a different one stored).
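One way to test that theory is to compute the fingerprint of the certificate the router will actually load and compare it with the values in the controller log. The sketch below assumes the logged fingerprints are lowercase hex SHA-1 digests of the certificate, which I have not confirmed from the truncated log lines. The helper names normalize_fp and cert_fp are hypothetical.

```shell
# Turn openssl's "SHA1 Fingerprint=AB:CD:..." line into bare lowercase hex.
normalize_fp() {
  printf '%s\n' "$1" | sed 's/^.*=//' | tr -d ':' | tr 'A-F' 'a-f'
}

# Print the SHA-1 fingerprint of the first certificate in a PEM file.
cert_fp() {
  normalize_fp "$(openssl x509 -in "$1" -noout -fingerprint -sha1)"
}

# Usage (path taken from the bind mount described above):
#   cert_fp ./configs/ziti/router/router.cert
```

If the printed value matches givenFps but not fp, the router is presenting a certificate the controller never recorded for that routerId, which points at stale files on the router side or a stale record on the controller side.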

Repro steps (as implemented in the scripts)

From PowerShell/WSL I run the lab script which does roughly:

  1. docker compose up -d ... and waits for controller healthcheck.

  2. provision_ziti.sh does:

    • login to Edge Management API

    • delete/recreate the edge-router and export a fresh JWT

    • run one-shot enroll using the router image

    • then start the router container and wait for it to become ONLINE

The relevant parts from logs:

  • “Recreating edge-router ziti-router and exporting JWT” then JWT is copied to ./configs/ziti/router/router.jwt

  • “One-shot router enrollment” runs and finishes with “registration complete”

  • Router container starts, but provisioning ends with “Edge-router did not become online”

Cleanup attempts / why this is not “old images”

  • Images are pinned to 1.6.12.

  • Between iterations I delete the iteration folder state on disk and restart containers.

  • However, compose repeatedly prints warnings that the named volumes already exist and were not created by this compose project:

    • volume "..._ziti_pki" already exists but was not created by Docker Compose

    • volume "..._ziti_controller_data" already exists but was not created by Docker Compose
      (I’m mentioning this because it may be relevant to persistence/DB/PKI state across runs.)
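To rule out leftover volumes between iterations, a teardown along these lines could be used. It is a sketch under the assumption that the lab's volume names contain "ziti"; the function name reset_stack is hypothetical.

```shell
# Tear the stack down including its named volumes, then list any volume
# whose name still contains the given substring.
reset_stack() {
  local pattern="${1:-ziti}"
  docker compose down -v --remove-orphans
  docker volume ls -q --filter "name=$pattern"
}

# Usage: reset_stack ziti
```

Any volume names printed after the teardown survived it, for example because they were created outside this Compose project, and would need to be removed explicitly with docker volume rm before the next run.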

Additional observations that may be relevant

  1. Router sometimes loads a cached router model file:

    • loaded router model from file ... /openziti/config/router.yaml.proto.gzip

      The cleanup in my provision script removes router.yaml.json.gzip (note the different name).

      I'm unsure whether this mismatch means the old router.yaml.proto.gzip can persist unintentionally.

  2. Router uses endpoints.yml but reports it empty and falls back to initial endpoint from config:

    • empty endpoint list in endpoints file, falling back to initial endpoints from config

      Endpoint used: tls:ziti-controller:6262

  3. Controller also logged a TLS “bad certificate” handshake on 1280 from localhost earlier, but main issue is the router control plane mismatch on 6262.
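Regarding the cache file name in observation 1, a cleanup that matches by pattern rather than by one exact name would cover both variants. This is only a sketch: clean_router_state is a hypothetical helper, and the glob assumes every cached model file is named router.yaml.<something>.gzip, which I have only seen for the two names mentioned above.

```shell
# Remove the router identity and any cached router model files from a
# directory, leaving router.yaml itself in place.
clean_router_state() {
  local dir="$1"
  rm -f "$dir"/router.cert "$dir"/router.key
  rm -f "$dir"/router.yaml.*.gzip
}

# Usage: clean_router_state ./configs/ziti/router
```

Running this before each enrollment means the router cannot start with a cert/key pair or a cached model left over from an earlier iteration.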

What I need help with (questions for OpenZiti devs)

  1. In this setup (Docker Compose + WSL2 + bind-mounted router identity files), what are the most likely causes for:

    • enroll reports “registration complete”

    • but controller then rejects control plane connection with fingerprint mismatch for the same routerId?

  2. Is it possible that:

    • router is using a different cert/key than the ones produced by the enroll step (e.g., stale files, wrong path, cached model)?

    • controller has stale router record in DB due to persistent volumes / timeline / router model caching?

  3. What is the recommended fully-deterministic storage strategy for router identity (router.cert/router.key) in Docker/WSL context?

    • Should identity be stored in a named volume rather than a bind mount?
  4. Any known pitfalls with the generated/implicit trust domain warnings on controller startup affecting enrolled components?

    docker-console-logs2.txt (627.4 KB)

    help-please.zip (115.7 KB)

    ziti-router-logs2.txt (208.5 KB)

    ziti-controller-logs2.txt (268.9 KB)

    power-shell-logs2.txt (5.1 KB)

Hi @goler, welcome to the community and to OpenZiti!

Thanks for all the details. Overall, my guess is that there's some state that's not actually getting cleaned up. Could another router be running at the same time? Is that a possibility at all?

"compose repeatedly prints warnings that the named volumes already exist and were not created by this compose project:"

Sounds to me like perhaps the volumes aren't getting removed using down -v? Stale certs are definitely what I think is happening.

It's intentional, but I think the cleanup is just not cleaning up "all the things".

Almost certainly. This is what I think is happening.

Well, it's up to you really. Personally, when I use docker I would use either a named volume OR a bind mount, but if I'm using compose I'll usually use a named volume just so I can let compose clean it all up when I'm done. I don't usually run things in docker for long periods of time, though. If I were going to, I could see myself doing it either way. It really should not matter how the files are persisted in either of these cases.

I'm not sure exactly what you're referring to here. If you mean this: "this environment is using a default generated...." I wouldn't expect that to matter for what you're doing.

If you haven't seen them, I would have a look at ziti/dist/docker-images/ziti-controller/compose.yml and ziti/dist/docker-images/ziti-router/compose.yml on the main branch of the openziti/ziti repository on GitHub. I looked in your zip and your compose looks like maybe it's removed some stuff that you might want to include? Not sure, just a thought.

This help at all? I tried to run your project but in the time I had, I didn't get it cooking... Hopefully this helps?

Hi @TheLumberjack,

Thank you very much for taking the time to look into my problem so carefully. I really appreciate that you went into the details and even tried to run the project on your side. I have a lot of respect for that, and I’m genuinely grateful for your help.
The issue is now resolved, and your comments helped point me in the right direction.
The root cause was not an OpenZiti bug and not outdated images. The real problem was that my project was using a custom controller/router lifecycle instead of the official OpenZiti Docker bootstrap flow.

What was wrong in my setup:

  • manual generation of controller.yaml and router.yaml

  • manual PKI generation

  • manual edge init

  • manual one-shot router enrollment

  • custom router entrypoint.sh

  • router identity/state stored in a bind mount

  • manual cleanup of local files between iterations

Because of that, controller and router could end up with different state:

  • controller had one expected router fingerprint

  • router started with a different cert/key pair

  • result: router fingerprint mismatch, incorrect fingerprint/unenrolled router, router never became online

What fixed it:

  1. I removed custom config generation
    I stopped generating controller.yaml and router.yaml myself.

  2. I removed manual PKI and manual edge init
    The controller now bootstraps itself using the official image flow.

  3. I removed bind-mounted router identity/state
    Previously router state lived in the project folder, including certs, keys, cache, and router state files.
    I moved router state to a named Docker volume instead.

  4. I removed the custom router entrypoint
    Previously the router was handled as a separate enroll step and then a separate run step.
    Now it uses the official bootstrap/enrollment flow from the image itself.

  5. I switched both controller and router to the official volume-based storage model
    Both now use named volumes managed by Docker Compose.

The new working model is now very simple:

  1. start controller

  2. wait for health

  3. login to edge management API

  4. create edge-router and export JWT

  5. pass JWT into ZITI_ENROLL_TOKEN

  6. start router

  7. wait until router is online

  8. then create services/policies/identities for the lab
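The two "wait" steps in this flow (2 and 7) can share one small polling helper. The sketch below is generic shell and not taken from my scripts: wait_until is a hypothetical name, and the command it polls (for example a function that asks the controller whether the router is online) has to be supplied by the caller.

```shell
# Retry a command until it succeeds or the attempt budget runs out.
# Arguments: max attempts, delay in seconds, then the command to run.
wait_until() {
  local tries="$1" delay="$2" i=1
  shift 2
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -ge "$tries" ]; then
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}

# Usage (router_is_online is a hypothetical check supplied by the script):
#   wait_until 30 2 router_is_online ziti-router \
#     || { echo "[lab][fatal] Edge-router did not become online" >&2; exit 1; }
```

Keeping the retry logic separate from the check makes it easy to reuse the same helper for the controller health wait and the router online wait.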

Result:

  • no more router fingerprint mismatch

  • no more incorrect fingerprint/unenrolled router

  • router successfully enrolled

  • router connected to controller

  • router reached ONLINE=true

So in the end, the real fix was to stop overriding the official OpenZiti lifecycle and let the official controller/router Docker images manage their own state.
Your suggestion that stale state / stale certs were the likely cause was absolutely on point.
Thanks again, your reply was very helpful.

As I perused the files you shared, it seems very interesting! I hope you'll share back with the community whatever results/findings you might have! We always appreciate hearing them. Cheers