Ziti-Router Deployment Error

Hi, I'm trying to stand up an edge router in an isolated, disconnected environment on an ESXi host running VMs. The controller is already deployed in its own VM, and I have created the identity for the edge router. I'm trying to deploy the edge router in a separate VM on the same host, and I have confirmed that this VM can ping the controller's VM. After copying the router's JWT enrollment token over to the router VM and running through the deployment steps, your bootstrap bash script fails, and the tmp file reports a malformed token as the cause. Can anyone help me determine a path forward? I'm not sure how to remediate this.

I'll dump the full contents of the tmp file in this post:

INFO: config file exists in /var/lib/private/ziti-router/config.yml
{"cause":"token is malformed: token contains an invalid number of segments","file":"github.com/openziti/ziti/router/enroll/enroll.go:98","func":"github.com/openziti/ziti/router/enroll.(*RestEnroller).Enroll","level":"fatal","msg":"failed to parse JWT","time":"2025-01-27T16:43:40.010Z"}
DEBUG: using config file: /var/lib/private/ziti-router/config.yml
DEBUG: preparing working directory: /var/lib/private/ziti-router
DEBUG: ZITI_ENROLL_TOKEN is defined in /opt/openziti/etc/router/bootstrap.env
DEBUG: using config: /var/lib/private/ziti-router/config.yml

Hi @elibrown333, welcome to the community and to OpenZiti (and zrok/BrowZer)!

A malformed token is strange. It makes me think the token was perhaps corrupt or just empty? We probably need more details. With the controller running, you're able to authenticate using the ziti CLI, right?

Have you inspected the token to make sure it's not corrupt? You're doing this all in one go too, right? You didn't, say, pause between creating the controller and trying to enroll the router?
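
For reference, a well-formed JWT is three base64url segments separated by dots, which is what the "invalid number of segments" message is complaining about. A rough sketch for counting the segments in a token file follows; `jwt_segment_count` is a hypothetical helper name and the file path is whatever you saved the token as.

```shell
# Sketch only: counts dot-separated segments in a token file.
# A well-formed JWT should report 3; an empty file reports 0.
jwt_segment_count() {
  # $1 = path to the file holding the token
  # Strip all whitespace (including stray newlines), then count fields.
  tr -d '[:space:]' < "$1" | awk -F. '{ n = NF } END { print n + 0 }'
}

# Example (hypothetical path):
# jwt_segment_count ./my-router.jwt
```

Anything other than 3 suggests the token was truncated, wrapped, or emptied somewhere on its way to the router VM.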

It's often helpful for us to see the full output of a command. Sometimes we can see abnormalities that aren't obvious if you don't look at these logs all the time...

Also, if you can supply the full command you ran and any relevant details, that's often useful too. One final thing: the controller has an "advertise" address. You're sure this new VM is able to connect to the controller at that address, right? I would expect a totally different error, but if it can't connect, that might be why/how the .jwt ends up empty/malformed...

Those are some thoughts

Hi, I'm a little late replying to this thread, my apologies. I was able to resolve the initial error and have successfully deployed the router and the controller in their own VMs. Checking the service status for each shows that both are running. My issue now is still with the router, however. Although its status comes back as active (running), the lines immediately following in the status output tell me it cannot connect to the controller, and I am unsure why. Do you know if I'm missing any steps needed to establish connectivity between the two? @TheLumberjack

Also, when I list this edge router using the ziti CLI, it says the router is offline. I thought that once you enroll a router with the controller, the router comes online. Is this not the case?

Can you share the output from the router? My guess would be that the address advertised by the controller is incorrect and the router has no route to that address.

A common problem people have is advertising something like "localhost" from the controller, informing routers "you can contact me at localhost:1280". If the router is on the same machine, you're lucky! That WILL work. However, if it's on a different machine/vm, obviously that won't work.

My guess is the controller's configured with the wrong address and routers can't connect to it. I would think the router output would help clarify this.

Can you check the router's config and make sure it's set up properly? It'll be around line 10 or so and look something like this:

ctrl:
  endpoint:             tls:ec2-3-18-113-172.us-east-2.compute.amazonaws.com:8440

Make sure the router can access whatever is configured there.
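
A rough way to test that from the router VM is sketched below. It pulls the host and port out of the `endpoint:` line and attempts a plain TCP connection. `ctrl_endpoint` and `port_open` are hypothetical helper names, the `/dev/tcp` trick is a bash feature (not available in every shell), and the config path in the example is the one shown in the log output earlier in the thread.

```shell
# Sketch only: extract "host port" from the first "endpoint: tls:host:port"
# line read on stdin.
ctrl_endpoint() {
  sed -n 's/^[[:space:]]*endpoint:[[:space:]]*tls:\([^[:space:]:]*\):\([0-9][0-9]*\).*/\1 \2/p' | head -n 1
}

# Sketch only: succeed if a TCP connection to host ($1) port ($2) opens
# within 3 seconds. Relies on bash's /dev/tcp and coreutils' timeout.
port_open() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Example, run on the router VM:
# read -r host port < <(ctrl_endpoint < /var/lib/private/ziti-router/config.yml)
# port_open "$host" "$port" && echo "reachable" || echo "not reachable"
```

A successful ping only shows the host is up. This checks that something is actually accepting connections on the configured port.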

ziti_router_status_output.txt (5.7 KB)

I tried to paste the status as best I could (attached above). As for the router config, the section that lists the controller attributes is defined as:

tls:ziti-controller-2.[full_fqdn]:8440. I'm pretty sure I entered this part correctly. As I said in my first post, both VMs can also ping each other using their FQDNs, so I'm not sure why I'm encountering this connectivity issue.

@TheLumberjack Does it matter at all that when I check the controller's status, the last two lines of the output contain {"SupportsRouterModel":false,....}?

I am using the same software versions for each one so I'm not sure why it would spit this out

I wouldn't think so. Which "status" command are you using?

When I look at the logs, I see: error connecting ctrl (EOF). I expect the controller has logs indicating the connection is not valid. With this information, I now expect the router to have incorrect PKI. This looks/feels like a PKI issue now. It's also possible the controller is not advertising the FQDN you're configuring the router with.

Are there any relevant logs in the controller? From the router, can you issue:

openssl s_client -connect ziti-controller-2.5g.mil:8440 2>&1 | openssl x509 -text

Pinging is one thing, but making sure the port is open in the firewall and that the controller is actually reachable on the expected port is another :slight_smile:

Let's see what the response is from the openssl command...

Certs are returned after running that command, one line in the cert does stick out though:

DNS:localhost, DNS:ziti-controller-2.[fqdn], IP Address: 127.0.0.1, IP Address:0:0:0:0:0:0:0:1

Is something pointing to itself incorrectly in one of the configs?
@TheLumberjack

That's the exact line I wanted to see. The DNS entry for the controller's FQDN is present in that response, which is what I wanted to confirm. Often the PKI will be incorrect: people change the advertise address after initializing a controller's PKI, leading to a mismatch between the PKI's DNS entries and the address used to connect to the controller. I had suspected that was the case here, but it doesn't appear to be the problem.
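
That check can be scripted, roughly as below. `san_has_dns` is a hypothetical helper name that reads subject alternative name text on stdin and looks for an exact `DNS:` entry. The commented pipeline assumes OpenSSL 1.1.1 or later (for `x509 -ext`), and `$host` and `$port` are placeholders for the controller address in the router config.

```shell
# Sketch only: succeed if the SAN text on stdin contains DNS:<name> as a
# whole entry (not just as a substring of a longer name).
san_has_dns() {
  # $1 = hostname the router is configured to dial
  tr ',' '\n' | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' | grep -qxF "DNS:$1"
}

# Example against a live controller (placeholders for host and port):
# openssl s_client -connect "$host:$port" </dev/null 2>/dev/null \
#   | openssl x509 -noout -ext subjectAltName \
#   | san_has_dns "$host" && echo "SAN matches" || echo "SAN mismatch"
```

If the name the router dials is missing from the SAN list, the advertise address and the PKI have drifted apart.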

It looks like you're using a Linux package or Docker-based deployment. Is that correct? Can you remove whatever storage is backing the router and recreate it, or make a different router? I still think the PKI is somehow out of sync. I'm unsure how that happened or what steps you ran, but that's my current guess.

If you can provide clear steps to reproduce, we could try it on our side, but without that all I can do is kind of guess at what the problem may be.

Yes, I am running the Linux packages on VMs. Do you mean recreate the router VM? I will likely make a separate one so I can compare configs between the two.

Yeah. All I can imagine right now is that the PKI is somehow out of date. When a router enrolls, it writes certificates into the location specified by the config. You should be able to just remove that storage (wherever it is) and re-enroll a new router.

Without more logs and complete steps, I'm just guessing :frowning: Sorry

Hi @TheLumberjack, I have an update since my last reply to this thread. I have rebuilt the router VM and can now successfully establish connectivity between the controller and the edge router. My issue now is that, even though the router reports a healthy status, the service policies and configs I created to provide access to my app-server service are unable to use this edge router, and I am unsure why. Here are the logs I'm getting on the controller side.
'''
Feb 20 11:40:07 ziti-controller-2 ziti[16111]: {"error":"NO_EDGE_ROUTERS_AVAILABLE: No edge routers are assigned and online to handle the requested connection","file":"github.com/openziti/ziti/controller/handler_edge_ctrl/common.go:93","func":"github.com/openziti/ziti/controller/handler_edge_ctrl.(*baseRequestHandler).logResult","level":"error","msg":"operation failed","operation":"tunnel.create.terminator","routerId":"vhaBG55bAJ","time":"2025-02-20T11:40:07.496Z"}

'''
How is it that my edge router can establish a connection with my controller and show as online, but the policies can't use it?


Hi @elibrown333, NO_EDGE_ROUTERS_AVAILABLE is a very specific error. It almost universally means your policies are incorrect, and generally it's the edge-router-policies.

If you run

ziti edge policy-advisor identities -q

you will likely be told an identity has no common routers.

To address this, you need to create an edge-router-policy that allows identities to access the edge router. I see from your cap, you used "app" as attributes. That's fine. I personally like to use #public to indicate this is a router that is deployed in "public" address space, but the actual attribute doesn't matter.

Make an edge-router-policy that grants #all identities access to the app routers and you should be all set.
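
As a rough sketch, creating that policy with the ziti CLI could look like the following. The `create_erp` wrapper, its `DRY_RUN` switch, and the policy name are hypothetical; `#app` is the router attribute mentioned above. It assumes you are already logged in to the controller with the ziti CLI.

```shell
# Sketch only: build (and optionally run) the edge-router-policy command.
create_erp() {
  # $1 = policy name, $2 = edge router role, $3 = identity role
  set -- ziti edge create edge-router-policy "$1" \
    --edge-router-roles "$2" --identity-roles "$3"
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"   # show the command instead of running it
  else
    "$@"
  fi
}

# Example (hypothetical policy name; requires an authenticated CLI session):
# create_erp app-routers-all-identities '#app' '#all'
```

Re-running `ziti edge policy-advisor identities -q` afterwards should show whether the identities now share a router with the service.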