Hiya, I've recently re-built my HA Ziti system using v1.6.0 for Controllers and Routers. I have been on v1.5.0 for a while.
My system consists of two HA Controllers and one public Edge Router.
I'm having an issue where, after restarting the Edge Router, it can no longer connect to all of the HA Controllers.
Here are the steps I take.
Once my HA Controller cluster is established, I run the following to create an ER.
ziti edge create edge-router "edge-router-${RANDOM_STRING}" --jwt-output-file "${ZITI_HOME}/edge-router-${RANDOM_STRING}.jwt" --tunneler-enabled
Next I create the router configuration file.
v: 3
identity:
  cert: "/var/lib/private/ziti-router/router.cert"
  server_cert: "/var/lib/private/ziti-router/router.server.chain.cert"
  key: "/var/lib/private/ziti-router/router.key"
  ca: "/var/lib/private/ziti-router/router.cas"
ha:
  enabled: true
ctrl:
  endpoint: tls:ziti-controller-1.az.lifeboat.ziti:8443
link:
  dialers:
    - binding: transport
  listeners:
    - binding: transport
      bind: tls:0.0.0.0:9443
      advertise: tls:ziti-router-2.az.lifeboat.ziti:9443
      options:
        outQueueSize: 4
listeners:
  - binding: edge
    address: tls:0.0.0.0:443
    options:
      advertise: ziti-router-2.az.lifeboat.ziti:443
      connectTimeoutMs: 5000
      getSessionTimeout: 60
  - binding: tunnel
    options:
      mode: host #tproxy|host
edge:
  csr:
    country: US
    province: NC
    locality: Charlotte
    organization: NetFoundry
    organizationalUnit: Ziti
    sans:
      dns:
        - localhost
        - ziti-router-2.az.lifeboat.ziti
        - ziti-router-2
      ip:
        - "127.0.0.1"
        - "::1"
forwarder:
  latencyProbeInterval: 0
  xgressDialQueueLength: 1000
  xgressDialWorkerCount: 128
  linkDialQueueLength: 1000
  linkDialWorkerCount: 32
  rateLimitedQueueLength: 100
  rateLimitedWorkerCount: 25
Next I enrol the Edge Router.
ziti router enroll ${ZITI_HOME}/config.yml --jwt "${ZITI_HOME}/edge-router-${RANDOM_STRING}.jwt"
Finally I start the router service.
systemctl start ziti-router.service
I see the following separate log lines, which suggest the ER is able to connect to both of my HA Controllers.
Apr 11 11:45:48 ziti-router-2 ziti[1347]: {"endpoint":"tls:ziti-controller-1.az.lifeboat.ziti:8443","file":"github.com/openziti/ziti/router/env/ctrls.go:203","func":"github.com/openziti/ziti/router/env.(*networkControllers).connectToControllerWithBackoff.func3","level":"info","msg":"successfully connected to controller","time":"2025-04-11T11:45:48.722Z"}
Apr 11 11:45:58 ziti-router-2 ziti[1347]: {"endpoint":"tls:ziti-controller-2.az.lifeboat.ziti:8443","file":"github.com/openziti/ziti/router/env/ctrls.go:203","func":"github.com/openziti/ziti/router/env.(*networkControllers).connectToControllerWithBackoff.func3","level":"info","msg":"successfully connected to controller","time":"2025-04-11T11:45:58.735Z"}
Also, when I run ziti edge list edge-routers on either of my Controllers, the ER's ONLINE state is true.
╭────────────┬───────────────────┬────────┬───────────────┬──────┬────────────╮
│ ID         │ NAME              │ ONLINE │ ALLOW TRANSIT │ COST │ ATTRIBUTES │
├────────────┼───────────────────┼────────┼───────────────┼──────┼────────────┤
│ 8F3ku2Kbfa │ edge-router-ljxvi │ true   │ true          │    0 │            │
╰────────────┴───────────────────┴────────┴───────────────┴──────┴────────────╯
My problem starts when I restart the ER with systemctl restart ziti-router.service. After this, it seems my ER is only able to connect to one of my HA Controllers, ziti-controller-2.
In the ER log I see:
Apr 11 12:01:50 ziti-router-2 ziti[1631]: {"endpoint":"tls:ziti-controller-2.az.lifeboat.ziti:8443","file":"github.com/openziti/ziti/router/env/ctrls.go:203","func":"github.com/openziti/ziti/router/env.(*networkControllers).connectToControllerWithBackoff.func3","level":"info","msg":"successfully connected to controller","time":"2025-04-11T12:01:50.612Z"}
But I also see the following, which suggests it's unable to connect to ziti-controller-1.
Apr 11 12:04:18 ziti-router-2 ziti[1631]: {"endpoint":"tls:ziti-controller-1.az.lifeboat.ziti:8443","error":"error connecting ctrl (EOF)","file":"github.com/openziti/ziti/router/env/ctrls.go:192","func":"github.com/openziti/ziti/router/env.(*networkControllers).connectToControllerWithBackoff.func2","level":"error","msg":"unable to connect controller","time":"2025-04-11T12:04:18.639Z"}
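In case it helps anyone reproduce this, here is a rough sketch of how the per-controller results can be pulled out of the router journal. It assumes the JSON log format shown in the lines above (with the endpoint, level and msg fields in that order); summarize_ctrl_log is just a hypothetical helper name, not part of ziti.

```shell
# Hypothetical helper: read router journal lines on stdin and print
# "<level> <endpoint> <msg>" for each controller connection attempt.
# Assumes the field order seen in the log lines above (endpoint, level, msg).
summarize_ctrl_log() {
  # Keep only the two controller-connection messages shown in this post.
  grep -E '"msg":"(successfully connected to controller|unable to connect controller)"' |
    # Pull out the three fields of interest from the JSON payload.
    sed -E 's/.*"endpoint":"([^"]*)".*"level":"([^"]*)".*"msg":"([^"]*)".*/\2 \1 \3/'
}
```

On the router this could be fed from the journal, for example with journalctl -u ziti-router.service | summarize_ctrl_log.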
In the ziti-controller-1 logs I see the following line repeating.
Apr 11 12:04:58 ziti-controller-1 ziti[3445]: {"_context":"tls:0.0.0.0:8443","file":"github.com/openziti/channel/v4@v4.0.4/classic_listener.go:213","func":"github.com/openziti/channel/v4.(*classicListener).acceptConnection.func1","level":"error","msg":"connection handler error for [tls:10.128.6.4:34640] (x509: certificate signed by unknown authority)","time":"2025-04-11T12:04:58.921Z"}
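Since the ziti-controller-1 error is an x509 one, one check that could be run is whether a given certificate chains to a given CA bundle. This is only a sketch using standard openssl commands; check_chain is a hypothetical helper name, and which CA bundle ziti-controller-1 actually uses to validate router certificates is an assumption that would need confirming against its own configuration.

```shell
# Hypothetical helper: check whether a certificate chains to a CA bundle.
# Usage: check_chain <ca_bundle.pem> <cert.pem>
# Returns 2 if either file is missing, otherwise the openssl verify status.
check_chain() {
  ca_bundle="$1"
  cert="$2"
  if [ ! -r "$ca_bundle" ] || [ ! -r "$cert" ]; then
    echo "missing or unreadable file: $ca_bundle or $cert" >&2
    return 2
  fi
  # Show who issued the (first) certificate in the file and its validity window.
  openssl x509 -in "$cert" -noout -subject -issuer -dates
  # Verify the certificate against the CA bundle.
  openssl verify -CAfile "$ca_bundle" "$cert"
}
```

With the paths from my router config this would be check_chain /var/lib/private/ziti-router/router.cas /var/lib/private/ziti-router/router.cert, although that only shows what the router itself trusts, not what ziti-controller-1 trusts.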
Now when I run ziti edge list edge-routers on each Controller, the ONLINE state is false on ziti-controller-1 and true on ziti-controller-2.
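To compare what each Controller reports without eyeballing the tables, something like the following could be used. It is a minimal sketch that assumes the table layout shown earlier in this post (NAME in the second column, ONLINE in the third); router_online is a hypothetical helper name.

```shell
# Hypothetical helper: read `ziti edge list edge-routers` table output on stdin
# and print "<name> <online>" for each data row.
# Assumes NAME is the second column and ONLINE the third, as in the table above.
router_online() {
  # Normalise the box-drawing column separator to a plain pipe, then split on it.
  sed 's/│/|/g' | awk -F'|' '
    NF >= 4 {
      name = $3; online = $4
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      gsub(/^[ \t]+|[ \t]+$/, "", online)
      # Skip the header row and any row without a router name.
      if (name != "NAME" && name != "") print name, online
    }'
}
```

Running ziti edge list edge-routers | router_online while logged in to each Controller in turn gives one line per router that is easy to diff.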
From reading the Changelog I can't tell what might have caused this issue or whether I need to change my install process. I don't experience this issue on v1.5.0, but I do on v1.5.4 and v1.6.0.
Thanks in advance!