Environment
OpenZiti version:
- Controller: 1.1.15
- Routers: 1.1.15

Platform:
- Google Kubernetes Engine (GKE)
- Node type: e2-standard-2
- Controller and router are running as separate pods (same node in some cases)

Scale:
- ~200 identities
- Multiple services and terminators
Problem Summary
We are seeing intermittent control-plane disconnections between routers and the controller.
At the time of the incident:
- routers lose the controller connection
- they reconnect automatically after ~1–2 minutes
- during this window, clients experience service failures and the routers log:

```
service <id> has no online terminators
unable to ping (use of closed network connection)
rx error. closed peer and starting reconnection process
EOF
```

Example JSON log entry:

```json
{"_context":"ch{edge}-\u003eu{classic}-\u003ei{QEmR}","chSeq":23175,"connId":126,"edgeSeq":0,"error":"service 3Zwpeo9QSKNGDkiqYLI6MX has no online terminators for instanceId ","file":"github.com/openziti/ziti/router/xgress_edge/listener.go:199","func":"github.com/openziti/ziti/router/xgress_edge.(*edgeClientConn).processConnect","level":"warning","msg":"failed to dial fabric","time":"2026-01-30T14:53:00.339Z","token":"2f6ccfa7-102d-4d57-86e4-3efff6910def","type":"EdgeConnectType"}
```
This happens even though:
- CPU usage is low
- memory usage is low
- node conntrack usage is low
- node file descriptor usage is low

We can provide Grafana metrics if needed.
Observed Router Logs
Example log lines from the router:

```
_context":"u{reconnecting}->i{KMeD} @tls:ziti-ctrl.zzz.zzz:443
msg":"unable to ping (use of closed network connection)"
msg":"rx error. closed peer and starting reconnection process"
error":"EOF"
```
After this, routers reconnect and services recover.
At the same time, routers log:

```
failed to dial fabric
service <service-id> has no online terminators
```
Controller Connectivity
Routers connect to the controller using `tls:ziti-ctrl.zzz.zzz:443`. Internally, the controller listens on `tls:0.0.0.0:1280`.
Helm Values Used for Controller
```yaml
# /tmp/controller-values.yml
ctrlPlane:
  advertisedHost: ziti-ctrl.zzz.zzz
  advertisedPort: 443
  service:
    type: ClusterIP
  ingress:
    enabled: true
    ingressClassName: nginx
    annotations:
      kubernetes.io/ingress.allow-http: "false"
      nginx.ingress.kubernetes.io/ssl-passthrough: "true"
      nginx.ingress.kubernetes.io/secure-backends: "true"
clientApi:
  advertisedHost: ziti-controller.zzzz.zzz
  advertisedPort: 443
  service:
    type: ClusterIP
  ingress:
    enabled: true
    ingressClassName: nginx
    annotations:
      kubernetes.io/ingress.allow-http: "false"
      nginx.ingress.kubernetes.io/ssl-passthrough: "true"
      nginx.ingress.kubernetes.io/secure-backends: "true"
```
The ingress is backed by the nginx ingress controller with SSL passthrough.
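One thing we want to double-check on our side: as far as we understand, the `nginx.ingress.kubernetes.io/ssl-passthrough` annotation only takes effect when the ingress-nginx controller itself is started with `--enable-ssl-passthrough`. A sketch of the controller Deployment fragment we believe this requires (container name and image tag are assumptions):

```yaml
# ingress-nginx controller Deployment fragment (sketch; name/image assumed)
containers:
  - name: controller
    image: registry.k8s.io/ingress-nginx/controller:v1.10.0
    args:
      - /nginx-ingress-controller
      # Without this flag, the ssl-passthrough annotation is ignored and
      # nginx terminates TLS itself instead of passing it through.
      - --enable-ssl-passthrough
```

We can confirm whether our ingress-nginx deployment has this flag set if that is relevant.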
Important Observations
- The router is not crashing.
- The controller is not crashing.
- The connection is being closed and re-established.
- Multiple services lose terminators at the same time.
- The issue is reproducible and visible in metrics and logs.
Questions / Clarification Requested
- Could running the controller and router on the same node (but in different pods) contribute to this behavior, or is that unrelated?
- Are there any known issues or recommended settings for running OpenZiti 1.1.15 control-plane traffic behind an ingress?
- Do you recommend running the controller and router on separate nodes rather than on the same node, and using a somewhat larger machine type? Is e2-standard-2 a recommended size?
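For the separate-node question, this is the kind of scheduling constraint we are considering to keep router pods off the controller's node; a sketch only, and the `app.kubernetes.io/name` label value is an assumption about how the controller pods are labeled in our cluster:

```yaml
# Router pod spec fragment (sketch; controller label value assumed)
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/name: ziti-controller
        # Do not schedule a router onto any node already running a controller pod.
        topologyKey: kubernetes.io/hostname
```

If strict separation is too rigid for a small cluster, `preferredDuringSchedulingIgnoredDuringExecution` would express the same preference without blocking scheduling.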
Additional Context
- We are running ~200 identities.
- We do not observe CPU, memory, conntrack, or file descriptor pressure at the node level.
- We can provide Grafana screenshots and Kubernetes events if needed.