Controller And Router Upgrades Causing Connectivity Issues

When upgrading from controller version 1.6.5 to 1.7.0 and router version 1.6.7 to 1.7.0, I am having connectivity issues.

Controller logs:

{"file":"github.com/openziti/ziti/controller/network/fault.go:32","func":"github.com/openziti/ziti/controller/network.(*Network).fault","level":"info","msg":"network fault processing for [6] circuits","time":"2025-10-23T19:32:15.492Z"}
{"circuitId":"6HynIqYeUWXTHKGu3bRcUf","file":"github.com/openziti/ziti/controller/network/fault.go:49","func":"github.com/openziti/ziti/controller/network.(*Network).fault","level":"info","msg":"sent unroute for circuit to router in response to forwarding fault","routerId":"GRytZdpmX","time":"2025-10-23T19:32:15.492Z"}
{"circuitId":"7BWxTj0pCQikPYhFGwqIS6","file":"github.com/openziti/ziti/controller/network/fault.go:49","func":"github.com/openziti/ziti/controller/network.(*Network).fault","level":"info","msg":"sent unroute for circuit to router in response to forwarding fault","routerId":"GRytZdpmX","time":"2025-10-23T19:32:15.492Z"}
{"circuitId":"5ZYGSsHtJNuBydhGAtTpg8","file":"github.com/openziti/ziti/controller/network/fault.go:49","func":"github.com/openziti/ziti/controller/network.(*Network).fault","level":"info","msg":"sent unroute for circuit to router in response to forwarding fault","routerId":"GRytZdpmX","time":"2025-10-23T19:32:15.492Z"}
{"circuitId":"5AJgepJeLAkR8ypIX7L5nf","file":"github.com/openziti/ziti/controller/network/fault.go:49","func":"github.com/openziti/ziti/controller/network.(*Network).fault","level":"info","msg":"sent unroute for circuit to router in response to forwarding fault","routerId":"GRytZdpmX","time":"2025-10-23T19:32:15.492Z"}
{"circuitId":"28oEsr7TDQE5NIH8w7TYh0","file":"github.com/openziti/ziti/controller/network/fault.go:49","func":"github.com/openziti/ziti/controller/network.(*Network).fault","level":"info","msg":"sent unroute for circuit to router in response to forwarding fault","routerId":"GRytZdpmX","time":"2025-10-23T19:32:15.492Z"}
{"circuitId":"5s0m1cXxxx7lvLXiGHcv0R","file":"github.com/openziti/ziti/controller/network/fault.go:49","func":"github.com/openziti/ziti/controller/network.(*Network).fault","level":"info","msg":"sent unroute for circuit to router in response to forwarding fault","routerId":"GRytZdpmX","time":"2025-10-23T19:32:15.492Z"}

Router logs:

{"circuitId":"5aVEMZZrRbZSNYI6Ezqer","file":"github.com/openziti/ziti/router/forwarder/forwarder.go:155","func":"github.com/openziti/ziti/router/forwarder.(*Forwarder).Unroute","level":"info","msg":"circuit unrouted","time":"2025-10-23T19:32:30.493Z"}
{"error":null,"file":"github.com/openziti/ziti/router/xgress_common/connection.go:222","func":"github.com/openziti/ziti/router/xgress_common.(*XgressConn).ReadPayload","level":"error","msg":"failed to set write deadline, connection is fully closed","time":"2025-10-23T19:32:31.349Z"}
{"error":null,"file":"github.com/openziti/ziti/router/xgress_common/connection.go:222","func":"github.com/openziti/ziti/router/xgress_common.(*XgressConn).ReadPayload","level":"error","msg":"failed to set write deadline, connection is fully closed","time":"2025-10-23T19:32:36.176Z"}
{"_context":"{c/7JynKoUEEgvME1EhFT2ui3|@/3VHnC3hERc6JF8l8KILhKe}\u003cTerminator\u003e","file":"github.com/openziti/sdk-golang@v1.2.10/xgress/xgress.go:699","func":"github.com/openziti/sdk-golang/xgress.(*Xgress).rx","level":"warning","msg":"read failed (read tcp 10.0.0.175:56698-\u003etraefik.ip.address.here:443: use of closed network connection)","time":"2025-10-23T19:32:39.675Z"}
{"error":null,"file":"github.com/openziti/ziti/router/xgress_common/connection.go:222","func":"github.com/openziti/ziti/router/xgress_common.(*XgressConn).ReadPayload","level":"error","msg":"failed to set write deadline, connection is fully closed","time":"2025-10-23T19:32:41.348Z"}
{"error":null,"file":"github.com/openziti/ziti/router/xgress_common/connection.go:222","func":"github.com/openziti/ziti/router/xgress_common.(*XgressConn).ReadPayload","level":"error","msg":"failed to set write deadline, connection is fully closed","time":"2025-10-23T19:32:41.448Z"}
{"_context":"{c/7Dw8HnmogPQUwK97LsTdqM|@/2LqGtuwWZVNwMV3qAAFeR3}\u003cTerminator\u003e","file":"github.com/openziti/sdk-golang@v1.2.10/xgress/link_send_buffer.go:296","func":"github.com/openziti/sdk-golang/xgress.(*LinkSendBuffer).run","level":"warning","msg":"closing while buffer contains unacked payloads","payloadCount":1,"time":"2025-10-23T19:32:41.448Z"}
{"circuitCount":5,"ctrlId":"ziti-controller-ctrl-plane-client-identity","file":"github.com/openziti/ziti/router/forwarder/faulter.go:107","func":"github.com/openziti/ziti/router/forwarder.(*Faulter).run","level":"warning","msg":"reported forwarding faults","time":"2025-10-23T19:32:45.493Z"}
{"circuitId":"7JynKoUEEgvME1EhFT2ui3","file":"github.com/openziti/ziti/router/forwarder/forwarder.go:155","func":"github.com/openziti/ziti/router/forwarder.(*Forwarder).Unroute","level":"info","msg":"circuit unrouted","time":"2025-10-23T19:32:45.494Z"}
{"circuitId":"7FhcQQSqtCjz5P8vCGBhWV","file":"github.com/openziti/ziti/router/forwarder/forwarder.go:155","func":"github.com/openziti/ziti/router/forwarder.(*Forwarder).Unroute","level":"info","msg":"circuit unrouted","time":"2025-10-23T19:32:45.494Z"}
{"circuitId":"4lpP8Dkpfh0aVaL02P7FJd","file":"github.com/openziti/ziti/router/forwarder/forwarder.go:155","func":"github.com/openziti/ziti/router/forwarder.(*Forwarder).Unroute","level":"info","msg":"circuit unrouted","time":"2025-10-23T19:32:45.494Z"}
{"circuitId":"7Dw8HnmogPQUwK97LsTdqM","file":"github.com/openziti/ziti/router/forwarder/forwarder.go:155","func":"github.com/openziti/ziti/router/forwarder.(*Forwarder).Unroute","level":"info","msg":"circuit unrouted","time":"2025-10-23T19:32:45.494Z"}
{"circuitId":"7eK988kJjg7mbK2U7wyx7t","file":"github.com/openziti/ziti/router/forwarder/forwarder.go:155","func":"github.com/openziti/ziti/router/forwarder.(*Forwarder).Unroute","level":"info","msg":"circuit unrouted","time":"2025-10-23T19:32:45.494Z"}
{"error":null,"file":"github.com/openziti/ziti/router/xgress_common/connection.go:222","func":"github.com/openziti/ziti/router/xgress_common.(*XgressConn).ReadPayload","level":"error","msg":"failed to set write deadline, connection is fully closed","time":"2025-10-23T19:32:51.367Z"}
{"error":null,"file":"github.com/openziti/ziti/router/xgress_common/connection.go:222","func":"github.com/openziti/ziti/router/xgress_common.(*XgressConn).ReadPayload","level":"error","msg":"failed to set write deadline, connection is fully closed","time":"2025-10-23T19:32:58.572Z"}
{"error":null,"file":"github.com/openziti/ziti/router/xgress_common/connection.go:222","func":"github.com/openziti/ziti/router/xgress_common.(*XgressConn).ReadPayload","level":"error","msg":"failed to set write deadline, connection is fully closed","time":"2025-10-23T19:32:58.580Z"}
{"circuitId":"766Q4z40LvqRzWa4vFUYFb","ctrlId":"ziti-controller-ctrl-plane-client-identity","file":"github.com/openziti/ziti/router/forwarder/scanner.go:85","func":"github.com/openziti/ziti/router/forwarder.(*Scanner).scan","idleThreshold":60000000000,"idleTime":183171000000,"level":"warning","msg":"circuit exceeds idle threshold","time":"2025-10-23T19:33:00.421Z"}
{"circuitCount":1,"ctrlId":"ziti-controller-ctrl-plane-client-identity","file":"github.com/openziti/ziti/router/forwarder/scanner.go:100","func":"github.com/openziti/ziti/router/forwarder.(*Scanner).scan","level":"warning","msg":"sent confirmation for circuits","time":"2025-10-23T19:33:00.421Z"}
{"circuitCount":3,"ctrlId":"ziti-controller-ctrl-plane-client-identity","file":"github.com/openziti/ziti/router/forwarder/faulter.go:107","func":"github.com/openziti/ziti/router/forwarder.(*Faulter).run","level":"warning","msg":"reported forwarding faults","time":"2025-10-23T19:33:00.493Z"}
{"circuitId":"7gF1dVDq7KQw0mvF6T8O38","file":"github.com/openziti/ziti/router/forwarder/forwarder.go:155","func":"github.com/openziti/ziti/router/forwarder.(*Forwarder).Unroute","level":"info","msg":"circuit unrouted","time":"2025-10-23T19:33:00.494Z"}
{"circuitId":"5OW8imUqDZXaVXTW0969N0","file":"github.com/openziti/ziti/router/forwarder/forwarder.go:155","func":"github.com/openziti/ziti/router/forwarder.(*Forwarder).Unroute","level":"info","msg":"circuit unrouted","time":"2025-10-23T19:33:00.494Z"}
{"circuitId":"1zkX0g4u4xOrUdoavIoKcQ","file":"github.com/openziti/ziti/router/forwarder/forwarder.go:155","func":"github.com/openziti/ziti/router/forwarder.(*Forwarder).Unroute","level":"info","msg":"circuit unrouted","time":"2025-10-23T19:33:00.494Z"}
{"error":null,"file":"github.com/openziti/ziti/router/xgress_common/connection.go:222","func":"github.com/openziti/ziti/router/xgress_common.(*XgressConn).ReadPayload","level":"error","msg":"failed to set write deadline, connection is fully closed","time":"2025-10-23T19:33:01.349Z"}
{"error":null,"file":"github.com/openziti/ziti/router/xgress_common/connection.go:222","func":"github.com/openziti/ziti/router/xgress_common.(*XgressConn).ReadPayload","level":"error","msg":"failed to set write deadline, connection is fully closed","time":"2025-10-23T19:33:01.742Z"}

I also noticed these in the controller logs:

{"_context":"ch{GRytZdpmX}-\u003eu{classic}-\u003ei{GRytZdpmX/jwkj}","error":"invalid session","file":"github.com/openziti/ziti/controller/handler_edge_ctrl/common.go:83","func":"github.com/openziti/ziti/controller/handler_edge_ctrl.(*baseRequestHandler).returnError","level":"error","msg":"responded with error","operation":"create.circuit","routerId":"GRytZdpmX","time":"2025-10-23T19:39:19.074Z","token":"eyJhbGciOiJSUzI1NiIsImtpZCI6ImY2NzJhMDFlN2IwMzhiNDI2MmJjODNlZWJiNjNlOWMzN2Q4MGVlNTQiLCJ0eXAiOiJKV1QifQ.eyJpc3MiOiJodHRwczovL3ppdGktY29udHJvbGxlci5pbXBlcmZla3R1cy5jb206NDQzIiwic3ViIjoiNk1qekw5cGVXNkVhbnNqM2RFY1NObyIsImF1ZCI6WyJvcGVueml0aSJdLCJleHAiOjE3OTI3ODQzNDQsImlhdCI6MTc2MTI0ODM0NCwianRpIjoiY21oM3RzZnZhMDIwYjBkOTJxMzNwdm83dCIsInpfYXNpZCI6ImNtaDN0czVubjAxenQwZDkydHoyczlpeHgiLCJ6X2lpZCI6IlZpQUhzaUtNdCIsInpfdCI6InMiLCJ6X3N0IjoiRGlhbCIsInpfbGVnIjp0cnVlfQ.CeUwzpPHNe9m0O4zcWrmKdDHaTq97glyrHSyrx5kQgBS1YLJxYkXyfswGIMBqaT2-N4UYE0wLqxXCH05USd2ru0gZ4d0Dnkpraymm1zCDJvyI_zPjypGdvl1pwoGg3iwosnGN342B-4UJ-VGuqJjb_-TQeyshjaT8fs-ba04rWvTLqnRjbdJWB167pUyxUMjFkRQfYXlTazX_p69CIdGWgD9wOgo2C_Fuysf4JOjLKRBjDHdlWc-3mLTwZjh8_kWjSiOPSzncu3kAJmTATxbaZpsh6QD7cNB_glZfRx6WKNrTPzGVlWsw9noj_LjLIaFXJf2rjt8RkyUir5tVqxUK6Lki2vtsDJIrzDtbqCe6mwLFJYufxXLcMWP4Bfiz1mM1IrZLkPeAD-ygiUge8QZ94Kagn1eR8jvcH3T7F8LIhk_5G8WF8pK9636Tavh82qpE7mcbSSC7cjkYpoL7w797eU9TSOdsdS6WlCptV_aHVfFKtZZePUOPylbL_YyuevRe8qwWWDP4O6kVKfma-YOcQTEn9My9nhAV13wQyk1egqEERaATblGC6uLKPazOzJwzWrk-WPm5NRIVCNnWTL1llyBoSlDV-_D11-cvaldYLeWMS84HFxeQuMPoVXxxbPe0miOuj02t0vaUyoyK0-VV-O5haghbtbTOip_som8KRc"}

And these in the router logs:

{"_context":"{c/4hZ2xqQVsoOtp8PEBFmoP|@/5vbcD7Xg6bWgc52iGVtWEC}\u003cTerminator\u003e","file":"github.com/openziti/sdk-golang@v1.2.10/xgress/xgress.go:699","func":"github.com/openziti/sdk-golang/xgress.(*Xgress).rx","level":"warning","msg":"read failed (read tcp 10.0.0.175:38834-\u003etraefik.ip.address.here:443: use of closed network connection)","time":"2025-10-23T19:39:09.372Z"}
{"error":null,"file":"github.com/openziti/ziti/router/xgress_common/connection.go:222","func":"github.com/openziti/ziti/router/xgress_common.(*XgressConn).ReadPayload","level":"error","msg":"failed to set write deadline, connection is fully closed","time":"2025-10-23T19:39:11.362Z"}
{"_context":"{c/OFq8jjC4g0CR6peOWMlrD|@/4rMXMBt12IqMyGL6pDZf3i}\u003cTerminator\u003e","file":"github.com/openziti/sdk-golang@v1.2.10/xgress/link_send_buffer.go:296","func":"github.com/openziti/sdk-golang/xgress.(*LinkSendBuffer).run","level":"warning","msg":"closing while buffer contains unacked payloads","payloadCount":1,"time":"2025-10-23T19:39:11.363Z"}
{"error":null,"file":"github.com/openziti/ziti/router/xgress_common/connection.go:222","func":"github.com/openziti/ziti/router/xgress_common.(*XgressConn).ReadPayload","level":"error","msg":"failed to set write deadline, connection is fully closed","time":"2025-10-23T19:39:11.737Z"}
{"error":null,"file":"github.com/openziti/ziti/router/xgress_common/connection.go:222","func":"github.com/openziti/ziti/router/xgress_common.(*XgressConn).ReadPayload","level":"error","msg":"failed to set write deadline, connection is fully closed","time":"2025-10-23T19:39:11.744Z"}
{"error":null,"file":"github.com/openziti/ziti/router/xgress_common/connection.go:222","func":"github.com/openziti/ziti/router/xgress_common.(*XgressConn).ReadPayload","level":"error","msg":"failed to set write deadline, connection is fully closed","time":"2025-10-23T19:39:11.748Z"}
{"circuitCount":7,"ctrlId":"ziti-controller-ctrl-plane-client-identity","file":"github.com/openziti/ziti/router/forwarder/faulter.go:107","func":"github.com/openziti/ziti/router/forwarder.(*Faulter).run","level":"warning","msg":"reported forwarding faults","time":"2025-10-23T19:39:15.509Z"}
{"circuitId":"4hZ2xqQVsoOtp8PEBFmoP","file":"github.com/openziti/ziti/router/forwarder/forwarder.go:155","func":"github.com/openziti/ziti/router/forwarder.(*Forwarder).Unroute","level":"info","msg":"circuit unrouted","time":"2025-10-23T19:39:15.510Z"}
{"circuitId":"5R0XFrzS2uPzTymwKP4fUG","file":"github.com/openziti/ziti/router/forwarder/forwarder.go:155","func":"github.com/openziti/ziti/router/forwarder.(*Forwarder).Unroute","level":"info","msg":"circuit unrouted","time":"2025-10-23T19:39:15.510Z"}
{"circuitId":"OFq8jjC4g0CR6peOWMlrD","file":"github.com/openziti/ziti/router/forwarder/forwarder.go:155","func":"github.com/openziti/ziti/router/forwarder.(*Forwarder).Unroute","level":"info","msg":"circuit unrouted","time":"2025-10-23T19:39:15.511Z"}
{"circuitId":"7geKMC3pn3Wcg2KeGlgnwg","file":"github.com/openziti/ziti/router/forwarder/forwarder.go:155","func":"github.com/openziti/ziti/router/forwarder.(*Forwarder).Unroute","level":"info","msg":"circuit unrouted","time":"2025-10-23T19:39:15.511Z"}
{"circuitId":"ePHSTmEPwVN1DTKdUJ3ww","file":"github.com/openziti/ziti/router/forwarder/forwarder.go:155","func":"github.com/openziti/ziti/router/forwarder.(*Forwarder).Unroute","level":"info","msg":"circuit unrouted","time":"2025-10-23T19:39:15.511Z"}
{"circuitId":"M1xni31tl6lWQ0bvvtDn9","file":"github.com/openziti/ziti/router/forwarder/forwarder.go:155","func":"github.com/openziti/ziti/router/forwarder.(*Forwarder).Unroute","level":"info","msg":"circuit unrouted","time":"2025-10-23T19:39:15.511Z"}
{"circuitId":"1fKLvnZ0jxbDIvcb4znoQk","file":"github.com/openziti/ziti/router/forwarder/forwarder.go:155","func":"github.com/openziti/ziti/router/forwarder.(*Forwarder).Unroute","level":"info","msg":"circuit unrouted","time":"2025-10-23T19:39:15.511Z"}
{"_context":"ch{edge}-\u003eu{classic}-\u003ei{ziti-sdk-c[0]@localhost/q98K}","chSeq":13,"connId":3,"edgeSeq":0,"error":"invalid session","file":"github.com/openziti/ziti/router/xgress_edge/listener.go:1339","func":"github.com/openziti/ziti/router/xgress_edge.(*nonXgConnectHandler).FinishConnect","level":"warning","msg":"failed to dial fabric","time":"2025-10-23T19:39:19.075Z","type":"EdgeConnectType"}
{"error":null,"file":"github.com/openziti/ziti/router/xgress_common/connection.go:222","func":"github.com/openziti/ziti/router/xgress_common.(*XgressConn).ReadPayload","level":"error","msg":"failed to set write deadline, connection is fully closed","time":"2025-10-23T19:39:20.231Z"}
{"error":null,"file":"github.com/openziti/ziti/router/xgress_common/connection.go:222","func":"github.com/openziti/ziti/router/xgress_common.(*XgressConn).ReadPayload","level":"error","msg":"failed to set write deadline, connection is fully closed","time":"2025-10-23T19:39:20.234Z"}
{"error":null,"file":"github.com/openziti/ziti/router/xgress_common/connection.go:222","func":"github.com/openziti/ziti/router/xgress_common.(*XgressConn).ReadPayload","level":"error","msg":"failed to set write deadline, connection is fully closed","time":"2025-10-23T19:39:20.258Z"}
{"error":null,"file":"github.com/openziti/ziti/router/xgress_common/connection.go:222","func":"github.com/openziti/ziti/router/xgress_common.(*XgressConn).ReadPayload","level":"error","msg":"failed to set write deadline, connection is fully closed","time":"2025-10-23T19:39:20.267Z"}
{"error":null,"file":"github.com/openziti/ziti/router/xgress_common/connection.go:222","func":"github.com/openziti/ziti/router/xgress_common.(*XgressConn).ReadPayload","level":"error","msg":"failed to set write deadline, connection is fully closed","time":"2025-10-23T19:39:20.272Z"}
{"error":null,"file":"github.com/openziti/ziti/router/xgress_common/connection.go:222","func":"github.com/openziti/ziti/router/xgress_common.(*XgressConn).ReadPayload","level":"error","msg":"failed to set write deadline, connection is fully closed","time":"2025-10-23T19:39:20.283Z"}
{"error":null,"file":"github.com/openziti/ziti/router/xgress_common/connection.go:222","func":"github.com/openziti/ziti/router/xgress_common.(*XgressConn).ReadPayload","level":"error","msg":"failed to set write deadline, connection is fully closed","time":"2025-10-23T19:39:20.300Z"}
{"error":null,"file":"github.com/openziti/ziti/router/xgress_common/connection.go:222","func":"github.com/openziti/ziti/router/xgress_common.(*XgressConn).ReadPayload","level":"error","msg":"failed to set write deadline, connection is fully closed","time":"2025-10-23T19:39:21.350Z"}

@qrkourier Is this a known bug in version 1.7.0?

Hi, thank you for reporting this.

When you say connectivity issues, can you narrow it down to one of:

  1. Connections failing to establish
  2. Connections being torn down before the clients expect it

or is it both?

I'm suspicious of the invalid session message, as there were some changes related to session handling in 1.7.0. Some of the other messages are likely things that could be turned down from error to debug.

However, the invalid session message would only apply to failure when trying to establish a new connection. So if we can narrow it down, that would be very helpful.

Thank you,
Paul

Absolutely!

As far as outcome 1 or 2, it seems like it may be both, but it is difficult to tell.

I can successfully connect to ziti with a tunnel, but resolving services fails.
The service resolution also is not consistent. On android, if I kill the mobile edge, it seems to resolve, and then will start to fail after a bit of use.

I just downgraded both the controller and the router (restoring from a working database), which restored functionality.
Curiously, there are still tonnes of sent unroute for circuit to router in response to forwarding fault messages in the controller logs. In the router, the logs have a bunch of cannot forward payload, no destination for circuit messages.

It seems that these error messages have been there for a while, but I was unaware of them.
The curious thing is that the previous version pair of controller and router function properly, but still show errors in the logs, while the most recent 1.7.0 releases break functionality.

On a related note, I wanted to ask about the protocol for helm chart releases and stable versions.

Currently, the helm release for both the controller and the router use a "pre-release" version of 1.7.0 according to github releases.

Is that normal procedure?

Yes, I have habitually included the latest prerelease from upstream Ziti in the latest Helm Charts as long as they passed the reverse compatibility testing matrix here: helm-charts/.github/workflows/miniziti.yml at main · openziti/helm-charts · GitHub

I'm open to feedback on that practice. We could adopt upstream Ziti's notion of "prerelease" and "stable" into the Helm charts' life cycle, but it's not the only way for a consumer to adopt that notion.

You could, for example, deviate from the tight coupling of chart and Ziti versions by expressing a Helm version constraint that ensures proven reverse compatibility, a la semver, like --version='^2.0.0' while also overriding the image.tag with the preferred Ziti version.

You didn’t happen to delete a service after something started using the service did you? Specifically a service with the id 6MjzL9peW6Eansj3dEcSNo.

Thank you for the information!

I just wanted to make sure that it was not running unstable versions.

I am not certain.
Where did you get that service ID from? I do not see that in the logs.

Is there a straightforward way to debug errors like these:

Controller:

{"circuitId":"2TKNFx8QKw3eBiiREeBzGC","file":"github.com/openziti/ziti/controller/network/fault.go:49","func":"github.com/openziti/ziti/controller/network.(*Network).fault","level":"info","msg":"sent unroute for circuit to router in response to forwarding fault","routerId":"GRytZdpmX","time":"2025-10-27T16:22:36.868Z"}

Router:

{"_context":"{c/5EuwYhsTwb5hKqNMm2wEI5|@/26fhTILBkwU3B7S74851L1}\u003cInitiator\u003e","circuitId":"5EuwYhsTwb5hKqNMm2wEI5","error":"cannot forward payload, no destination for circuit=5EuwYhsTwb5hKqNMm2wEI5 src=26fhTILBkwU3B7S74851L1 dst=n6aNLBb5I6BQHf3M3pcIn","file":"github.com/openziti/ziti/router/handler_xgress/data_plane.go:58","func":"github.com/openziti/ziti/router/handler_xgress.(*dataPlaneAdapter).ForwardPayload","level":"error","msg":"unable to forward payload","origin":0,"seq":7,"time":"2025-10-27T16:21:23.276Z"}

Hi @All,

I'm chiming in here as I'm experiencing similar issues. After the Testflight deployed Desktop-Client (from last weekend) refused to work with my quite old 1.5.x controllers, I thought it’s time to update :wink:

I have updated to version 1.7.1, hoping it would resolve the issue, but it didn't. The main issue I'm seeing is the invalid session error.

So far, I have only updated my personal and staging environments, so my OSX Desktop Edge is still on version 2.52 (551) - the officially released client, not the Testflight version.

I’ve noticed that I have both working and failing connections between the same edges. So I started digging into it, especially from the client side.
Here is a failing connection:

(26450)[2025-10-28T11:40:41.669Z] VERBOSE tunnel-cbs:ziti_tunnel_cbs.c:287 ziti_sdk_c_dial() ziti_dial(name=iac-hetzner02_mailng__t-[red.act.ed])
(26450)[2025-10-28T11:40:41.669Z]   DEBUG tunnel-cbs:ziti_tunnel_cbs.c:354 ziti_sdk_c_dial() service[iac-hetzner02_mailng__t-[red.act.ed]] app_data_json[174]='{"connType":null,"dst_protocol":"tcp","dst_hostname":"[red.act.ed]","dst_ip":"100.64.0.10","dst_port":"993","src_protocol":"tcp","src_ip":"100.64.0.1","src_port":"62094"}'
(26450)[2025-10-28T11:40:41.669Z] VERBOSE ziti-sdk:connect.c:129 conn_set_state() conn[3.357/cUX46iWx/Initial](iac-hetzner02_mailng__t-[red.act.ed]) transitioning Initial => Connecting
(26450)[2025-10-28T11:40:41.669Z]   DEBUG ziti-sdk:connect.c:431 connect_get_service_cb() conn[3.357/cUX46iWx/Connecting](iac-hetzner02_mailng__t-[red.act.ed]) got service[iac-hetzner02_mailng__t-[red.act.ed]] id[7BEMDVBfaAeqPQO32kfHe3]
(26450)[2025-10-28T11:40:41.669Z]   DEBUG ziti-sdk:connect.c:550 process_connect() conn[3.357/cUX46iWx/Connecting](iac-hetzner02_mailng__t-[red.act.ed]) starting Dial connection for service[iac-hetzner02_mailng__t-[red.act.ed]] with session[cmhagmzui1uoz0d79fdvhl1vn]
(26450)[2025-10-28T11:40:41.669Z]   DEBUG ziti-sdk:connect.c:409 ziti_connect() conn[3.357/cUX46iWx/Connecting](iac-hetzner02_mailng__t-[red.act.ed]) selected ch[core-router@tls://edge.kis.sdn.freaks.de:443] for best latency(4294967321 ms)
(26450)[2025-10-28T11:40:41.669Z]   TRACE ziti-sdk:connect.c:1123 ziti_channel_start_connection() conn[3.357/cUX46iWx/Connecting](iac-hetzner02_mailng__t-[red.act.ed]) ch[0] => Edge Connect request token[eyJhbGciOiJSUz[redacted]p0sKZPE
(26450)[2025-10-28T11:40:41.669Z]   ERROR ziti-sdk:connect.c:1070 connect_reply_cb() conn[3.357/cUX46iWx/Connecting](iac-hetzner02_mailng__t-[red.act.ed]) failed to connect, reason=invalid session
(26450)[2025-10-28T11:40:41.669Z] VERBOSE ziti-sdk:connect.c:129 conn_set_state() conn[3.357/cUX46iWx/Connecting](iac-hetzner02_mailng__t-[red.act.ed]) transitioning Connecting => Disconnected
(26450)[2025-10-28T11:40:41.669Z]   DEBUG ziti-sdk:connect.c:323 complete_conn_req() conn[3.357/cUX46iWx/Disconnected](iac-hetzner02_mailng__t-[red.act.ed]) Disconnected failed: connection is closed
(26450)[2025-10-28T11:40:41.669Z] VERBOSE ziti-sdk:connect.c:129 conn_set_state() conn[3.357/cUX46iWx/Disconnected](iac-hetzner02_mailng__t-[red.act.ed]) transitioning Disconnected => Disconnected
(26450)[2025-10-28T11:40:41.669Z] VERBOSE ziti-sdk:connect.c:129 conn_set_state() conn[3.357/cUX46iWx/Disconnected](iac-hetzner02_mailng__t-[red.act.ed]) transitioning Disconnected => Closed
(26450)[2025-10-28T11:40:41.669Z]   TRACE ziti-sdk:connect.c:810 flush_connection() conn[3.357/cUX46iWx/Closed](iac-hetzner02_mailng__t-[red.act.ed]) starting flusher
(26450)[2025-10-28T11:40:41.669Z]   DEBUG ziti-sdk:connect.c:893 flush_to_client() conn[3.357/cUX46iWx/Closed](iac-hetzner02_mailng__t-[red.act.ed]) no data_cb: can't flush, 0 bytes available
(26450)[2025-10-28T11:40:41.669Z]   TRACE ziti-sdk:connect.c:879 flush_to_service() conn[3.357/cUX46iWx/Closed](iac-hetzner02_mailng__t-[red.act.ed]) flushed 0 messages
(26450)[2025-10-28T11:40:41.669Z]   TRACE ziti-sdk:connect.c:803 on_flush() conn[3.357/cUX46iWx/Closed](iac-hetzner02_mailng__t-[red.act.ed]) stopping flusher
(26450)[2025-10-28T11:40:41.669Z]   DEBUG ziti-sdk:connect.c:184 close_conn_internal() conn[3.357/cUX46iWx/Closed](iac-hetzner02_mailng__t-[red.act.ed]) removing
(26450)[2025-10-28T11:40:41.669Z]   TRACE ziti-sdk:connect.c:225 close_conn_internal() conn[3.357/cUX46iWx/Closed](iac-hetzner02_mailng__t-[red.act.ed]) is being free()'d

And a working connection:

(26450)[2025-10-28T11:40:41.669Z] VERBOSE tunnel-cbs:ziti_tunnel_cbs.c:287 ziti_sdk_c_dial() ziti_dial(name=iac-hetzner02_int_services_80,443_t-5)
(26450)[2025-10-28T11:40:41.669Z]   DEBUG tunnel-cbs:ziti_tunnel_cbs.c:354 ziti_sdk_c_dial() service[iac-hetzner02_int_services_80,443_t-5] app_data_json[182]='{"connType":null,"dst_protocol":"tcp","dst_hostname":"red-admin.act.ed","dst_ip":"100.64.2.63","dst_port":"443","src_protocol":"tcp","src_ip":"100.64.0.1","src_port":"62149"}'
(26450)[2025-10-28T11:40:41.669Z] VERBOSE ziti-sdk:connect.c:129 conn_set_state() conn[3.358/H5b621cC/Initial](iac-hetzner02_int_services_80,443_t-5) transitioning Initial => Connecting
(26450)[2025-10-28T11:40:41.669Z]   DEBUG ziti-sdk:connect.c:431 connect_get_service_cb() conn[3.358/H5b621cC/Connecting](iac-hetzner02_int_services_80,443_t-5) got service[iac-hetzner02_int_services_80,443_t-5] id[7MAzNSWe9DQTx0seu4vHpF]
(26450)[2025-10-28T11:40:41.669Z]   DEBUG ziti-sdk:connect.c:550 process_connect() conn[3.358/H5b621cC/Connecting](iac-hetzner02_int_services_80,443_t-5) starting Dial connection for service[iac-hetzner02_int_services_80,443_t-5] with session[cmhalgzb720zc0d79mroz2bwu]
(26450)[2025-10-28T11:40:41.669Z]   DEBUG ziti-sdk:connect.c:409 ziti_connect() conn[3.358/H5b621cC/Connecting](iac-hetzner02_int_services_80,443_t-5) selected ch[core-router@tls://edge.kis.sdn.[red.acted]:443] for best latency(4294967301 ms)
(26450)[2025-10-28T11:40:41.669Z]   TRACE ziti-sdk:connect.c:1123 ziti_channel_start_connection() conn[3.358/H5b621cC/Connecting](iac-hetzner02_int_services_80,443_t-5) ch[0] => Edge Connect request token[eyJhbGciOiJSUzp[redacted]Atu1XSGqu_
(26450)[2025-10-28T11:40:41.669Z]   TRACE ziti-sdk:connect.c:1078 connect_reply_cb() conn[3.358/H5b621cC/Connecting](iac-hetzner02_int_services_80,443_t-5) connected
(26450)[2025-10-28T11:40:41.669Z]   TRACE ziti-sdk:connect.c:810 flush_connection() conn[3.358/H5b621cC/Connecting](iac-hetzner02_int_services_80,443_t-5) starting flusher
(26450)[2025-10-28T11:40:41.669Z] VERBOSE ziti-sdk:connect.c:129 conn_set_state() conn[3.358/H5b621cC/Connecting](iac-hetzner02_int_services_80,443_t-5) transitioning Connecting => Connected
(26450)[2025-10-28T11:40:41.669Z] VERBOSE ziti-sdk:connect.c:897 flush_to_client() conn[3.358/H5b621cC/Connected](iac-hetzner02_int_services_80,443_t-5) 0 bytes available
(26450)[2025-10-28T11:40:41.669Z]   TRACE ziti-sdk:connect.c:313 send_message() conn[3.358/H5b621cC/Connected](iac-hetzner02_int_services_80,443_t-5) => ct[DataType] uuid[966c9262:00000000:896adeec] edge_seq[0] len[24] hash[966c9262:c03cb036:be692ac7:01ff1f48:ae91218e:69a69ce3:1184828b:66ebc4fe]
(26450)[2025-10-28T11:40:41.669Z]   TRACE ziti-sdk:connect.c:240 on_write_completed() conn[3.358/H5b621cC/Connected](iac-hetzner02_int_services_80,443_t-5) status 0
(26450)[2025-10-28T11:40:41.669Z]   TRACE ziti-sdk:connect.c:879 flush_to_service() conn[3.358/H5b621cC/Connected](iac-hetzner02_int_services_80,443_t-5) flushed 1 messages
(26450)[2025-10-28T11:40:41.669Z]   TRACE ziti-sdk:connect.c:803 on_flush() conn[3.358/H5b621cC/Connected](iac-hetzner02_int_services_80,443_t-5) stopping flusher
(26450)[2025-10-28T11:40:41.669Z]   TRACE ziti-sdk:connect.c:1300 ziti_write() conn[3.358/H5b621cC/Connected](iac-hetzner02_int_services_80,443_t-5) write 334 bytes
(26450)[2025-10-28T11:40:41.669Z]   TRACE ziti-sdk:connect.c:810 flush_connection() conn[3.358/H5b621cC/Connected](iac-hetzner02_int_services_80,443_t-5) starting flusher
(26450)[2025-10-28T11:40:41.669Z] VERBOSE ziti-sdk:connect.c:897 flush_to_client() conn[3.358/H5b621cC/Connected](iac-hetzner02_int_services_80,443_t-5) 0 bytes available
(26450)[2025-10-28T11:40:41.669Z]   TRACE ziti-sdk:connect.c:313 send_message() conn[3.358/H5b621cC/Connected](iac-hetzner02_int_services_80,443_t-5) => ct[DataType] uuid[9a5a3790:00000001:896adeed] edge_seq[1] len[351] hash[9a5a3790:8baa5713:e8e1dd52:0193646d:12d10b3e:b583b784:434339a7:a9ab948a]
(26450)[2025-10-28T11:40:41.669Z]   TRACE ziti-sdk:connect.c:240 on_write_completed() conn[3.358/H5b621cC/Connected](iac-hetzner02_int_services_80,443_t-5) status 0
(26450)[2025-10-28T11:40:41.669Z]   TRACE ziti-sdk:connect.c:879 flush_to_service() conn[3.358/H5b621cC/Connected](iac-hetzner02_int_services_80,443_t-5) flushed 1 messages
(26450)[2025-10-28T11:40:41.669Z]   TRACE ziti-sdk:connect.c:803 on_flush() conn[3.358/H5b621cC/Connected](iac-hetzner02_int_services_80,443_t-5) stopping flusher
(26450)[2025-10-28T11:40:41.669Z]   TRACE ziti-sdk:connect.c:810 flush_connection() conn[3.358/H5b621cC/Connected](iac-hetzner02_int_services_80,443_t-5) starting flusher
(26450)[2025-10-28T11:40:41.669Z]   DEBUG ziti-sdk:connect.c:1433 process_edge_message() conn[3.358/H5b621cC/Connected](iac-hetzner02_int_services_80,443_t-5) peer capability: stream[N] multipart[Y] trace[Y]
(26450)[2025-10-28T11:40:41.669Z]   TRACE ziti-sdk:connect.c:1442 process_edge_message() conn[3.358/H5b621cC/Connected](iac-hetzner02_int_services_80,443_t-5) <= ct[DataType] uuid[55039008:00000000:1161c08609] edge_seq[-1] len[24] 
(26450)[2025-10-28T11:40:41.669Z] VERBOSE ziti-sdk:connect.c:950 conn_inbound_data_msg() conn[3.358/H5b621cC/Connected](iac-hetzner02_int_services_80,443_t-5) processing crypto header(24 bytes)
(26450)[2025-10-28T11:40:41.669Z] VERBOSE ziti-sdk:connect.c:953 conn_inbound_data_msg() conn[3.358/H5b621cC/Connected](iac-hetzner02_int_services_80,443_t-5) processed crypto header

Interestingly, the client uses different JWT tokens for the requests.
The JWT token used for the failed request had obviously expired (it expired more than two hours ago – I'm in the UTC+1 time zone).

Are there any known issues with using expired tokens on the client side?

Hi @thedarkula and @marvkis

We've been digging into this and have some follow up questions, to try and narrow things down. For both of you:

  1. How many controllers are you running in your network?
  2. Is OIDC auth enabled on your controller(s)?
  3. How long does it take for the issue to start cropping up? Does it happen immediately, or does it only seem to happen after some time has passed?
  4. What is your API Session timeout set to in the controller configuration?
  5. For the services that are failing, what are the access patterns? Are they being accessed once a minute/once an hour/etc? Is there a chance the api session may be expiring between service dials?
  6. With which clients are you seeing the issue (app/version/os)?

For @marvkis specifically:

  1. You said the session was expired. How did you determine this? Did you look at the JWT contents? If yes, what was the value of the z_t field in the JWT?

I did run an hour long circuit intensive test yesterday, to see if I could reproduce the issue, but I did not see anything. Hopefully we'll have more luck once we know more about your setups.

Thank you!
Paul

Hi @plorenz ,

Thanks for digging into it!

Just 1 controller and 1 router (I simple 'private' setup)

Ther is an OIDC config, but I've disabled it. It did some browzer-tests, it's a left-over.

It happens after some time. Maybe after Connectivity-Loss of my client (Ziti disconnects/reconnects). When I stop/restart the ZITI client it works again.

Where can I get it from? I don't think I've changed something here.

It just happened for my ziti admin API while trying to find out the session timeout via ZAC :wink:
The last time it was for my mail server - The mail client is keeping a connection open all time...

The client I'm using is OSX with Ziti Desktop 2.52 (551)

exactly :wink: . This is the JWT Payload from the failing session with my management-api:

{
  "iss": "https://client.sdn.[red.act.ed]:443",
  "sub": "3pUK8UHG1hK48Alur9o5LK",
  "aud": [
    "openziti"
  ],
  "exp": 1793211836,
  "iat": 1761675836,
  "jti": "cmhawb36h2f2s0d79l1ilx30i",
  "z_asid": "cmhaummvm2cvp0d7949jsos2w",
  "z_iid": "zeaDVHLkD",
  "z_t": "s",
  "z_st": "Dial",
  "z_leg": true
}

Hint: the setup in which I am experiencing the issue is running version 1.7.1. However, my client has three identities: two with version 1.7.1 and one with version 1.5.x. Could this be confusing the client?

Thanks!
chris

Thanks for all the debugging information, it's very helpful!

The api session timeout is in the controller doc: Controller Configuration Reference | NetFoundry Documentation

edge:
  api:
    sessionTimeout: 30m

Cheers,
Paul

The service id is in the JWT token, which was in the logs.

I need to drop these down to at most INFO messages, as usually they're not errors. Often there are race conditions when tearing circuits down, and a destination may be torn down before the EOF payload arrives. I've left them at error till now because they can represent an actual problem, but the vast majority of the time they're harmless.

Paul

Hi Paul,

i've just double checked the Session Timeout config:

edge:
  # This section represents the configuration of the Edge API that is served over HTTPS
  api:
    #(optional, default 90s) Alters how frequently heartbeat and last activity values are persisted
    # activityUpdateInterval: 90s
    #(optional, default 250) The number of API Sessions updated for last activity per transaction
    # activityUpdateBatchSize: 250
    # sessionTimeout - optional, default 30m
    # The number of minutes before an Edge API session will time out. Timeouts are reset by
    # API requests and connections that are maintained to Edge Routers
    sessionTimeout: 30m

It seems to be the default value in my setup.

Bye,
Chris

Hi @marvkis

We haven't been able to reproduce the issue here yet. We still have a few things to try. One thing I was curious about was if the problem was specific to non-OIDC authentication. I saw you were trying OIDC, over in this thread: Edge-oidc URL order

Did you notice if the problem went away with the versions of the tunneler that were generally working (1.7.14)? @ekoby is working on a fix for the issue you reported, but in the meantime if we can narrow the problem scope down some more, that would be helpful.

Thank you,
Paul

Hi @plorenz,

Sorry, didn't notice you post yesterday.

I just tested the new TestFlight-Version (2.53 / 557) and it also fails with the same messages I see in Edge-oidc URL order . So I had the same Idea: The Session-Refresh failure might be related somehow to the OIDC issue.

Maybe i find tomorrow some time to dig into it...

Hi,

Just a quick status update: I updated my clients with the latest SDK (see Edge-oidc URL order - #7 by marvkis ), and will have an eye on the issue tomorrow.

Bye,
Chris

Hi!

A status update: I had stable operation today with the 2.53 (559) OS X desktop client (except for the issue of not all identities reconnecting properly after waking up this morning. Logs have already been collected for Shawn) I didn't have any trouble with non-working connections, though.

A grep search of the logs folder for "invalid session" didn't produce any results, so it seems the issues are solved within the latest SDK. I'll keep an eye on it.

Thanks!

3 Likes