Binding failed: ziti edge router is not available

Hi, on a freshly installed Debian 12 host running Ziti Edge Tunnel v0.21.6-local, the connection suddenly drops and doesn’t recover:

Jul 14 23:54:03 soar systemd[1]: Starting ziti-edge-tunnel.service - Ziti Edge Tunnel...
Jul 14 23:54:03 soar ziti-edge-tunnel.sh[3762]: NOTICE: no new JWT files in /opt/openziti/etc/identities/*.jwt
Jul 14 23:54:03 soar systemd[1]: Started ziti-edge-tunnel.service - Ziti Edge Tunnel.
Jul 15 00:23:03 soar ziti-edge-tunnel[3764]: (3764)[     1740.096]    WARN ziti-sdk:bind.c:312 on_message() binding failed: -17/ziti edge router is not available
Jul 15 00:23:03 soar ziti-edge-tunnel[3764]: (3764)[     1740.096]    WARN ziti-sdk:bind.c:312 on_message() binding failed: -17/ziti edge router is not available
Jul 15 00:23:03 soar ziti-edge-tunnel[3764]: (3764)[     1740.096]    WARN ziti-sdk:bind.c:312 on_message() binding failed: -17/ziti edge router is not available

On the router side it looks like this:

Jul 15 00:23:03 zt ziti-router[1201863]: {"file":"github.com/openziti/edge@v0.24.12/router/xgress_edge/fabric.go:74","func":"github.com/openziti/edge/router/xgress_edge.(*edgeTerminator).close","level":"info","msg":"removing terminator on controller","terminatorId":"xmWlCDjMJJaeOaDBDfUhP","time":"2023-07-15T00:23:03.708Z","token":"274787e9-4ab1-4eea-8b31-f0432bfba9f4"}

Jul 15 00:23:03 zt ziti-router[1201863]: {"file":"github.com/openziti/edge@v0.24.12/router/xgress_edge/fabric.go:74","func":"github.com/openziti/edge/router/xgress_edge.(*edgeTerminator).close","level":"info","msg":"removing terminator on controller","terminatorId":"1flUhnBnMrjtk9LWdKmuHl","time":"2023-07-15T00:23:03.732Z","token":"c15de3a7-36dd-4e89-91dc-7e5811153553"}

Jul 15 00:23:03 zt ziti-router[1201863]: {"file":"github.com/openziti/edge@v0.24.12/router/xgress_edge/fabric.go:74","func":"github.com/openziti/edge/router/xgress_edge.(*edgeTerminator).close","level":"info","msg":"removing terminator on controller","terminatorId":"12RnSwBZt61qsWIQAJlq3z","time":"2023-07-15T00:23:03.735Z","token":"df798e3e-3007-4754-a5e7-4e65394599d9"}

Jul 15 00:23:03 zt ziti-router[1201863]: {"_context":"ch{edge}-\u003eu{classic}-\u003ei{VRpQ}","file":"github.com/openziti/edge@v0.24.12/router/xgress_edge/listener.go:114","func":"github.com/openziti/edge/router/xgress_edge.(*edgeClientConn).HandleClose","level":"warning","msg":"failed to remove terminator for session on channel close","terminatorId":"xmWlCDjMJJaeOaDBDfUhP","time":"2023-07-15T00:23:03.739Z","token":"274787e9-4ab1-4eea-8b31-f0432bfba9f4"}

Jul 15 00:23:03 zt ziti-router[1201863]: {"_context":"ch{edge}-\u003eu{classic}-\u003ei{VRpQ}","file":"github.com/openziti/edge@v0.24.12/router/xgress_edge/listener.go:114","func":"github.com/openziti/edge/router/xgress_edge.(*edgeClientConn).HandleClose","level":"warning","msg":"failed to remove terminator for session on channel close","terminatorId":"1flUhnBnMrjtk9LWdKmuHl","time":"2023-07-15T00:23:03.741Z","token":"c15de3a7-36dd-4e89-91dc-7e5811153553"}

Jul 15 00:23:03 zt ziti-router[1201863]: {"_context":"ch{edge}-\u003eu{classic}-\u003ei{VRpQ}","file":"github.com/openziti/edge@v0.24.12/router/xgress_edge/listener.go:114","func":"github.com/openziti/edge/router/xgress_edge.(*edgeClientConn).HandleClose","level":"warning","msg":"failed to remove terminator for session on channel close","terminatorId":"12RnSwBZt61qsWIQAJlq3z","time":"2023-07-15T00:23:03.741Z","token":"df798e3e-3007-4754-a5e7-4e65394599d9"}

Am I doing something wrong? If so, what?

Hi @dmuensterer.

I’ve never ever seen that particular situation myself. I wonder if it’s something to do with the way the identity is configured. Are you able to make a new identity with the same attributes/policies and replicate the issue? I’m thinking something about the way it’s configured is triggering an issue. It’d be great to replicate the problem.


There is one "special" case in this scenario.
The same identity was used on another server. We stopped the ziti-edge-tunnel there using systemctl stop ziti-edge-tunnel and then copied the identity from /opt/openziti/etc/identities.
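For reference, this is roughly what we ran; the identity filename and hostnames below are just examples, ours are named differently:

# on the old server: stop the tunneler
sudo systemctl stop ziti-edge-tunnel

# copy the enrolled identity over to the new server (filename is an example)
scp /opt/openziti/etc/identities/myhost.json newserver:/opt/openziti/etc/identities/

# on the new server: (re)start the tunneler so it loads the identity
sudo systemctl restart ziti-edge-tunnel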

Is that something that shouldn’t be done in general? Is there anything binding an identity to a host except for the JSON file itself?

Have you by any chance tested what happens if a single identity is deployed on different hosts?

That's definitely not something we recommend doing, but I've totally done this myself, and there's no problem with it other than the fact that you're moving/sharing a file that's "secret". This is something I might do if a machine crashed and I was restoring it, or during development... but it absolutely should not impact functionality in the way you describe/observed.

It might end up confusing clients if there are two different computers binding the same identity. It still wouldn't be something that should cause it to crash that way.

So to replicate this, I should be able to make an identity that binds a service, move that identity, turn the old one off and the new one on? Are you able to reproduce the problem? Are those steps I listed accurate?
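For my own notes, here's roughly how I'd set up a throwaway identity to test with; the identity name, role attribute, and file names below are placeholders, and the exact flags may differ slightly between CLI versions:

# create a test identity that the bind policy will match (names are placeholders)
ziti edge create identity device repro-bind -a 'repro.binders' -o repro-bind.jwt

# enroll it to produce the JSON identity file the tunneler loads
ziti-edge-tunnel enroll --jwt repro-bind.jwt --identity repro-bind.json

# then: run it on host A, stop it there, copy repro-bind.json to host B, and run it on host B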

FYI the 0.21.6 release was pulled. It contained an attempted fix for router reconnections that was found to be faulty. I don’t know (or even suspect) that the issues with 0.21.6 are relevant to this situation, but in general I’d recommend running 0.21.5 for now.
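If it helps, the downgrade on Debian looks roughly like this; the exact version string may carry a package revision suffix depending on what the repo publishes:

# list the versions the repo still offers
apt list -a ziti-edge-tunnel

# downgrade and hold the package so it doesn't jump back to 0.21.6 on the next upgrade
sudo apt install ziti-edge-tunnel=0.21.5
sudo apt-mark hold ziti-edge-tunnel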


I haven’t deliberately reproduced the problem yet, but those are exactly the steps I took, and they eventually result in the error.
I will try backing up the old identity, creating a new one, assigning it to the service, and seeing whether the problem lies with the identity!

I’ll try later with a new identity. So far I can tell that the failure occurs pretty consistently 30 minutes after restarting the service, and it has happened every time (tried 4 times so far).
My next step will be installing a new identity, and after that downgrading to 0.21.5.
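In the meantime I’m just following the journal so I catch the next failure as it happens:

# follow the tunneler's unit log and surface the binding failures
journalctl -fu ziti-edge-tunnel.service | grep --line-buffered 'binding failed'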

Let’s see :slight_smile:

That was it! After creating a new identity the issue still persisted, but since downgrading to 0.21.5 it hasn’t occurred again.
I simply ran apt install ziti-edge-tunnel=0.21.5. Looks like my installation yesterday happened just before you pulled the faulty version from the repo :slight_smile: Thanks for the help!