Router fails to re-establish hosted services after most controller raft leader changes, causing chronic terminator state drift

Raft leader changes do not consistently trigger the router's HostedServiceRegistry.HandleReestablish path. The router's view of its hosted terminators silently drifts from the controller's view until a user-facing failure surfaces.

router: taking over terminator from existing bind (PR #3615 mechanism active, but insufficient)

controller raft events on 2026-05-10, -11, -12 — router never re-establishes

06:56:01 controller starts responding "has no online terminators" for an SMB tunnel service

        router shows no error, still considers itself the host

06:59:46 operator removes+adds router to bind policy → new terminator established

07:00:16 router: "xgress circuit not started in time, closing" (data plane still broken)

07:00:46 docker restart compose-ziti-router-1

07:00:49 "Hosting newly available service" → working

Router had no control-channel disconnect logged between 2026-05-10T01:51 (last successful re-establish) and the 07:00:46 manual restart, despite three raft events in that window.

Has anyone experienced this before?

Hi @msbusk,

Some questions and notes:

  1. What version are you running?
  2. Are these SDK terminators or ER/T terminators?
  3. What kind of raft events? Controllers restarting? Just leadership changes? If so, what triggered the leadership change?

The "has no online terminators" message doesn't indicate a terminator mismatch. It indicates that the controller knows about the terminator, but the router in question isn't connected to that controller. I would start there. Did the router think it was connected to the controller and thus didn't redial? Was it disconnected and trying to connect? Did the endpoint list get out of sync, so that the router didn't know about the controller?
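
To make that distinction concrete, here is a minimal Go sketch. It is a toy model, not the controller's actual code: the `Terminator` type, the `selectTerminator` function, the `connected` map and the "has no terminators" error text are all hypothetical. Only the "has no online terminators" wording comes from the report above.

```go
package main

import "fmt"

// Terminator is a hypothetical, simplified record of which router hosts a service.
type Terminator struct {
	Service string
	Router  string
}

// selectTerminator is an illustrative model only (assumed names throughout).
// It returns the first terminator for the service whose router is connected
// to this controller, and tells "unknown" apart from "known but offline".
func selectTerminator(service string, terms []Terminator, connected map[string]bool) (Terminator, error) {
	known := 0
	for _, t := range terms {
		if t.Service != service {
			continue
		}
		known++
		if connected[t.Router] {
			return t, nil
		}
	}
	if known == 0 {
		// No terminator record exists at all (assumed error text).
		return Terminator{}, fmt.Errorf("service %s has no terminators", service)
	}
	// Records exist, but no hosting router is connected to this controller.
	return Terminator{}, fmt.Errorf("service %s has no online terminators", service)
}

func main() {
	terms := []Terminator{{Service: "smb", Router: "router-1"}}

	// The terminator is known, but router-1 is not connected to this controller.
	_, err := selectTerminator("smb", terms, map[string]bool{})
	fmt.Println(err)

	// Same terminator, with router-1 connected.
	t, err := selectTerminator("smb", terms, map[string]bool{"router-1": true})
	fmt.Println(t.Router, err)
}
```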

"xgress circuit not started in time" also doesn't indicate a problem with the terminator. It means the terminating router dialed the service and then waited for the initiating router to signal that circuit setup was complete, but that signal didn't arrive in time (or at all). I would check the circuits and see whether circuit setup was successful or whether other parts of the route couldn't be set up correctly.

Cheers,
Paul

We haven’t experienced it again, and unfortunately I no longer have the log files. We’ll keep an eye on it and make sure to upload some logs if it happens again.