macOS Desktop Edge doesn't connect to another router right away

I have been running emergency shutdown tests to see how various edge devices react when both of our 2 edge routers are shut down and only one of them is brought back.

It turns out that it takes quite a while, approximately 5-10 minutes, for macOS Desktop Edge to detect that one of those edge routers is back online and to start routing traffic through it.

What could be the culprit?

It's worth noting that plain ziti-edge-tunnel on Linux hosts reconnects just fine (or at least it seems to).

It also works fine when the routers go down one at a time, i.e. one goes offline while the other keeps working, and vice versa.
But when both of them go offline and only one of them comes back online, it takes that long.
ziti edge list ers clearly shows that the edge router that came back is 'online' and should be accepting and routing connections.

Logs (stopped after a successful return from the dark edge mgmt API)

UPD:
Alright, it seems this isn't related to macOS Desktop Edge after all, but to the ziti controller not updating the terminators shown by ziti edge list terminators fast enough.
It keeps listing the router that is still offline as the terminator for services, even though the other router is clearly running and online.
Only after some time does it send 'unroute for circuit to router' and update the terminators for the services.
My question is:
Is this expected to take this long? Is it possible to configure faster switching of terminators (edge routers)?

UPD2:
Alright, it seems this isn't a problem with the edge routers at all, but with the edge tunnelers, or somewhere in between.
When I force-restart the ziti-edge-tunnel systemd service on an edge device, it re-registers itself as a terminator with the controller and an edge router without any problem.
The question now is: how can I shorten this re-registration cycle?
It doesn't feel right that services using ziti-edge-tunnel cannot communicate with the rest of the overlay for around 5 minutes when all routers go down for a brief moment.

Interesting observations. Would you detail the exact steps you're taking so we can try to reproduce this?

Is this expected to take quite some time?

I certainly wouldn't expect it to take a substantial amount of time. 10 minutes seems too long to me, but I'm not exactly sure what the expected behavior is either. I would have expected something more like "a minute".

@ekoby probably will need to weigh in here to know exactly what the expectation is.

There are probably a few things that contribute to the terminator re-establishment delay:

  • If ERs die without gracefully closing their connections, it takes some time for the keep-alive logic to kick in and for the SDK/tunneler to realize they are gone.
  • SDK polling interval -- the delay in discovering that ERs have come back online.
  • ER connection backoff -- connections to ERs are retried with exponential backoff.
  • Service binding backoff -- when no ERs are available, the binding checks ER status with exponential backoff; this is done to avoid hammering newly started ERs with terminator requests. I believe the maximum backoff time is about 3 minutes.
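To get a feel for how the backoff items above can stretch a brief outage into minutes, here is a minimal Python sketch of capped exponential backoff. The 1 s initial delay, the doubling factor, and the assumption that the binding only notices a returned ER on its next scheduled check are illustrative guesses, not values taken from the OpenZiti SDK or ziti-edge-tunnel; only the ~3 minute cap comes from the estimate above. The function names are hypothetical.

```python
# Hypothetical model of capped exponential backoff, e.g. a service binding
# re-checking edge-router (ER) availability. All parameters are illustrative
# assumptions, not values from the OpenZiti SDK or ziti-edge-tunnel.


def backoff_schedule(initial: float, factor: float, cap: float, attempts: int) -> list[float]:
    """Return the wait (in seconds) before each of the first `attempts` retries."""
    delays = []
    delay = initial
    for _ in range(attempts):
        delays.append(min(delay, cap))  # never wait longer than the cap
        delay *= factor                 # grow the wait exponentially
    return delays


def next_check_after(er_back_at: float, initial: float, factor: float, cap: float) -> float:
    """Time (seconds since backoff started) of the first check at or after `er_back_at`.

    Assumes the binding starts backing off the moment it loses all ERs and only
    notices a recovered ER on its next scheduled check.
    """
    elapsed = 0.0
    delay = initial
    while elapsed < er_back_at:
        elapsed += min(delay, cap)  # sleep until the next scheduled check
        delay *= factor
    return elapsed


if __name__ == "__main__":
    # Assumed: 1 s initial delay, doubling, capped at ~3 minutes (180 s).
    for er_back_at in (30, 130, 260, 600):
        check = next_check_after(er_back_at, initial=1, factor=2, cap=180)
        print(f"ER back at {er_back_at:>3} s -> noticed at {check:>3.0f} s "
              f"(unnoticed for {check - er_back_at:.0f} s)")
```

Under these assumptions, an ER that returns just after a check can go unnoticed for almost a full cap interval (here up to ~3 minutes). Keep-alive detection, the SDK polling interval and ER connection backoff would stack on top of that, which could plausibly add up to the 5-10 minutes observed.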

The log above seems to be from the intercepting side. You should be able to see this process play out if you look at the hosting-side logs.

Also, what versions of the network and tunneler(s) are you testing with?

Update: there is also ER connection backoff (now included in the list above).