Smart routing and link failure

Hello, I’m working on setting up a POC using ziti cloud and had a few questions:
If a TCP connection is created between two services using the overlay, what happens if that link is severed? Can I configure how quickly smart routing will switch over to a new path?
I’m also interested in link quality. Can I configure when smart routing should switch based on things like packet loss?
I’d like to test this scenario using iperf to measure throughput while links switch. Is it as simple as using iptables to block specific ziti routers on the host to force a route switch, or is there a better testing method?
Are there any docs or resources related to the above? Happy to experiment myself, thanks!

Hi @real-danm, welcome to the community and to OpenZiti!

If the severed link is "the very first" one (the initiator side, at the router where the underlay traffic enters the overlay) or "the very last" one (the terminator side, at the final router before traffic exits the overlay), then the TCP connection ends up fully closed. If you have more than one router in your mesh and it's a link between routers that has closed, the fabric will 'smartly route' your payload from the router that received the traffic to the final router (the one connected to the actual target).

Yes, it's configurable. By default, the scan runs once a minute: we scan all circuits, sort them by cost, and see if we can optimize the top N (by percent).
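
As a rough illustration of that scan, here is a minimal Python sketch that sorts circuits by cost and picks the top N percent as reroute candidates. It is not the controller's actual code: the `Circuit` type, the `select_reroute_candidates` function, and the choice to treat the highest-cost circuits as "top" are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Circuit:
    # Hypothetical stand-in for a circuit record; not an OpenZiti type.
    circuit_id: str
    cost: int


def select_reroute_candidates(circuits, percent):
    """Return the most expensive `percent` of circuits, highest cost first.

    Assumption: "top N (by percent)" means the highest-cost circuits.
    """
    if not circuits or percent <= 0:
        return []
    ranked = sorted(circuits, key=lambda c: c.cost, reverse=True)
    # Round up so that a small percentage still selects at least one circuit.
    count = min(len(ranked), math.ceil(len(ranked) * percent / 100))
    return ranked[:count]
```

For example, with four circuits and `percent=50`, the two most expensive circuits would be returned as candidates for re-optimization.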

At this time smart routing is still in its nascent form and relies on some static costing plus dynamic costing based on latencies measured when sending data. It's meant to be extensible, and more options around smart routing are on the roadmap, but we won't see those until we have HA controllers (expected sometime later this year).

This is the sort of testing we'll be happy to help you with, particularly if you feel comfortable publishing the results here for the community at large, or more formally in a blog or media post. It's even better if your tests are documented so that someone else could try to replicate them. If you're interested in contributing back like that, we're here for it!

I think that'd do just fine, yes. You have to watch out that you block incoming AND outgoing traffic though. Routers are set up with link listeners and dialers by default. So if you stand two of them up and block inbound on one router but leave inbound open on the other, chances are really good that they'll still form a link! I'd say there's no 'better' test than that. You could try kill -9 as well in there for some additional chaos? :slight_smile:
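
To make the "block both directions" point concrete, here is a small Python sketch that builds the iptables commands for one router address. The helper names and the example address (`192.0.2.10`, a documentation-only IP) are made up for illustration; the script only builds and prints the commands, and you would need to run them as root yourself.

```python
import shlex


def build_block_rules(router_ip):
    """Return iptables commands that drop traffic both to and from router_ip."""
    return [
        # Drop packets arriving from the router (covers its dialer reaching us).
        ["iptables", "-A", "INPUT", "-s", router_ip, "-j", "DROP"],
        # Drop packets leaving for the router (covers our dialer reaching it).
        ["iptables", "-A", "OUTPUT", "-d", router_ip, "-j", "DROP"],
    ]


def build_unblock_rules(router_ip):
    """Return the matching delete commands to restore connectivity."""
    return [
        ["iptables", "-D", "INPUT", "-s", router_ip, "-j", "DROP"],
        ["iptables", "-D", "OUTPUT", "-d", router_ip, "-j", "DROP"],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; execute manually as root.
    for cmd in build_block_rules("192.0.2.10") + build_unblock_rules("192.0.2.10"):
        print(shlex.join(cmd))
```

Blocking by address drops everything to and from that host, which is blunt but makes sure neither side's dialer can re-form the link during the test.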

There's not much documentation on "how to test", no, but you can read up on services here for a bit more info: Ziti Services | OpenZiti

I'll tag @plorenz on this, since he's working on 'mesh stuff' like this most, in case he has anything he wants to add on top, but maybe I covered it to his satisfaction! :slight_smile:

Hi @real-danm,
I wanted to post a follow-up expanding on some of what @TheLumberjack posted.

As soon as a link is detected to be down, link faults will be sent to the controller from both sides of the link, though often a link fault is caused by a down or unreachable router, in which case it’ll only be reported from the other side.
When the controller receives a link fault it will immediately attempt to reroute all circuits on that link. If both routers are still connected to the controller, it will likely attempt to re-establish the link, although that may only happen when the timer runs.
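
As a toy model of that reroute step (not the controller's real algorithm), the Python sketch below removes a failed link from a small mesh and recomputes the cheapest path for every circuit that was using it. The data shapes, function names, and the use of plain Dijkstra over static link costs are all assumptions made for the illustration.

```python
import heapq


def cheapest_path(links, src, dst):
    """Dijkstra over an undirected graph given as {(a, b): cost}."""
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        total, node, path = heapq.heappop(queue)
        if node == dst:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, cost in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (total + cost, neighbor, path + [neighbor]))
    return None  # no remaining path between src and dst


def uses_link(path, link):
    """True if the path crosses the link in either direction."""
    a, b = link
    hops = list(zip(path, path[1:]))
    return (a, b) in hops or (b, a) in hops


def handle_link_fault(links, circuits, failed_link):
    """Drop the failed link and recompute paths for circuits that used it."""
    a, b = failed_link
    remaining = {k: v for k, v in links.items() if k not in ((a, b), (b, a))}
    rerouted = {}
    for circuit_id, path in circuits.items():
        if uses_link(path, failed_link):
            rerouted[circuit_id] = cheapest_path(remaining, path[0], path[-1])
        else:
            rerouted[circuit_id] = path  # unaffected circuits keep their path
    return rerouted
```

In this model a circuit whose only path crossed the failed link ends up with `None`, which loosely corresponds to the case where no alternative route exists in the mesh.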

As mentioned, latency is the primary metric we use to determine when to reroute. It’s likely that we’ll incorporate things like packet loss as part of developing UDP/DTLS based links.

Regarding testing, I’d look at tools like ss (to kill links) and tc (to simulate poor connectivity). We also have some debug tools in the router, which are enabled with the --debug-ops flag. Specifically, you can use ziti agent router disconnect/reconnect to disconnect a router from the controller. If there are other testing tools you’d like to see, I can show you how to add them; it’s relatively straightforward.
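
For the ss and tc suggestions, here is a small Python sketch that builds example command lines. The helper names, interface name, and values are placeholders for illustration. The commands need root, and `ss -K` only works on kernels built with socket-destroy support (`CONFIG_INET_DIAG_DESTROY`), so treat this as a starting point rather than a tested recipe.

```python
import shlex


def netem_add(interface, delay_ms=0, loss_pct=0.0):
    """Build a tc command that adds netem delay and/or loss on an interface."""
    cmd = ["tc", "qdisc", "add", "dev", interface, "root", "netem"]
    if delay_ms > 0:
        cmd += ["delay", f"{delay_ms}ms"]
    if loss_pct > 0:
        cmd += ["loss", f"{loss_pct:g}%"]
    return cmd


def netem_clear(interface):
    """Build the tc command that removes the root qdisc again."""
    return ["tc", "qdisc", "del", "dev", interface, "root"]


def kill_connections_to(ip):
    """Build an ss command that force-closes sockets whose peer is ip."""
    return ["ss", "-K", "dst", ip]


if __name__ == "__main__":
    # Print the commands instead of running them; execute manually as root.
    for cmd in (
        netem_add("eth0", delay_ms=100, loss_pct=5),
        kill_connections_to("192.0.2.10"),
        netem_clear("eth0"),
    ):
        print(shlex.join(cmd))
```

Running iperf across the overlay while applying and clearing the netem rule should show how throughput behaves as latency rises and the path cost changes.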

Cheers,
Paul