Controller Availability

Hi guys, I was reading this Discourse thread [Controller Availability / Redundancy] because I am planning to include OpenZiti in a project. One big question I have is whether the controller needs to be available the whole time.

In that post, you mention that the controller can be up and running again in a few seconds, which matches what I have tested. The problem is that the network seems to be unavailable during this period.

I tried increasing the router config params connectTimeoutMs and getSessionTimeout to high values, but I saw no improvement: the period during which routers can operate without the controller was not extended.

Just to add some more info: I have a basic setup with one controller and one edge router, and the client is a Windows machine using Ziti Desktop Edge.

Hi @SalvaChiLlo, welcome to the community and to OpenZiti!

When the controller goes down, for that period while it restarts, any new connections will be affected. That's definitely true. Any already-established connections should be fine. So for example, if you were ssh'ing from one machine to another and the controller goes down you should not notice that.

First of all, thank you very much for your response.

Okay, so as I understand it, a persistent connection, some sort of stream, won’t be affected, but if I make a new HTTP request, for example, that request won’t be allowed.

So the controller is the one that decides who can access the network at any moment. Do you plan on pushing some of this functionality to routers? That way, routers could validate identity certificates and know the available policies, so controller communication could be reduced a lot and availability could be much higher.

I also have another, unrelated question: why did you choose boltdb and not something like etcd, which would make a distributed store possible?

Yes we do indeed! It's probably behind the HA initiative. I'll be

Various reasons. The one that I remember most is for simplicity of deployment. It's one fewer thing to configure, explain etc. Maybe @plorenz or @andrew.martinez will have more details to add there. Maybe that's enough? :slight_smile:

Cool to know that routers will be able to handle all that, I am looking forward to seeing it!

About the DB topic, is it possible to integrate etcd into Ziti through some sort of configuration? This could be very helpful for deployments in environments like Kubernetes, where an etcd instance already exists and it would take just a few more configuration parameters.

As I understand it, etcd uses bolt under the hood. We’re actually using bbolt, which is the etcd maintained version of bolt, as bolt is no longer receiving updates.

It’s not likely that we’ll put the effort into making the storage engine pluggable, especially since it currently has no external dependencies (other than access to disk).

Cheers,
Paul