Controller Availability / Redundancy

Hello everyone,

I have a few questions regarding availability in an OpenZiti network to which I couldn’t find a clear answer:

  • Multiple controllers / redundancy: I’ve read that work is currently under way in this area (HA, if I’m not mistaken). Is it currently possible to run multiple controllers for redundancy?
  • If it’s not possible, how is redundancy typically handled?
  • When exactly is the controller needed? For example, does it need to be 100% available, so that everything stops working if the connection to the controller is lost? Or is it only needed to log into the Ziti network, or to update paths? What happens if the controller is down for a few seconds / minutes / hours?

Thank you! :grinning:

Not just yet, but the effort continues to move forward. It'll be soon. Look for updates in the coming months.

Data plane redundancy is handled by deploying multiple edge routers. Controller availability is all about getting the controller back up and running, which generally takes only a few seconds. Right now we recommend backing up the controller's state (the db) and PKI at whatever interval you decide is acceptable (daily, for example), and then putting a disaster recovery plan in place that restores the db/PKI and starts the controller back up.
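As a rough illustration of the PKI half of such a backup, here is a minimal shell sketch that archives a PKI directory into a timestamped tarball. The function name backup_pki and both directory arguments are hypothetical, not part of the ziti CLI, and the PKI location depends on how the controller was installed.

```shell
# Sketch only: backup_pki is a hypothetical helper, not a ziti command.
# Usage: backup_pki PKI_DIR DEST_DIR
# Prints the path of the archive it created.
backup_pki() {
  local pki_dir="$1" dest_dir="$2"
  local stamp archive
  [ -d "$pki_dir" ] || { echo "no such PKI directory: $pki_dir" >&2; return 1; }
  mkdir -p "$dest_dir" || return 1
  stamp="$(date +%Y%m%d-%H%M%S)"
  archive="$dest_dir/pki-$stamp.tar.gz"
  # -C makes the paths inside the archive relative to the PKI directory's parent.
  tar -czf "$archive" -C "$(dirname "$pki_dir")" "$(basename "$pki_dir")" || return 1
  chmod 600 "$archive"   # the archive contains private keys
  printf '%s\n' "$archive"
}
```

For a quickstart install this might be called as backup_pki "$ZITI_HOME/pki" /var/backups/ziti, assuming the PKI lives under $ZITI_HOME/pki; check your own layout first. Since the archive holds private keys, the destination should be protected as carefully as the controller itself.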

While the controller is down, persistent connections in the data plane are not affected. Most users never notice the controller restarting.

The controller is necessary when making new connections, when authenticating new connections, and when making network configuration changes. I think I already answered the other bits of this question in the last paragraph.

Cheers

Following up on @TheLumberjack's reply: you can track the Controller HA work here - Controller HA · GitHub

Great, you answered everything, thank you!

I just found this, thanks for clarifying that the controller isn’t strictly necessary just to keep things operating. Is that really the case, or is there a periodic check-in that would fail and stop the data plane connection at some point?
Is there a recommended way to back up the PKI and DB? Simply copying the file system? Thanks!

By default, the API session a client establishes has to be “kept alive” within ~30 minutes. If it’s not kept alive, the sessions tied to that API session are no longer valid and any connections from that client would be disconnected. So you need to get the controller back online before that happens.
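To make that time budget concrete, here is a small back-of-the-envelope sketch. It assumes the timeout simply counts wall-clock time since the client's last successful keep-alive, and that controller downtime counts in full; the function name and its arguments are hypothetical, and 1800 seconds stands in for the ~30 minute default.

```shell
# Sketch only: sessions_survive is a hypothetical helper, not a ziti command.
# Usage: sessions_survive OUTAGE_SECONDS [SECONDS_SINCE_LAST_KEEPALIVE] [TIMEOUT_SECONDS]
# Succeeds (exit 0) if the API session would still be valid when the
# controller comes back, under the assumptions stated above.
sessions_survive() {
  local outage="$1" since="${2:-0}" timeout="${3:-1800}"
  # Time already elapsed since the last keep-alive eats into the budget.
  [ $((since + outage)) -lt "$timeout" ]
}
```

For example, sessions_survive 600 succeeds (a 10 minute outage right after a keep-alive), while sessions_survive 600 1500 fails, because the client was already 25 minutes into its window when the controller went down. In practice the safe outage budget is therefore shorter than the full timeout.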

Backing up the PKI is something that you need to do. Copying the files somewhere is probably how I’d do it, yes. There is a ziti CLI command to create a db backup: ziti edge db snapshot. That will make a backup right next to the running db. For the quickstart, that’d be at $ZITI_HOME/db/. Here’s an example where I just ran the snapshot command:

ll $ZITI_HOME/db/
total 1208
drwxr-xr-x 2 ziti ziti    4096 Jul 17 19:54 ./
drwxr-xr-x 5 ziti ziti    4096 Jul 17 19:43 ../
-rw------- 1 ziti ziti 1048576 Jul 17 19:55 ctrl.db
-rw------- 1 ziti ziti  614400 Jul 17 19:54 ctrl.db-20230717-195450

If the db has grown substantially, you can also compact it with ziti ops db compact.
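Since the snapshot lands next to the live db, it still has to be moved somewhere safer. The following is a minimal sketch of that step, assuming snapshots keep the ctrl.db-YYYYMMDD-HHMMSS naming seen in the listing above. The helpers latest_snapshot and ship_snapshot, the destination directory, and the retention count are all hypothetical; the sketch does not run the snapshot command itself.

```shell
# Sketch only: these helpers are hypothetical, not ziti commands.

# latest_snapshot DB_DIR
# Prints the newest ctrl.db-YYYYMMDD-HHMMSS file in DB_DIR.
latest_snapshot() {
  local db_dir="$1" f newest=""
  for f in "$db_dir"/ctrl.db-*; do
    [ -f "$f" ] || continue
    newest="$f"   # glob results are sorted, so the last match has the latest timestamp
  done
  [ -n "$newest" ] || return 1
  printf '%s\n' "$newest"
}

# ship_snapshot DB_DIR DEST_DIR [KEEP]
# Copies the newest snapshot to DEST_DIR and keeps only the KEEP newest copies there.
ship_snapshot() {
  local db_dir="$1" dest_dir="$2" keep="${3:-7}" snap
  snap="$(latest_snapshot "$db_dir")" || { echo "no snapshot found in $db_dir" >&2; return 1; }
  mkdir -p "$dest_dir" || return 1
  cp -p "$snap" "$dest_dir/" || return 1
  # Newest first, skip the first KEEP entries, delete the rest.
  ls -1 "$dest_dir"/ctrl.db-* | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r old; do rm -f -- "$old"; done
}
```

A daily job might run the snapshot command first and then something like ship_snapshot "$ZITI_HOME/db" /var/backups/ziti/db 7. Only the snapshot file is copied, never the live ctrl.db.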

Thanks! Did you ever test copying the running DB, and whether that results in consistency issues? I’m asking because I think it would be rather inconvenient to snapshot the DB periodically if that means storing credentials on the system…

I'm not sure what you mean? I don't believe the snapshot requires credentials to be supplied. And yes, it's potentially a problem to "just copy the database" so we definitely tell you to use the snapshot command. It pauses writes to make the backup.

Aha okay! I assumed I would need to do a ziti edge login first in order to back up. Thanks for clarifying.