What is the easiest way to create an HA setup with 3 controllers

Hi everybody,

I've been using OpenZiti for a few days and it's fantastic! I intend to use it to give my coworkers access to a private endpoint and it felt like overkill to go with a paid solution for such a simple use case.
My initial idea was to use a single controller with 3 routers (one for each VPC), but for compliance reasons (and to ensure my network doesn't go down in case I need to restart the controller - which would be a bummer), I'll need an HA setup for the controllers.
It looks like the HA part of the docs is not complete, or at least I couldn't find my way around it. Is there some sort of quickstart, or a basic script + steps to spin up a 3-controller OpenZiti network? That would be a huge help for me.

Thanks for making OpenZiti available!
Pedro

Hi @Pehesi97, welcome to the community and to OpenZiti!

Thanks! We're pleased you're having a good experience! Thanks for sharing.

Clustered controllers are still being worked on and are in a 'beta' state. Things are expected to work, but we are still testing and trying to find all the corner cases before calling it complete/stable. We are working on docs all the time, but in general the docs will often come after all the kinks are worked through. There are a couple of setup guides that could help you out, most notably the markdown in the project itself. You can find that here: ziti/doc/ha at main · openziti/ziti · GitHub

Let us know if you have any troubles getting that going, but it should probably give you much/most/all of what you need.

Hoping for a clarification: did you find the HA docs here: Controller Clustering | OpenZiti and find them lacking, or did you not find them at all?

If the first, are you looking for something less reference oriented than the Bootstrapping section?

Thank you,
Paul

Yes I found those docs and even the GitHub dev-setup.md file. That doc shows how to run a single-node cluster, but not an actual multi-node cluster. The quickstart.md file is outdated.
My expectation was to find a multi-node example. But if I'm able to get this working I can help provide it in the form of a PR if you want.

Hi @Pehesi97,

Mmm. I might see the confusion... The examples on the doc page Controller Certificates | OpenZiti demonstrate how to generate the PKI for a cluster but don't go into generating the config file; they just assume you have that part under control... The quickstart.md definitely shows you how, but it uses the quickstart to accomplish everything, which is 'fine' but not exactly what you want. If it's outdated then we should fix it. I scanned it (did not try it) and it seemed OK to me: ziti/doc/ha/quickstart.md at main · openziti/ziti · GitHub

The dev setup is probably what you want, I think. It does seem to be pretty complete but targets localhost only: ziti/doc/ha/dev-setup.md at main · openziti/ziti · GitHub

We are actively working on these sorts of docs to help guide people through the process. If you want, I'm sure I can show you a few commands to set up a cluster easily enough. I've been meaning to try it myself anyway. I'll work on a small set of commands to see if I can get you moving...

It's good that you let me know I shouldn't be using the quickstart commands. I was about to try them because I'm having a hard time with the dev setup on a multi-node setup. The thing with the quickstart doc is that the params are outdated.. they're much simpler now, it seems.

And that's awesome! Thanks for the help. I'll keep trying from here but if you beat me to it that would be even better haha

Thanks again.

Well no, you CAN use them. Those are actually the examples I was cooking up. :slight_smile: I just want to get you moving along so you can inspect the config files that are generated, the PKI, etc. I'll get some commands and shoot them out here and we can go from there. :slight_smile: Gimme a few and we'll see where it lands. My goal is to get you going...

OK, here's what I did. The most complex part of clustered controllers is understanding that the PKI needs to be generated from the same root CA. To do that you either need to generate the PKI on the same machine and transfer intermediates later on or, as in this example, transfer the root cert and root key to each node. (After bootstrapping, you should of course remove the root key from the nodes, take it offline, and keep it very safe.)
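
Since each joining node generates its PKI from a copied root CA, one cheap sanity check before running the join commands is to confirm that the copy matches the original byte for byte. This is only a minimal sketch: it assumes the copy-the-root-CA approach and the /sharedfs and /tmp/<instance> paths used in this thread, and `same_root_ca` is a hypothetical helper name, not part of the ziti CLI.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ziti): succeeds only if both files
# exist and are byte-identical.
same_root_ca() {
  local original="$1" copy="$2"
  [ -f "$original" ] && [ -f "$copy" ] && cmp -s "$original" "$copy"
}

# Paths taken from the example commands in this thread; adjust for your layout.
if same_root_ca /sharedfs/ziti/pki/root-ca/certs/root-ca.cert \
                /tmp/ctrl2/pki/root-ca/certs/root-ca.cert; then
  echo "root CA cert matches"
else
  echo "root CA cert missing or different"
fi
```

A byte comparison says nothing about whether the certs issued later chain correctly; it only catches the common mistake of joining with a stale or missing root CA copy.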

I tried to parameterize it to keep it easy on you as well. I also have a wildcard domain set up for *.zrok.clint.demo.openziti.org, but I think this is hopefully enough to get you going... Have a look and see if you have any questions.

Initial Controller - ctrl1

export TRUST_DOMAIN="zrok.clint.demo.openziti.org"
export ZITI_PWD="replace.this"
export ZITI_INST="ctrl1"
export ZITI_CTRL_PORT="6400"
export ZITI_ROUTER_PORT="6401"
export ZITI_INITIAL_CTRL="tls:ctrl1.${TRUST_DOMAIN}:${ZITI_CTRL_PORT}"
sudo chown ziti:ziti /sharedfs/
ziti edge quickstart ha \
    --instance-id="${ZITI_INST}" \
    --ctrl-port="${ZITI_CTRL_PORT}" \
    --router-port="${ZITI_ROUTER_PORT}" \
    --home="/sharedfs/ziti" \
    --ctrl-address="${ZITI_INST}.${TRUST_DOMAIN}" \
    --router-address="${ZITI_INST}.${TRUST_DOMAIN}" \
    --trust-domain="${TRUST_DOMAIN}" \
    --password "${ZITI_PWD}"

The Other Two Controllers

Notice that the command here differs slightly from before. Also notice that I used "cp" to transfer the root key/cert/index.txt file. That's so the quickstart command can generate the necessary PKI for you when it runs...

Ctrl2

export TRUST_DOMAIN="zrok.clint.demo.openziti.org"
export ZITI_PWD="replace.this"
export ZITI_INST="ctrl2"
export ZITI_INITIAL_CTRL="tls:ctrl1.${TRUST_DOMAIN}:6400"
export ZITI_CTRL_PORT="6500"
export ZITI_ROUTER_PORT="6501"

mkdir -p "/tmp/${ZITI_INST}/pki/root-ca/keys"
mkdir -p "/tmp/${ZITI_INST}/pki/root-ca/certs"
cp /sharedfs/ziti/pki/root-ca/keys/root-ca.key /tmp/${ZITI_INST}/pki/root-ca/keys/
cp /sharedfs/ziti/pki/root-ca/certs/root-ca.cert /tmp/${ZITI_INST}/pki/root-ca/certs/
cp /sharedfs/ziti/pki/root-ca/index.txt /tmp/${ZITI_INST}/pki/root-ca/index.txt

ziti edge quickstart join \
    --instance-id "${ZITI_INST}" \
    --ctrl-port "${ZITI_CTRL_PORT}" \
    --router-port "${ZITI_ROUTER_PORT}" \
    --home "/tmp/${ZITI_INST}" \
    --ctrl-address="${ZITI_INST}.${TRUST_DOMAIN}" \
    --router-address="${ZITI_INST}.${TRUST_DOMAIN}" \
    --trust-domain="${TRUST_DOMAIN}" \
    --cluster-member "${ZITI_INITIAL_CTRL}" \
    --password "${ZITI_PWD}"

Ctrl3

export TRUST_DOMAIN="zrok.clint.demo.openziti.org"
export ZITI_PWD="replace.this"
export ZITI_INST="ctrl3"
export ZITI_INITIAL_CTRL="tls:ctrl1.${TRUST_DOMAIN}:6400"
export ZITI_CTRL_PORT="6600"
export ZITI_ROUTER_PORT="6601"

mkdir -p "/tmp/${ZITI_INST}/pki/root-ca/keys"
mkdir -p "/tmp/${ZITI_INST}/pki/root-ca/certs"
cp /sharedfs/ziti/pki/root-ca/keys/root-ca.key /tmp/${ZITI_INST}/pki/root-ca/keys/
cp /sharedfs/ziti/pki/root-ca/certs/root-ca.cert /tmp/${ZITI_INST}/pki/root-ca/certs/
cp /sharedfs/ziti/pki/root-ca/index.txt /tmp/${ZITI_INST}/pki/root-ca/index.txt

ziti edge quickstart join \
    --instance-id "${ZITI_INST}" \
    --ctrl-port "${ZITI_CTRL_PORT}" \
    --router-port "${ZITI_ROUTER_PORT}" \
    --home "/tmp/${ZITI_INST}" \
    --ctrl-address="${ZITI_INST}.${TRUST_DOMAIN}" \
    --router-address="${ZITI_INST}.${TRUST_DOMAIN}" \
    --trust-domain="${TRUST_DOMAIN}" \
    --cluster-member "${ZITI_INITIAL_CTRL}" \
    --password "${ZITI_PWD}"
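
The Ctrl2 and Ctrl3 blocks differ only in the instance name and the two ports, so they could be folded into one helper. The sketch below is a dry run: it only prints the join command, using the same flags shown above, so you can review it before running anything. `print_join_cmd` is a made-up name, and the default values are simply the ones from this thread.

```bash
#!/usr/bin/env bash
# Defaults copied from the example above; override them in your environment.
TRUST_DOMAIN="${TRUST_DOMAIN:-zrok.clint.demo.openziti.org}"
ZITI_INITIAL_CTRL="${ZITI_INITIAL_CTRL:-tls:ctrl1.${TRUST_DOMAIN}:6400}"

# Hypothetical helper: prints (does not run) the join command for one node.
# Arguments: instance name, controller port, router port.
print_join_cmd() {
  local inst="$1" ctrl_port="$2" router_port="$3"
  printf 'ziti edge quickstart join \\\n'
  printf '    --instance-id "%s" \\\n' "$inst"
  printf '    --ctrl-port "%s" \\\n' "$ctrl_port"
  printf '    --router-port "%s" \\\n' "$router_port"
  printf '    --home "/tmp/%s" \\\n' "$inst"
  printf '    --ctrl-address="%s.%s" \\\n' "$inst" "$TRUST_DOMAIN"
  printf '    --router-address="%s.%s" \\\n' "$inst" "$TRUST_DOMAIN"
  printf '    --trust-domain="%s" \\\n' "$TRUST_DOMAIN"
  printf '    --cluster-member "%s" \\\n' "$ZITI_INITIAL_CTRL"
  # The password is left as a variable reference so it is never echoed.
  printf '    --password "${ZITI_PWD}"\n'
}

print_join_cmd ctrl2 6500 6501
print_join_cmd ctrl3 6600 6601
```

Printing instead of executing keeps the helper safe to try on any machine; the root CA files still have to be copied into each node's home directory first, as shown above.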

After Running These Commands: Overview

These commands will end up making three controllers and three routers using the current quickstart command. (Future readers: the ha subcommand should be going away; just run quickstart without ha.)

If you shut down a controller, the config files and PKI will all remain for you to inspect if you wish. You can also choose to run the controller and router separately at that point. For example, you could run something like ziti controller run /tmp/ctrl3/ctrl3/ctrl.yaml to run the third controller without a router, as I did below (router 3 is offline):

ziti fabric list routers
╭────────────┬──────────────┬────────┬──────┬──────────────┬──────────┬───────────────────────┬────────────────────────────────────────────────╮
│ ID         │ NAME         │ ONLINE │ COST │ NO TRAVERSAL │ DISABLED │ VERSION               │ LISTENERS                                      │
├────────────┼──────────────┼────────┼──────┼──────────────┼──────────┼───────────────────────┼────────────────────────────────────────────────┤
│ UkAcDuu-WG │ router-ctrl2 │ true   │    0 │ false        │ false    │ v1.4.3 on linux/amd64 │ 1: tls:ctrl2.zrok.clint.demo.openziti.org:6501 │
│ cB-jbKK3IS │ router-ctrl3 │ false  │    0 │ false        │ false    │                       │                                                │
│ rjla7eevYs │ router-ctrl1 │ true   │    0 │ false        │ false    │ v1.4.3 on linux/amd64 │ 1: tls:ctrl1.zrok.clint.demo.openziti.org:6401 │
╰────────────┴──────────────┴────────┴──────┴──────────────┴──────────┴───────────────────────┴────────────────────────────────────────────────╯
results: 1-3 of 

Ok, I'll stop here and let you digest this. I hope it helps and isn't overwhelming or not what you're looking for! :slight_smile:

Awesome! Thanks @TheLumberjack! It looks like I got a lot of things wrong.

I'm using Terraform to start the cluster and get it going. The part of generating the PKI wouldn't be a problem for me. I'd use Terraform to generate the CA and the keys / certs, push those files to a secure location, then on the launch template of each machine I'd retrieve all of the files generated by create-pki.sh.

A few points I may be wrong about, and that it would be very helpful if you could clarify for me:

  • should the trust domain be an existing, valid domain?
  • is it mandatory for the URLs (i.e. ctrl1.) to be pointed to the controllers? like if I want to create a ctrl1.internal.company.com controller, should that URL resolve to the controller IP?
  • when I run ziti edge quickstart ha with those flags, I get "unknown flag: --instance-id"; do I have the wrong ziti CLI?
  • in case my initial controller is removed for any reason (like upgrading an AMI or whatever), could I make a new controller join the cluster using the --cluster-member param pointing to one of the other two?
  • the advertise address in the controller yaml: I'm assuming it needs to be a publicly reachable URL; does it need to be in the trust domain or can it be something else?

I know this is a lot of questions, so if you don't have the time to look at them right now it's fine. I'll play with the commands you sent me and try to adapt it to my situation.
Thank you very much

PREFACE: a lot of this is still new to me too so I reserve the right to be incorrect! :slight_smile:

On the trust domain: I don't believe so. It goes into the SPIFFE ID and otherwise doesn't really matter. I chose to reuse the same domain as the overall solution just because it makes sense, but my understanding is that it is not tied to anything other than "this is the trust domain". The quickstart used to use "quickstart", for example. small definition here.

On the URLs: which URLs exactly? The --ctrl-address must be the controller's address, but maybe that's obvious. The --cluster-member is only used when joining the cluster using the quickstart command. You could also use a ziti agent command to do this if you like. I'm not sure how to answer this.

On the unknown flag: I ran with ziti CLI version 1.4.3, so make sure you are using a 1.4+ ziti CLI. Or put another way, what version did you run?
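
A version guard at the top of a provisioning script can catch this kind of mismatch early. The following is only a sketch: `version_ge` is a hypothetical helper, the 1.4.0 floor is my reading of "1.4+" above, and the script takes the version string as plain input rather than assuming how your CLI prints it. It relies on `sort -V` from GNU coreutils.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if version $1 is >= version $2.
# A leading "v" (as in "v1.4.3") is ignored.
version_ge() {
  local have="${1#v}" want="${2#v}"
  # With version sort, the smaller of the two comes first; if that is
  # the required version, the installed one is new enough.
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Placeholder value: paste whatever version string your ziti CLI reports.
installed="v1.1.7"

if version_ge "$installed" "1.4.0"; then
  echo "ziti ${installed} should know the flags used in this thread"
else
  echo "ziti ${installed} is older than 1.4; upgrade before trying these commands"
fi
```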

On replacing the initial controller: I don't know what the long-term ramifications are of a cluster member going away. Once a node has been added to the cluster, any member can be specified as the --cluster-member, yes. The quickstarts 'chain' them when you run them locally so that you can run three instances locally without any problem. Example:

On the advertise address: presumably you want it public. It doesn't NEED to be 'public', but I usually say it should be reachable by any identity looking to use it, so that probably means "public". The trust domain isn't relevant to the advertised address.

Awesome, that was really helpful. My ziti version was 1.1.7.. no idea why.. I downloaded it using the script on the Ziti website. I'll digest everything and try again. Will let you know the results.
Thank you very much!

There are a few things to note about managing cluster membership.

  1. Just taking a node down will not remove it from the cluster. You will need to explicitly mark it as removed; otherwise the cluster can't distinguish a removed node from a node that is just restarting or temporarily down.
  2. You need to have a cluster leader, which implies a cluster quorum, in order to add or remove members from the cluster. So you can go from a one-node cluster to a two-node cluster, but once you're at two nodes, they'll both need to be up to make changes. This is one reason to have at least three nodes in the cluster, as it's too easy to get a two-node cluster into a state where you can't easily replace a bad node.
  3. As long as the DNS or IP remains the same, you can bring up a new node in place of an old one without having to change the membership.
  4. If your cluster does get in a bad state, you can recover it by grabbing a DB snapshot and rebuilding the cluster from scratch. So it's doable, but to be avoided.

Paul
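
Point 2 is easier to see with the arithmetic written out. This is a minimal sketch that assumes the usual majority rule for electing a leader (more than half of the voting members must be reachable); the function names are made up for illustration.

```bash
#!/usr/bin/env bash
# Smallest number of members that forms a majority of an n-node cluster.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# How many members can be down while a majority is still reachable.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 5; do
  echo "nodes=${n} quorum=$(quorum_size "$n") can_lose=$(tolerated_failures "$n")"
done
# nodes=1 quorum=1 can_lose=0
# nodes=2 quorum=2 can_lose=0
# nodes=3 quorum=2 can_lose=1
# nodes=5 quorum=3 can_lose=2
```

Under that assumption, a two-node cluster tolerates no failures at all, which matches the advice to run at least three nodes.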

Thanks for the info, Paul. That clarifies a lot for me.
One note: for some reason, when using the install.bash script on RPM-based machines, it resolves to ziti 1.1.7. That's why I had that version on my machines.

I think that'd be fine, but without doing it myself, I don't know for certain. At the end of the day, the quickstart just:

  • creates a controller config file
  • creates a PKI for the controller
  • creates a router config file
  • initializes the controller
  • creates a router in the controller
  • enrolls the router
  • starts the controller and router in one process

If you replace the PKI, and assuming you do it correctly (it's really easy to get wrong, and then it can be hard to figure out exactly what/where you went wrong), I would think it would work.

You'll also have to ensure your config file aligns with the PKI: the advertised addresses need to be SANs in the PKI.
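
One way to check the SAN requirement is to dump a server cert with openssl and look for the advertised hostname among its subjectAltName entries. The helper below is a rough sketch: `has_dns_san` is a hypothetical name, the cert path and hostname in the usage comment are placeholders, and it only does exact matches, so a wildcard SAN would need to be checked by eye.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads the text dump of a certificate on stdin and
# succeeds if the given DNS name appears as an exact subjectAltName entry.
has_dns_san() {
  local name="$1"
  # SAN entries are comma-separated on one line: split them, trim the
  # surrounding whitespace, then look for an exact "DNS:<name>" entry.
  tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -Fxq -- "DNS:${name}"
}

# Usage (placeholder path and hostname):
#   openssl x509 -in /path/to/server.cert -noout -text \
#     | has_dns_san ctrl1.internal.company.com \
#     && echo "advertised address is covered by the cert"
```

Reading from stdin keeps the helper independent of where the generated PKI actually lives on disk.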

@TheLumberjack @plorenz any chance new versions of the Ziti CLI / Controller / Console will be published to the RPM repo soon?

@qrkourier would you have a peek at the RPM here, pls? thx

I found the missing packages! I'll rush a fix, hoping it's in time for v1.5.5. Meanwhile, you may fetch the x86_64/amd64 RPM for v1.5.4, which was never promoted, despite being released as the latest stable version in GitHub.

As a workaround, you may wish to fetch the RPM for your machine's architecture directly from the testing repo, which is still waiting to be promoted. This is a link to the x86_64 package: https://packages.openziti.org/zitipax-openziti-rpm-test/redhat/x86_64/openziti-1.5.4-1.x86_64.rpm

Thanks!

I'll fetch the RPM from the link you provided. The good thing about the script, though, is that it creates the systemctl services as well. For testing, downloading the right release and creating the system service will do, but when I move to IaC then it'll be better to have that handled by the bash script.

I've been banging my head against this setup, and, the way the systemd services are configured, it's really hard to use a custom config.yml for the controller because of the file permissions. Not to mention the separate filesystem and stuff. I think I'll end up using the binaries and writing a systemd service unit by myself... Unless there's something I'm missing.

Moved away from rpm (Amazon Linux). The auto_enroll script doesn't work there.

You may wish to customize the systemd unit to stop using the DynamicUser feature if you're willing to maintain a run-as user for the service and continually set filemode and owner for any files in the service's working directory (/var/lib/ziti-controller).

sudo systemctl edit ziti-controller.service

Your changes will effect a systemd unit definition override and are fully compatible with the upstream Linux package.

For example, you could create the ziti-controller user then set User when you run the edit command:

### Editing /etc/systemd/system/ziti-controller.service.d/override.conf
### Anything between here and the comment below will become the new contents of the file

[Service]

DynamicUser=
DynamicUser=no
User=ziti-controller

### Lines below this comment will be discarded

Confirm the aggregated unit definition.

systemctl cat ziti-controller.service

This procedure automates editing the unit's override.conf and reloading the definition.