Backup / Restore

What is the requirement / process around backup? We're using a Docker container, but any documentation about locally hosted or Docker deployments would be great. I could not see it, but are there commands that back up / restore the configuration?

Being Docker, it is probably the docker-compose file along with a DB of sorts that would be needed, right? A quick google didn't turn anything up.

There is a ziti CLI command to back up the database: ziti edge db snapshot

Database management operations for the Ziti Edge Controller

Usage:
  ziti edge db [flags]
  ziti edge db [command]

Available Commands:
  check-integrity-status shows current results from background operation checking integrity of database references and constraints
  snapshot               creates a database snapshot
  start-check-integrity  starts background operation checking integrity of database references and constraints

It will put a copy of the database next to the online DB, which you can then back up with a simple docker cp. You'd probably want to grab the whole PKI too.
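
For example, with a containerized controller (container name, port, credentials, and data path are placeholders, not defaults):

    # log in so the CLI can reach the management API, snapshot, then copy everything out
    docker exec ziti-controller ziti edge login localhost:1280 -u admin -p admin -y
    docker exec ziti-controller ziti edge db snapshot
    docker cp ziti-controller:/persistent/. ./controller-backup/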

Sorry to necropost, what do you mean by this? Is there a doc that includes what directories need to be backed up, regardless of the install method?

You’d probably would want to grab the whole PKI too.

You can look in the identity section of your controller and router YAML files to see the location.

That’s true. The identity configuration section of the controller’s and routers’ config YAMLs will cover most PKI-related files you need to back up. The identity section may appear in several places in those configs, depending on how many certificates you use.
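
For illustration, an identity block looks roughly like this (the file paths are placeholders):

    identity:
      cert:        /path/to/pki/certs/client.cert
      server_cert: /path/to/pki/certs/server.chain.pem
      key:         /path/to/pki/keys/server.key
      ca:          /path/to/pki/cas.pem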

In the controller’s config, there’s also the edge enrollment signer. This is the certificate and private key of the CA that issues leaf certificates during edge enrollment of routers and identities. These are configured in the edge.enrollment.signingCert property (link to reference doc).
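
For illustration, that section looks roughly like this (paths are placeholders):

    edge:
      enrollment:
        signingCert:
          cert: /path/to/pki/signing/certs/signing.cert
          key:  /path/to/pki/signing/keys/signing.key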

Is there a common root path I can back up? We are talking about text files, so I don't mind backing up MORE than I need for the sake of making the backup simpler.

If you are starting from one of the quickstarts, all of the PKI will be in a directory called pki (though it may be different for the Kubernetes quickstart - I'll defer to @qrkourier). Checking the YAML files is the best way to verify.
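
For example, run this where your controller and router YAML files live to surface the referenced PKI paths (a rough sketch; adjust the glob to your layout):

    grep -nE '^\s*(cert|server_cert|key|ca):' *.yaml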

On my local machine, they are all stored in ~/.ziti/quickstarts/${NET_NAME}/pki (also available as env variable ZITI_PKI if running the express install per the docs).
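
That whole directory can be archived in one go, e.g. (assuming those quickstart defaults):

    tar -czf ziti-pki-backup.tgz -C ~/.ziti/quickstarts/${NET_NAME} pki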

Yes, sound off if you need specific K8s ops guidance. It's not solved in a one-size-fits-all way yet. The same two resources apply on K8s: the controller database (a file in a PVC) and the PKI (owned by cert-manager).

Ok, will do. I am probably not going to run it in K8s any time soon, so it will be either locally installed or in Compose.

Thanks.

Hi @qrkourier

I'm currently creating a migration script in Node.js to migrate all OpenZiti data within K8s - in other words, its DB and the certs - to another cluster.

-> And yes, I still have the AWS PCA integration in mind... just no time yet, as I'm dealing with general AWS architecture :sweat_smile:

My plan is to use the Edge Management API and call POST /database/snapshot. Then I would fetch the snapshot using a kubectl library for Node.js and also get all the related certs from cert-manager.
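
In curl terms, the plan looks roughly like this (host, port, and credentials are placeholders):

    # authenticate, capture the zt-session token, then trigger a snapshot
    SESSION=$(curl -sk -H 'Content-Type: application/json' \
      -d '{"username":"admin","password":"changeme"}' \
      'https://ctrl.example.com:1280/edge/management/v1/authenticate?method=password' \
      | jq -r '.data.token')
    curl -sk -X POST -H "zt-session: ${SESSION}" \
      https://ctrl.example.com:1280/edge/management/v1/database/snapshot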

But the question is how to restore that DB file to the "new" OpenZiti deployment when deploying OpenZiti via Helm.

Also, I would like to know whether it is a good idea to do the migration like that, or better to export identities one by one (via a script) - since the DB keys could have changed?
(But then I would lose client and router enrollments, I guess.)

BR
Jan

You'll need the controller's BoltDB file and at least the Ziti Edge Signer CA certs from cert-manager. Your existing identities and routers are configured to trust the web API's server certificates too, so it's probably necessary to transplant the web identity PKI as well.

EDIT: You'll also need the control plane PKI if you are transplanting routers because they're configured to trust that server certificate.

EDIT2: The more I think about it, I can't think of any PKI you wouldn't need in at least some situations, so it's probably simpler to transplant all the resources managed by cert-manager on behalf of the Ziti controller. No need to transplant trust-manager's Bundle because it will be re-created by the new trust-manager.

As for the BoltDB file, the controller's Helm chart creates the file if it doesn't exist, so you must create the persistent volume in the new cluster, place the copied DB file at the expected filename ctrl.db, and then install the controller chart in the new cluster with the value existingClaim set to the name of the pre-created PVC.
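
Roughly (release and claim names follow this thread's examples; check the chart's values.yaml for the exact key path):

    helm install openziti-base openziti/ziti-controller \
      --namespace openziti \
      --set persistence.existingClaim=openziti-base-controller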

Thanks for the fast response.

So basically dumping the database & PKI is the only way, as I see it?

In case we also want to update the version of the controller:
-> How can we avoid issues with changes to BoltDB key names?
-> Could a database migration be necessary when updating?

BR

Yes, you will need to copy the DB file ctrl.db from the old controller's volume to the new one. If you wish to obtain a copy while the controller is still running, you may use the ziti agent controller snapshot-db command, which is available inside the controller container. This will produce a snapshot in the same directory as the BoltDB file.

You'll also need to recreate the PKI that is managed by cert-manager and Helm. I don't have any ready tooling for this task, but it can certainly be done with the Kube API and kubectl. One idea that came to mind is to perform a helm install on the new cluster, especially if you're planning to have different namespaces or labels, and then patch the cert-manager resources with the data from the old cluster.
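
For example, to carry a single secret across (contexts and names are examples):

    kubectl --context old-cluster -n openziti get secret \
      openziti-base-controller-edge-signer-secret -o yaml > edge-signer-secret.yaml
    # strip creationTimestamp, resourceVersion, uid, and ownerReferences, then:
    kubectl --context new-cluster -n openziti apply -f edge-signer-secret.yaml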

As long as you're upgrading (not downgrading) the controller version, or beginning to use a new controller configuration that triggers a DB migration, I expect any database schema changes to be handled gracefully and automatically.

You will need a separate copy of the BoltDB file if you wish to roll back a controller version or controller configuration change that triggers a database migration.
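
For example, before an upgrade (deployment name per this thread's examples):

    # take a dated snapshot next to the live DB so you can roll back
    kubectl exec -n openziti deploy/openziti-base-controller -- \
      ziti agent controller snapshot-db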

Thanks for the explanation!
I managed to perform a manual migration from one K8s cluster to another - basically changing the infrastructure environment stage.

This is the process I came up with:

Migration workflow

This workflow only works as long as the naming of all resources - certs, Helm release names, and namespaces - is identical to the "old" installation!

Backup

  • In the old controller Pod, run:

    ziti agent controller snapshot-db

  • Copy the DB snapshot from the pod to local:

    kubectl cp openziti/openziti-base-controller-c46c64b69-vrm4p:/persistent/ctrl.db-20231017-142051 ctrl.db-20231017-142051

  • Back up openziti-base-controller-admin-secret:

    kubectl get secret/openziti-base-controller-admin-secret -o yaml > openziti-base-controller-admin-secret.yaml

  • Back up the PVC config openziti-base-controller:

    kubectl get pvc/openziti-base-controller -o yaml > openziti-base-controller.yaml

  • Back up certs and issuers:

    kubectl get -n openziti -o yaml issuer,cert > backup.yaml

  • Back up secrets for certs and issuers:

    • openziti-base-controller-admin-client-secret
    • openziti-base-controller-ctrl-plane-identity-secret
    • openziti-base-controller-ctrl-plane-intermediate-secret
    • openziti-base-controller-ctrl-plane-root-secret
    • openziti-base-controller-edge-root-secret
    • openziti-base-controller-edge-signer-secret
    • openziti-base-controller-web-identity-secret
    • openziti-base-controller-web-intermediate-secret
    • openziti-base-controller-web-root-secret
  • Back up secrets for router enrollments and their respective identities

    • openziti-routers-router1-identity
    • openziti-routers-router1-jwt
    • openziti-routers-router2-identity
    • openziti-routers-router2-jwt

    Be sure to delete the annotations, labels, creationTimestamp, ownerReferences, resourceVersion, and uid fields from the secrets and certs/issuers before running kubectl apply on the new K8s cluster (see the sketch below)!
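
    One way to strip those fields (a sketch assuming yq v4 and a single resource per file; for kubectl's "kind: List" output, prefix each path with .items[]):

      yq 'del(.metadata.annotations, .metadata.labels,
              .metadata.creationTimestamp, .metadata.ownerReferences,
              .metadata.resourceVersion, .metadata.uid)' \
        secret.yaml > secret-clean.yaml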

Restore

Controller

  1. Deploy openziti-base-controller-admin-secret.yaml to new cluster

    • kubectl apply -f openziti-base-controller-admin-secret.yaml
  2. Deploy the PVC to the new cluster, mount it in busybox, and transfer the DB backup to it!

    • Run a busybox pod to be able to copy the DB file to the PV
    kind: Pod
    apiVersion: v1
    metadata:
      name: volume-debugger
    spec:
      volumes:
        - name: volume-to-debug
          persistentVolumeClaim:
            claimName: openziti-base-controller
      containers:
        - name: debugger
          image: busybox
          command: ['sleep', '3600']
          volumeMounts:
            - mountPath: "/persistent"
              name: volume-to-debug
    
    • kubectl apply -f busybox-pvc-controller.yaml
    • kubectl cp ctrl.db-20231017-142051 openziti/volume-debugger:/persistent/ctrl.db
    • Check if the db file is there

      kubectl exec -it pod/volume-debugger -- /bin/sh
    • Delete the busybox-pvc debugger

      kubectl delete -f busybox-pvc-controller.yaml
  3. Add the existing PVC claim to the OpenZiti controller chart!

  4. Run openziti-base helm chart

  5. Override certs, issuers and secret resources with backups

    • kubectl apply -f certs_issuers/
    • kubectl apply -f controller-secrets/
  6. Should the domain names for the controller APIs have changed, some certs need to be renewed via cert-manager:

    • cmctl renew openziti-base-controller-web-identity-cert --namespace=openziti
    • cmctl renew openziti-base-controller-ctrl-plane-identity --namespace=openziti
  7. Restart the OpenZiti controller - delete the pod and it will be recreated

Router(s)

  1. Deploy routers via Helm (see the sketch after this list)
    • (Delete the old router identities in ZAC)
    • Create new identities for Router1 & Router2
    • Add the JWT enrollment token to the respective .values.yaml file!
    • Reusing router identities is not supported by the router Helm chart!
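
  For example, minting a new router and capturing its enrollment JWT for values.yaml (names are placeholders):

    ziti edge create edge-router router1 --jwt-output-file router1.jwt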

This is for sure not complete, but it is a starting point :slight_smile:


Hi there,

I wanted to ask whether it is already possible to keep the data - which is currently in a bbolt DB inside the ziti controller pod - outside of the Kubernetes deployment. So basically, run the database separately, as is done with most apps that are deployed to K8s.

If it is not possible, are you planning to provide that?

BR
Jan

Hi @janst,

Ziti's data layer is always a BoltDB file that's writable by the controller. There's currently no plan to support separate data sources.

Can you share any specifics about the positive effects that a separate data source would have for your case, or the undesirable effects of a BoltDB file?

The restore procedure outline you proposed appears sane. I agree that creating new routers makes more sense than restoring their identities because there's no unique data stored with the routers. They're always worker nodes that respect the controller's declared state. This requires that your policies are written to grant role attributes like #my-green-routers, not specific router entity names like @greenRouter11, so that newly created routers with the same role attributes will immediately begin doing the same work as the routers they replaced.
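
For example, an edge router policy that grants by role attribute instead of entity name (names are illustrative):

    ziti edge create edge-router-policy green-routers \
      --edge-router-roles '#my-green-routers' \
      --identity-roles '#all'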

I'm curious about what alternatives to the manual procedure you may have evaluated. Have you considered a cluster-wide backup solution like Velero/CloudCasa? I played with Velero OSS and had some success. There's a bit of a learning curve, but it's powerful.

Basically, we have internally agreed on not having any form of database inside K8s, as we want to use managed database services in our cloud environment. That way we don't need to perform backups on our own and can restore those backups completely independently of K8s.

For OpenZiti running in production within K8s, we will probably need to use some kind of K8s job to back up the bbolt data to S3 storage. Also unclear for us is whether we need to do anything before we can actually create a backup of the bbolt DB by running ziti agent controller snapshot-db. Can this command be executed during active use of the OpenZiti overlay network, or is it necessary to shut something down beforehand?
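
Roughly, the job I have in mind would do something like this (pod and snapshot names reuse the examples above; the bucket is a placeholder):

    kubectl exec -n openziti openziti-base-controller-c46c64b69-vrm4p -- \
      ziti agent controller snapshot-db
    # the snapshot filename carries a fresh timestamp; adjust accordingly
    kubectl cp openziti/openziti-base-controller-c46c64b69-vrm4p:/persistent/ctrl.db-20231017-142051 \
      ctrl.db-20231017-142051
    aws s3 cp ctrl.db-20231017-142051 s3://my-backup-bucket/ziti/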

That makes sense. I take it that the tradeoffs inherent to delegating the database life cycle management to a provider are appealing in certain situations. Thank you for sharing that insight.

While searching the docs for database backup clues, I found there are two similar commands; both trigger the controller's built-in procedure to create a snapshot while preserving the integrity of the database.

  1. ziti edge db snapshot: calls the controller's snapshot operation via the management REST API
  2. ziti agent controller snapshot-db: calls the snapshot operation over IPC (run on the controller's host)

I documented the recommended approach in this new guide.


@janst @qrkourier, won't the routers and all other identities work like nothing has happened after the backup is restored?