Certificate life cycle

I want to accelerate every Ziti identity’s life cycle (i.e., identity, router, controller) to ensure everything works smoothly. Identity and router enrollments seem to default to 365d. Which controller options should I set to bring the pain forward on potential certificate life cycle issues?

I know clients and routers can extend their identity expiry by requesting a new cert, and I’ve already set up auto-renewal for the controller’s server certs.

Hi @qrkourier,
I’m currently looking into the Helm charts you provided, and they already make installing OpenZiti on K8s a lot easier! Thanks for that!

I was wondering how the certs you manage with cert-manager actually get picked up by the running pods once cert-manager has renewed them. Have you already tested this?

A few more general questions:

As a K8s admin, what do I need to do to keep OpenZiti running in my cluster over a period of several years (assuming that is long enough for the certs to expire)?

  • Are all cert lifetimes extended automatically, and if so, how? (Except the client certs: do these need to be renewed like this?)
  • Can I automatically back up the internal bbolt DB? Or is there a way to run it externally, outside the cluster?
  • How can I update OpenZiti's controller and routers without downtime?
    Can I run routers with different versions at the same time against the same controller, so that I could roll out new router versions without shutting all of them down and replacing them, which would effectively take down the OpenZiti network?

Most likely I’m missing out on a lot of points here, so just correct me and/or add them please :slight_smile:

BR & thank you
Jan

Hi there @janst! You and I are thinking about the same stuff regarding running Ziti in prod with K8s. The encompassing question is "How do I run Ziti in production?" and that's going to be a collaboration over time that hopefully produces some helpful hints in the "deployment" area of the docs and the controller chart's README.

I'll answer the easy one first and focus on your use case for running Ziti in prod on K8s.

Short answer: make sure the controller pod is redeployed more often than the configured server certificate lifetime, so the process always restarts before the certificate it loaded expires.

Explanation: The Secret is updated as soon as cert-manager auto-renews the controller's server certificate. With the current controller chart v0.4.1, each server identity is mounted inside the running container as a separate directory. Each identity directory has files representing the Secret's data, e.g., tls.crt, tls.key.

The updated Secret with the renewed certificate is propagated to the controller container's filesystem shortly afterward (the kubelet syncs mounted volumes periodically), but the controller reads it only the next time the process is started.
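The rule above can be sketched as two small checks. This is only an illustration, assuming you can obtain the controller process start time and the time the mounted tls.crt was last renewed (for example from file metadata); the function names are hypothetical and are not part of Ziti or the Helm chart.

```python
from datetime import datetime, timedelta, timezone


def restart_needed(process_started: datetime, cert_renewed_at: datetime) -> bool:
    """True when the mounted certificate was renewed after the process
    started, i.e. the process is still serving the older certificate."""
    return cert_renewed_at > process_started


def restart_deadline(loaded_not_after: datetime,
                     margin: timedelta = timedelta(days=7)) -> datetime:
    """Latest time to restart so the certificate loaded at start-up is
    replaced `margin` before it expires (the margin is an assumption)."""
    return loaded_not_after - margin
```

A scheduled rollout that runs well inside the deadline keeps the served certificate from ever reaching its expiry.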

I'll let the post immediately above serve to answer the server certificate part of your question, and focus here on the client certificate part of your question.

The Ziti controller's API allows edge clients to renew the client cert they use for edge authentication. It's up to that edge client to save the new certificate and use it in the future.

A Ziti router running in K8s is unable to save the new certificate because the certificate lives in a K8s secret. It is necessary to orchestrate helm install ziti-router with a new enrollment more often than the router's client certificate expires.

My hunch is that treating routers like replaceable cattle is the best approach. I've raised this GitHub issue for tracking and focused discussion about router client cert renewal in K8s: handle router client cert renewal · Issue #115 · openziti/helm-charts · GitHub.

The controller's Helm chart doesn't yet provide any maintenance jobs, and the CLI hints you're after are ziti agent controller --help and ziti ops db --help. :thinking: I can imagine a K8s Job that periodically saves a DB snapshot to a mounted volume. Will you let me know what you come up with or raise an enhancement request in GitHub?
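As a starting point for such a Job, here is a minimal sketch of the archiving half only. It assumes a consistent snapshot file has already been produced by the controller's own tooling (the exact command is not assumed here; see the CLI help mentioned above), and it does not copy the live BoltDB file. The function name, file naming scheme, and retention count are hypothetical.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def archive_snapshot(snapshot: Path, backup_dir: Path, keep: int = 7,
                     now: Optional[datetime] = None) -> Path:
    """Copy an existing DB snapshot into backup_dir under a timestamped
    name, then delete all but the newest `keep` archives."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    now = now or datetime.now(timezone.utc)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / f"ctrl-{now:%Y%m%dT%H%M%SZ}.db"
    shutil.copy2(snapshot, target)
    # Timestamped names sort chronologically, so a name sort is enough.
    archives = sorted(backup_dir.glob("ctrl-*.db"))
    for old in archives[:-keep]:
        old.unlink()
    return target
```

The backup directory would be the mounted volume; pruning by name keeps the logic independent of file modification times.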

The remainder of your question is whether it's possible to scale BoltDB separately from the Ziti controller. No, not presently.

The reason is that the controller uses a file handle to access the BoltDB on a volume that's mounted in the controller's container, not an API over the network, if such a thing exists. I found this hint that an HTTP interface might be feasible.

Your two questions are:

  1. How do I orchestrate controller and router deployments to maintain the availability of the Ziti network?

    The network will gracefully tolerate restarting the controller. Existing sessions will function during the restart, and edge clients that need sessions should be configured to keep trying when the controller is unavailable due to a restart.

    Each router has a unique identity. Routers should be redundant so that the loss of any one router does not impair the network. You may deploy pairs of routers for each functional purpose, e.g., routers that fulfill edge dialing requests for a particular set of router role attributes. You may stagger the life cycle so that router replacements do not occur in the same time frame.

  2. Do I need to run the same version of controller and router?

    It's not essential to run precisely the same version of Ziti controller and router, but it is important to frequently upgrade the controller and routers to a recent release. This ensures you receive features and fixes.
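The staggering idea from the first answer can be sketched as a simple schedule. This is an illustration only: replacing each router at half its client certificate lifetime is an assumption, and the function and router names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def staggered_replacements(routers, cert_lifetime, start):
    """Spread the first replacement of each router evenly across one
    replacement period so no two routers are replaced at the same time."""
    period = cert_lifetime / 2  # assumption: replace at half the cert lifetime
    step = period / len(routers)
    return {name: start + i * step for i, name in enumerate(routers)}
```

Each router would then be replaced again every period after its first slot, so a redundant pair never goes down together.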

Hi @qrkourier,

thanks for your thorough answer(s)!
I would absolutely like to collaborate on these topics and I will respond to the ones you brought up next week. :slight_smile:

Currently I’m working on visualizing how the cert management works in combination with cert-manager and trust-manager to set up all the necessary private and public keys, a.k.a. certs.

I tried to draw something and could not completely understand how the certs that come from cert-manager relate to these settings in the OpenZiti controller configmap:

    identity:
      #ClientCertKeyReuseIssue
      cert:                 /etc/ziti/ctrl-plane-identity/tls.crt
      server_cert:          /etc/ziti/ctrl-plane-identity/tls.crt
      key:                  /etc/ziti/ctrl-plane-identity/tls.key
      ca:                   ${ZITI_CTRL_PLANE_CA}/ctrl-plane-cas.crt
------------------------------------------------------------------------------
    web:
      - name: client
        ...
        identity:
          #ClientCertKeyReuseIssue
          cert:        /etc/ziti/web-identity/tls.crt
          server_cert: /etc/ziti/web-identity/tls.crt
          key:         /etc/ziti/web-identity/tls.key
          ca:          /etc/ziti/web-identity/ca.crt
        ...
      - name: management
        identity:
          #ClientCertKeyReuseIssue
          cert:        /etc/ziti/web-identity/tls.crt
          server_cert: /etc/ziti/web-identity/tls.crt
          key:         /etc/ziti/web-identity/tls.key
          ca:          /etc/ziti/web-identity/ca.crt
------------------------------------------------------------------------------

Maybe you can clarify that?

This is what I have come up with:

I think it would be great for everyone’s understanding to provide a graphic like that. But do not get me wrong - this drawing is, as it is now, far away from being finished :wink:

I tried to visualize the PKI for the controller, which has a separate chain of certs for the controller identity and the web APIs. The edgeSignerPKI isn't integrated into this drawing yet…

Back to understanding how the configmap values cert, server_cert, key and ca (for controller identity, web client and web management) are mapped to the entities in this drawing:

Can you tell me which certs and which keys are assigned in the controller config, and why?
I already tried to map them in the drawing, but I could not understand, for example, why the “ca” for the controller identity seems to be the whole trust bundle, meaning all the intermediate certs and not just the “Control Plane Intermediate Certificate”.

BR
Jan

I did some documentation a while back.. hope this helps.. it may be a bit out of date

I found a few more comments that I also wrote up below.

Important Notes:
Opening port 8441 allows remote users to connect to the Controller REST API. Instructions on how to close this port are included in the Extensions section below; closing it will disable ZAC. Only commands executed on the controller server via the Ziti CLI will then be accepted.

Hi @markamind,
thanks for the hint. Unfortunately I still can’t fully comprehend how the entities in my drawing and the config of the controller relate. I hope that somebody can enlighten me :wink:
I think if someone sees this the first time it is hard to follow through by just reading textual information :slight_smile:

BR

When I started learning OpenZiti… I started from a blank page… which is why I started with ports / services… as it made it easier to understand.

When you move into certificates and trust… I believe it’s better to think in terms of

  • control plane
  • data plane
  • certificate signing

When you do this… you will be able to simplify your diagram… it’s probably three different smaller diagrams… where each plane has a different function

Another way to say this… and I am not an expert… is that there is a hierarchy.

top layer of trust… is the controller… it’s the key to everything

second layer of trust is the edge routers… this is what makes all of the connections

third layer of trust is enrolling new identities… this is what the signing certificate is used for…

I have likely overlooked a lot of things… but it’s how I piece the puzzle together for now… and it’s likely to be revised as I keep learning more :slight_smile:

The drawing is great! I'll highlight a few things and stay focused on correlating the Ziti controller configuration with the cert-manager configuration.

One likely point of confusion about the way the Ziti controller Helm chart uses cert-manager is the self-signed issuer that appears in your drawing as a root CA in the upper-left quadrant. This empty meta entity in cert-manager is the logical parent of every root CA, i.e., self-signed issuers.

Remember that all root CAs are "self-signed" and do not have a cryptographic parent or issuer. Each root CA is, therefore, a "root of trust." I would relabel (or omit) this meta entity, i.e., the empty issuer entity for self-signed, root CAs, from the drawing because it does not contain any cryptographic material (no cert). It exists only in cert-manager.

The "trust anchor" label is debatable. If you meant "root of trust," I'd only apply the label to the root CA. If you meant "the most trusted CA" from the consumer's perspective, "trust anchor" seems like a good fit. It's accurate that the intermediate CA issues the leaf certificates used as client or server certificates.

You placed a question mark :question: next to the "ctrl plane API or fabric API?" node in your drawing. This is the control plane service provided by the Ziti controller and consumed by routers. This is how the Ziti controller manages the routers, but no fabric/transport data attributable to individual Ziti services flows over this path. The (router) control plane service presents to routers the server certificate configured in the controller's identity.server_cert property.

The easiest way to explain how a server certificate becomes bound to a particular TLS server provided by the Ziti controller is to point out the conventional identity configuration stanza.

Your Ziti controller configuration rendered by Helm has one such identity stanza near the top of the YAML file. If no other identity stanzas appear in the controller's configuration YAML, then it will serve as the default server identity for all the TLS servers provided by the controller.

The controller's two types of TLS servers are the routers' ctrl plane service and the list of web[] (HTTP) listeners. Each web listener may be configured to bind any of the controller's web APIs: client, mgmt, fabric, metrics, or health checks, and each web listener may have an identity stanza specifying a distinct server identity. The controller Helm chart creates a separate web identity common to all web listeners. It's enabled by default in Helm value webBindingPki.enabled.
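The fallback described above can be sketched as a lookup over the parsed controller config. This is a simplified illustration, not the controller's actual implementation; the function name is hypothetical and the config is assumed to be already loaded into a dict.

```python
def effective_identity(config: dict, listener_name: str) -> dict:
    """Return the identity stanza a web listener uses: its own if it
    declares one, otherwise the controller's root (default) identity."""
    for listener in config.get("web", []):
        if listener.get("name") == listener_name:
            return listener.get("identity") or config["identity"]
    raise KeyError(listener_name)


config = {
    "identity": {"server_cert": "/etc/ziti/ctrl-plane-identity/tls.crt"},
    "web": [
        {"name": "client",
         "identity": {"server_cert": "/etc/ziti/web-identity/tls.crt"}},
        {"name": "management"},  # no stanza: falls back to the root identity
    ],
}
```

With this example config, the client listener resolves to the web identity and the management listener falls back to the ctrl plane identity.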

The last bit of PKI configuration is Ziti's built-in edge signer CA in controller property .edge.enrollment.signingCert. This is the CA from which edge client certificates are issued. The default edge signer CA is the intermediate CA from which cert-manager issues the default server identity. The controller's Helm chart enables by default a separate root of trust for the edge signer CA in Helm chart value edgeSignerPki.enabled. This means that Ziti's edge signer CA is the intermediate CA managed by cert-manager named like {{ Helm release name }}-edge-signer-issuer.

Depending on the chart's input values, the controller's Helm chart allows you to have one, two, or three separate roots of trust, i.e., root CAs.

  1. ctrl plane identity (default server identity from the default root of trust)
  2. edge signer
  3. web

For each enabled root, an intermediate CA issues leaf certificates.
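To make the counting concrete, here is a small sketch that derives the list of roots from the two Helm values named above. It assumes each enabled flag adds its own root of trust, as described in this post; the function name is hypothetical and the chart itself is not consulted.

```python
def roots_of_trust(values: dict) -> list:
    """List the separate roots of trust implied by the chart values."""
    roots = ["ctrl plane"]  # the default root of trust always exists
    if values.get("edgeSignerPki", {}).get("enabled"):
        roots.append("edge signer")
    if values.get("webBindingPki", {}).get("enabled"):
        roots.append("web")
    return roots
```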

The ctrl plane root CA may be another cert-manager namespaced Issuer or ClusterIssuer, not managed by the controller's Helm chart, by specifying Helm value ctrlPlane.alternativeIssuer. This is useful for building Ziti's chain of trust on another system like Istio.

As you observed, the controller's Helm chart relies on trust-manager (a neighboring project to cert-manager) to manage a Bundle resource aggregating the issuer certs (CA certs) that clients of the Ziti controller's TLS servers should trust (Helm template, relevant issue). This bundle is mounted on the controller's container to be referenced in the controller config property identity.ca.

The Ziti controller builds and publishes a trust bundle for edge API clients, including admin CLI and console management clients, edge SDKs, and edge routers, in each edge API at /.well-known/est/cacerts. The contents of the bundle are a de-duplicated aggregation of:

  1. the controller's edge enrollment signer (the CA that issues leaf certs for edge auth)
  2. the bundle of certs referenced by the identity.ca property at the top level of the controller config
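A de-duplicated aggregation of PEM bundles can be sketched like this. It is an illustration of the idea only, not the controller's code; the function name is hypothetical, and certificates are compared by a hash of their decoded bytes rather than parsed as X.509.

```python
import base64
import hashlib
import re

PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----", re.DOTALL)


def merge_bundles(*bundles: str) -> list:
    """Concatenate PEM bundles, keeping the first copy of each certificate.
    Certificates are compared by the SHA-256 of their DER bytes, so
    differences in line wrapping or whitespace do not matter."""
    seen, merged = set(), []
    for bundle in bundles:
        for match in PEM_BLOCK.finditer(bundle):
            der = base64.b64decode("".join(match.group(1).split()))
            digest = hashlib.sha256(der).hexdigest()
            if digest not in seen:
                seen.add(digest)
                merged.append(der)
    return merged
```

Passing the edge signer's CA first and the identity.ca bundle second would yield each issuer once, in first-seen order.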

I think I did not express myself accurately enough in that question :wink:
What I actually meant is: why does the ctrl-plane-cas.crt, assigned to the identity.ca parameter in the ziti-controller config, contain all the certs, meaning the ctrl-plane ones plus the web and edge related ones? Shouldn’t only the ctrl-plane related certs be assigned to the identity.ca config parameter for the control plane?

ziti-controller.yaml: |-
...
 identity:
      ca:                   ${ZITI_CTRL_PLANE_CA}/ctrl-plane-cas.crt

Because for the web and edge related APIs there are the web[“client”].identity and web[“management”].identity sections, which, on the other hand, only have their Root CA certificate assigned to web[“client”].identity.ca and web[“management”].identity.ca:

    web:
      - name: client
        ...
        identity:
         ...
          ca:          /etc/ziti/web-identity/ca.crt
        ...
      - name: management
        identity:
         ...
          ca:          /etc/ziti/web-identity/ca.crt

Or in other words - from my point of view the content of the ctrl-plane-cas.crt file that is assigned to identity.ca in the ziti-controller config should just be:

  • Control Plane Root CA Certificate
  • Control Plane Intermediate Certificate

I have now updated my drawing and I think it should be pretty complete now. The only thing I’m still unsure about is which certs should be part of the trust bundle, and why? And why is this trust bundle assigned to “identity.ca”, given that this is the config parameter for the control plane API? In my understanding the control plane root certificate and control plane intermediate certificate should be enough here (see previous response) :confused:

High resolution image is available here

Hi @qrkourier,
could you try to explain that? :slight_smile:

The only thing I’m still unsure about is which certs should be part of the trust bundle, and why? And why is this trust bundle assigned to “identity.ca”, given that this is the config parameter for the control plane API? In my understanding the control plane root certificate and control plane intermediate certificate should be enough here

Best Regards & thanks a lot
Jan

It's because the ca property in the root/default identity section is special. The controller computes a de-duplicated aggregation of CA certs that edge clients should trust to issue server certificates. Here's an example trust flow.

  1. controller issues a JWT signed by the CA that issues the client API's server certificate
  2. bearer of JWT is able to discover client API by parsing out the issuer URL from the token claim and verify its server certificate by matching the token signature
  3. the client fetches the well-known trust bundle (the computed aggregation of server certificate issuers)
  4. the client thenceforth trusts server certificates issued by the CAs in the well-known bundle
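Steps 2 and 3 of the flow above can be sketched as follows. This is a simplified illustration: it reads the issuer from the token WITHOUT verifying the signature or the server certificate, which a real client must do, and it assumes the well-known path is served directly under the issuer URL. The function name is hypothetical.

```python
import base64
import json


def cacerts_url(token: str) -> str:
    """Read the `iss` claim from a JWT WITHOUT verifying the signature
    and return the URL of the controller's well-known trust bundle."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return claims["iss"].rstrip("/") + "/.well-known/est/cacerts"
```

The sketch only shows where the client looks; fetching the bundle and pinning trust to it come after the server certificate has been checked against the token signature.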

The "client" refers to edge identities/endpoints, edge routers, and edge admins like the CLI, console UI, and Terraform Provider. The server certificates the client might encounter include:

  • controller's web listeners: client API, mgmt API, Prometheus scrape target, health checks
  • router ctrl plane provided by the controller to routers
  • edge listener provided by the routers to identities/endpoints
  • WebSocket listener provided by routers to identities/endpoints
  • transport listener provided by routers to other routers