Separate usage of a CA's pkey from a controller | Feature Request

Hey there!

After diving deep into OpenZiti, I feel that separating out the concern of enrolling edge routers, i.e. who holds the private key of the trusted CA, would add transparency to what is currently a black box of PKI management in OpenZiti.

I propose that the controller stop using the private key of the CA that enrolls edge routers, and that generating and managing that CA be put in the administrator's hands.
This way the CA's private key could be held only by an HSM and not be stored in plaintext anywhere else.
And, just as when a third-party CA is used for identity enrollment of clients, the (re-)distribution of certs and of the CA's public certs would be the concern of the respective authority.

A little disclaimer, though:
My understanding of PKI might differ from the maintainers'. So I might simply be wrong here, but the rationale I outlined above makes sense to me. I don't want any software to hold the private key of any CA, so that if an intruder gains control of the node running the controller, they cannot do any harm beyond messing with the settings of the controller itself.
I would also like to be able to revoke certificates at will, externally, using CRL or OCSP.

I think I understand the ask, but I'm not sure it really makes anything more secure. Right now, you can manage the full PKI except for the overlay network itself: router-to-router and router-to-controller communication. That PKI, the controller maintains. As such, the controller has one key that it uses to sign identities, so that when routers enroll they have a PKI locally that is capable of attaching to other routers and back to the controller. You can already control the private keys for identities with 3rd party CAs.

You're proposing that the overlay network itself allow routers to enroll with 3rd-party-provided keys and certs. I can understand why, I just don't know how much more secure it'll be in practice. You could always set up your overlay network and then remove the key from the controller, making it impossible for the controller to sign any new identities. I think that'd accomplish the same goal?

@andrew.martinez what do you think? Is there any plan to allow for the overlay (routers) to have independently provisioned PKI?

I actually ended up talking to Andrew about this because it interested me. At this time, accommodating this ask would be too substantial a change to implement in the foreseeable future. We do remember these kinds of asks though, so if we ever redesign the way this works, we'll remember that at least someone in the community did ask for it.

And fwiw, I tried to move the key, but it's checked when the controller starts up so that's not an option.

Not what you're looking to hear, I know, but that's the situation at this time I'm afraid. Thanks for taking the time to post the question.

This is a good example of why the control plane CA (the CA that issues routers' certs) and the edge signer CA (the CA that issues certs to endpoints) should both be intermediate CAs (they can be the same CA, separate intermediates from the same root, or intermediates from separate roots).

That way, if the CA is compromised, you can rotate the intermediate without re-issuing any uncompromised identities. This also means the root CA doesn't need to be on the same host, effectively separating concerns.

The discussion also raises a question for me: can the Ziti controller use private keys from an HSM/TPM the same way as a Ziti router? For example, via PARSEC or PKCS#11? If so, then I imagine the controller config would use a hardware key URI instead of the implied file:// scheme for the identity.key and identity.server_key properties, e.g., pkcs11:///usr/local/lib/libykcs11.so?id=${HSM_SLOT}&pin=${HSM_PIN} for a YubiKey.
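
To make the distinction between the implied file:// scheme and a hardware key URI concrete, here is a minimal sketch in Python. It only classifies a key setting by its URI scheme, using the pkcs11 URI shape quoted above. The helper name describe_key_location and the example paths and values are hypothetical, and this is not how OpenZiti itself parses its config; whether the controller accepts such a URI is exactly the open question.

```python
from urllib.parse import urlsplit, parse_qs


def describe_key_location(value: str) -> dict:
    """Classify a key setting as a file path or a hardware key URI.

    Hypothetical helper for illustration only; not OpenZiti's parser.
    """
    parts = urlsplit(value)
    if parts.scheme in ("", "file"):
        # A bare path is treated as the implied file:// scheme.
        return {"scheme": "file", "path": parts.path}
    if parts.scheme == "pkcs11":
        # The path names the PKCS#11 driver library; the query string
        # carries the slot id and PIN, as in the URI quoted in the post.
        query = {k: v[0] for k, v in parse_qs(parts.query).items()}
        return {"scheme": "pkcs11", "driver": parts.path, **query}
    raise ValueError(f"unsupported key scheme: {parts.scheme!r}")


if __name__ == "__main__":
    # Example values are made up; real deployments would substitute their own.
    print(describe_key_location("/etc/ziti/ctrl.key"))
    print(describe_key_location(
        "pkcs11:///usr/local/lib/libykcs11.so?id=1&pin=1234"))
```

The point of the sketch is only that a single config property can carry either kind of key reference, with the scheme deciding where the private key operations happen.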

This is exactly what I thought at first. But then I realized that edge routers renew their certificates automatically before they expire, which essentially makes using an HSM non-viable at this point, given that an HSM requires user presence. The only way around it would be to add something to ZAC so that edge routers' CSRs can be signed explicitly.

Well, the separation of concerns in this case only works for endpoints; it doesn't work for edge routers, which enroll themselves with the controller. I want exactly those edge routers' certs to be signed manually by the respective authority that holds the CA, whether root or intermediate.
I see this as a pain point I would like to work on: making this whole thing work either like the enrollment of endpoints or in some other way that accomplishes the same thing. But I'd like more discussion on this.

I made an incorrect statement in my last post:

Correction: routers' certs are issued by the same CA as endpoints' certs, the edge signer CA. This is the only CA that Ziti must manage internally (config.yml: edge.enrollment.signingCert [cert, key]).

Summarizing the thread thus far: You understand correctly that Ziti supports authorizing another CA you control to issue endpoint certificates, but not router certificates.

Since the Ziti controller must possess the private key of the edge signer CA that issues routers' certs, the best way to minimize the risk of a compromised CA is to let the edge signer CA be an intermediate CA so you can rotate the private key and cert in the event of a compromise. All endpoints and all routers are configured by way of enrollment to trust the Ziti controller's bundle of root CAs, a.k.a. the "well-known" CA bundle.

To illustrate this scenario, consider three hosts:

  • root CA
  • Ziti controller w/ intermediate edge signer CA
  • a Ziti router

The root CA host is comparatively secure. There's no need for communication with any other host, so it could even be air-gapped. The Ziti controller issues endpoint and router certs from the intermediate CA.

In the event of a compromised intermediate CA, the Ziti controller must be replaced. The database and configuration are migrated to the new host. An intermediate CA CSR is signed by a new private key generated on the new controller host. The CSR is copied to the root CA host where the new intermediate CA cert is issued. The new controller is started up with the new intermediate CA cert with the same DNS name as the compromised controller, and all previously existing endpoints and routers automatically find and trust the new controller.
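
As a rough illustration of why the existing routers and endpoints keep trusting the replacement controller, here is a toy model in Python. It uses plain names in place of real signatures, so there is no cryptography involved, and all names (Cert, chain_is_trusted, "root-ca", "intermediate-1", "intermediate-2") are made up for the sketch. The only assumption carried over from the scenario is that the enrolled bundle contains the root and not the intermediate.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str


def chain_is_trusted(chain: list[Cert], anchors: set[str]) -> bool:
    """Walk the chain from leaf upward; succeed if it ends at a trust anchor.

    Toy model: matching names stand in for signature checks.
    """
    if not chain:
        return False
    # Each cert must be issued by the next cert in the presented chain.
    for cert, parent in zip(chain, chain[1:]):
        if cert.issuer != parent.subject:
            return False
    # The last presented cert must be issued by an anchor in the bundle.
    return chain[-1].issuer in anchors


# Bundle held by a router or endpoint: only the root, as set at enrollment.
router_anchors = {"root-ca"}

# Chain presented by the compromised controller (old intermediate).
old_chain = [Cert("controller", "intermediate-1"),
             Cert("intermediate-1", "root-ca")]
# Chain presented by the replacement controller (new intermediate).
new_chain = [Cert("controller", "intermediate-2"),
             Cert("intermediate-2", "root-ca")]

if __name__ == "__main__":
    print(chain_is_trusted(new_chain, router_anchors))
```

Note that in this name-only model the old chain would still validate too. Making clients reject the compromised intermediate requires revocation or expiry, which the sketch deliberately leaves out.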

I didn't play this out, but hopefully it's a reasonably lucid and accurate sketch of how a compromised intermediate could be expeditiously replaced in a single controller scenario. I expect the forthcoming HA raft feature will inform how this is done thereafter.

Is it correct that the edge routers and endpoints do not have the intermediate CA in their bundle, but rather the root CA (the .well-known bundle), and hence the trust still persists for the new controller?

Here is an interesting quote from the core contributor from 2022 from this thread:

Now this whole thing makes sense to me, as long as it is understood to be an opinionated design.

Getting back to this after a while.
The root ca cert is shipped within a chain to the router when the CSR is being signed by the controller. Correct?
If so, when the expiration time is approaching for the respective router's cert, the router would re-enroll its cert using the controller. And if a potential intruder would like to substitute the whole chain of trust, would a router make sure that the chain of trust stays the same? Specifically, that the root CA cert is the same as the one that was initially used when issuing the cert?

The root ca cert is shipped within a chain to the router when the CSR is being signed by the controller. Correct?

During initial enrollment, the controller is first verified through signature verification. If verification passes, the /.well-known/est/cacerts endpoint is contacted, which returns a PKCS#7 trust bundle. This trust bundle is stored as the router's identity ca file and contains only trust anchor CAs for the network. After this initial creation, I do not know of any behavior that updates it.

When a router's certificates are about to expire, it uses its control channel connection to the controller to issue an extension request. This request includes a CSR; the response contains the client and server certificate chains (leaf up to but not including the root) suitable for connecting to other OpenZiti components in the network.

And if a potential intruder would like to substitute the whole chain of trust, would a router make sure that the chain of trust stays the same?

I do not know what you mean by "substitute the whole chain of trust." The trust bundle the controller uses and delivers is read from disk at startup. To change it, one would have to have access to the controller host, be able to write to the configuration, kill the controller, and start the controller. A very noticeable event. At that point, if the old trust anchors were removed, the controller would not trust incoming connections from any other component.

If the bundle were added to, only newly enrolling entities would have it. Even then, I am a little lost as to what an attacker would be able to do.

  • They could set up another router with the new trust anchor, but no router would dial that router. Even if the new router dialed other routers, they wouldn't accept the connection as they would not trust the newly added trust anchor as they have the old bundle.
  • Any rogue (i.e., not enrolled conventionally) router would never be dialed or have circuits created across it, nor would the controller accept a control connection with it (as the router isn't known to the controller).

The only thing I think could be possible is that a rogue router could dial a link to a real router enrolled after the controller trust bundle was updated. I am not 100% sure how long that link would remain open, but the controller certainly wouldn't identify it as routable as that would require an active control channel from the rogue router (which will not connect).
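
The cases above can be sketched as a small Python model. It assumes only what is described in this reply: each router keeps the trust bundle it received at enrollment, and a link is accepted only if the dialing peer's cert chains to an anchor in the acceptor's bundle. The class and names (Router, accepts_link_from, "attacker-ca") are hypothetical, and the model covers TLS trust only, not the controller's refusal of the rogue router's control channel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Router:
    name: str
    cert_anchor: str            # root that this router's own cert chains to
    trusted_anchors: frozenset  # bundle captured at enrollment, never updated

    def accepts_link_from(self, peer: "Router") -> bool:
        # Accept only if the peer's cert chains to an anchor we already hold.
        return peer.cert_anchor in self.trusted_anchors


original_bundle = frozenset({"root-ca"})
tampered_bundle = frozenset({"root-ca", "attacker-ca"})

# Enrolled before the bundle on the controller was tampered with.
early = Router("early", "root-ca", original_bundle)
# Enrolled conventionally, but after the bundle was added to.
late = Router("late", "root-ca", tampered_bundle)
# Rogue router holding a cert issued under the attacker's anchor.
rogue = Router("rogue", "attacker-ca", tampered_bundle)

if __name__ == "__main__":
    for target in (early, late):
        print(target.name, "accepts rogue:", target.accepts_link_from(rogue))
```

In this model only the router enrolled after the bundle change would accept a link dialed by the rogue router, which matches the one case I think could be possible.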