General Architecture Questions

First of all, this is a huge thing, an excellent thing to integrate from day one of any operation. This is a really nice project that I am definitely going to contribute to, especially the documentation and the Python SDK, as Python is my current language of choice.
So I have taken a really deep dive into OpenZiti over the past two weeks (5-6 hours straight every day).
I have watched a lot of videos on both the OpenZiti and NetFoundry channels, gone through a bunch of topics in this forum, and taken a look at the archived fabric and edge repositories, the ziti repository with the controller itself, and zac.
So, coming into this topic, I would like to finish my research with a bunch of questions that I still have. They are mostly about terminology, as neither the Glossary nor the repos have answered them for me, plus a little about applied architecture design.

  1. What is the difference between the terms 'Private Ziti Edge Router' and 'Ziti Fabric Router'?
  2. What L7 protocol does the control plane use (websockets?)
  3. What's the idea of splitting up the root CA into a bunch of intermediates if the Ziti controller has to have all of them? Meaning no 'decentralized' approach for intermediates.
  4. Are there any internal architectural overviews of the networking, possibly written in UML?
  5. Why is the Ziti Fabric documentation (mostly diagrams) not included in the official documentation?
  6. Is it correct to assume that ziti-tunneler-sdk utilizes ziti-sdk-c to create an interface to communicate with the overlay and provide a good-old underlay interface for services running on the host along with the tunneler?
  7. Why isn't it explicitly stated that ziti-sdk-c creates a unix socket to which it sends the data to be consumed by the underlying application? (A more general question about documentation for the architecture of the SDKs per se.)
  8. Is it correct to assume that 'Edge' functionality is added onto the Xgress, Xmgmt, Xctrl parts of the 'fabric'?
  9. Is it correct to assume that 'Edge' is just a word for 'exposing something externally'?

Hi @nenkoru, we're glad you're enjoying your zero trust journey with OpenZiti! :slight_smile: You definitely have been on a learning spree and have already asked some great questions, demonstrating you've been doing the research.

  • What is the difference between the terms 'Private Ziti Edge Router' and 'Ziti Fabric Router'?

Practically not much. It mostly refers to whether or not the router is expected to have other routers link to that router, or if it's only linking to other routers. Those routers are usually deployed in your private IP space, so "private" is a shorthand way of just saying "this router is not expected to have other routers contact it, it will only reach out to other routers to form connections"... BUT these routers usually will have edge functionality enabled, allowing them to accept connections from devices and work as tunnelers. So it's a very broad term and loosely accurate at best.

  • What L7 protocol does the control plane use (websockets?)

I don't deal with it enough, but I'm pretty sure it's gRPC over TCP (not websockets). I'll ask, and if I'm wrong I'll post a correction.

  • What's the idea of splitting up the root CA into a bunch of intermediates if the Ziti controller has to have all of them? Meaning no 'decentralized' approach for intermediates.

The quickstart is meant to demonstrate what is possible, not what is best practice. You can bring your own PKI if you want and manage everything on your own. You don't need to have > 1 CA at all. You control how your overlay is deployed. The bash quickstart (and the docker-based ones, since they use that bash quickstart) were all originally intended to be learning vehicles and to exercise the functionality OpenZiti provides. They were not intended for people to deploy them "as is". Turns out, most people just deploy them as is. :slight_smile: So if you'd prefer a single root CA and ALL your identities and overlay be driven from that -- sure, you can do that, or any mix/match in between.

  • Are there any internal architectural overviews of the networking, possibly written in UML?

Not exactly sure. These?

  • Why is the Ziti Fabric documentation (mostly diagrams) not included in the official documentation?

Not sure what you mean exactly. There's a fair bit of information with respect to the fabric documented. Are you looking for something specific?

  • Is it correct to assume that ziti-tunneler-sdk utilizes ziti-sdk-c to create an interface to communicate with the overlay and provide a good-old underlay interface for services running on the host along with the tunneler?

The Ziti Desktop Edge for Mac/Linux/Windows and Ziti Mobile Edge for Android/iOS all work the same way in principle, with slight differences per platform. At the end of the day, each will configure the operating system to send traffic to the OpenZiti tunneler, which is then sent through the overlay. The ziti-tunneler-sdk (ziti-edge-tunnel) uses the C SDK (ziti-sdk-c), yes. A "tun" device is created and assigned an IP in the 100.64.x.x address space, and a route (or multiple routes, depending) is added to tell the OS to send traffic to the tun/tunneler. How that tun is created is different per OS, but that's the job of the ziti-tunneler-sdk-c, not the job of the ziti-sdk-c, since that's "tunneler-specific" functionality that isn't strictly needed in the ziti-sdk-c... (Does that answer your question?)
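As a small illustration of the address space mentioned above, here is a minimal sketch of checking whether a resolved address would be routed to the tun device. It assumes the tunneler hands out addresses from 100.64.0.0/10 (the RFC 6598 shared address space); the actual range depends on how the tunneler was started, and the function name is made up for this example.

```python
import ipaddress

# Assumption: the tunneler's intercept range is the RFC 6598 shared address
# space. The real range depends on how the tunneler was configured.
TUNNELER_RANGE = ipaddress.ip_network("100.64.0.0/10")


def routed_to_tunneler(address: str) -> bool:
    """Return True if the address falls inside the assumed tun range."""
    return ipaddress.ip_address(address) in TUNNELER_RANGE


if __name__ == "__main__":
    for addr in ("100.64.0.5", "192.168.1.10"):
        print(addr, routed_to_tunneler(addr))
```

An address inside that range would follow the route added for the tun device; anything else takes the normal underlay route.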

  • Why isn't it explicitly stated that ziti-sdk-c creates a unix socket to which it sends the data to be consumed by the underlying application? (A more general question about documentation for the architecture of the SDKs per se.)

For starters, there most likely will always be gaps in any documentation. The domain socket created is for a UI or a CLI to interact with the tunneler when it's running, not for data. This would be how log levels are set etc. It's also not documented, because it's not intended for 'external use'. You're expected to use the CLI or a UI at this time.

  • Is it correct to assume that 'Edge' functionality is added onto the Xgress, Xmgmt, Xctrl parts of the 'fabric'?

I'm not an expert here, but from my recollection, the X-* frameworks/libraries were intended to work as extension points to the underlying APIs. "Edge" functionality was added to the base "controller" through these "X-*" mechanisms. So yes, "edge" is a layer on top of the base in that way. (I think that answers the question.)

  • Is it correct to assume that 'Edge' is just a word for 'exposing something externally'?

"Exposing something externally" is how I would put it. I think of "edge" as the combination of:

  • the place traffic originates or ends
  • where traffic enters the openziti fabric

It deals more with the security and posture of a given connection than with how the traffic gets from one place to the other most efficiently (that'd be the fabric part). So it's not "exposed" per se. You can have a totally private peer-to-peer style connection, like from my computer to my mother's house. There, nothing is "exposed" per se, but I can form that secure tunnel easily because I have an edge component on my local computer (a tunneler) and that tunneler can connect to an "edge router" to form one side of the connection. Hopefully that's clear enough...

Hope that helps

Thanks for replying!

I mean this: fabric/docs/concepts.md at p14_c2 · openziti/fabric · GitHub
For me it was the definitive concept overview, especially of the control plane as it applies to OpenZiti. I see that videos and some parts of the documentation refer to the 'Control Plane' but do not define it.

So like I could create my own pki using good old openssl and hold the keys for the CAs? Isn't it obligatory to provide a whole pki for controller's usage? So that it creates and signs identities for instance.

Well, the first one with Sessions and Connections does explain the sequence it takes to perform the connection. But the thing is, I haven't found a general top-level overview in the form of a diagram of how communication is done, which ports are accessed by whom, and which ports are exposed. There are pieces of it here and there; for instance, the Host It Yourself guide and the docker-compose one do have some overview of the ports being open for the communication. But it's unclear to me who does what with those ports.

Sorry, perhaps I was a little unclear about this. What I meant is that this whole 'without exposing a port' narrative feels too magical. How exactly this is achieved is barely documented. By skimming the code and some parts of the documentation I found out that, at least for the Python SDK, yes, it opens an outbound connection, does all the heavy lifting without needing a local tunneler, and just creates a local unix socket on which the application that uses the SDK listens for the ingress traffic. That was the turning point for me in grasping the magic of this whole 'not exposing a port' thing. Hence the root question.

I hope my questions don't come across as mean. I don't mean to diss the documentation; I'm just a little too blunt with my questions.


Still relevant. Up!!

  1. So like I could create my own pki using good old openssl

Yes, you could. We haven't documented how to achieve this yet since ziti pki covers this, but it's something we have talked about and planned. It's just not the most pressing need, since very few people want to deal with this on their own.

  1. Isn't it obligatory to provide a whole pki for controller's usage? So that it creates and signs identities for instance.

Yes it is. The OpenZiti controller must be able to sign certificates for edge routers at least. If you want OpenZiti to maintain the PKI on your behalf, the controller must have a keypair it can use to sign CSRs. That keypair is used for routers during router enrollment, and for identities that are not backed by a 3rd party CA.

This second part is important because for other identities (users/sdk apps/whatever non-routers), you have at least two options. You can use 3rd party CAs where you create and distribute the keypair to/for your identities (using good ol' openssl or whatever), or you can use external JWT signers if you're making your own app. Our tunnelers don't support external JWT signers just yet, but they will at some point for sure. (An external JWT signer would be used with things like SPIRE, OAuth/PKCE, etc.)

Both of these comments are related, I think. I really appreciate the "too magical" comment and I can very much see where you're coming from. Your comment here inspires me to make our documentation a bit more clear and crisp here. It's really easy to think "oh this is super clear" -- when you're so close to the project! I can't promise when, but I can promise this comment has been taken to heart by me! :slight_smile:

Do you still need an overview of what the ports do, how they are used, where and why? Or are you all set there? Do you think something like this is useful? Do you think something like that should appear in the docs?

Not to worry - we appreciate your input and you have clearly done your best to find answers on your own!


This is definitely useful and answered a bunch of my questions. It definitely should appear in the docs, perhaps as a reference to a blog post or an article where this is written up, with the video attached.
So now I understand that a controller only needs to own two server certs and one CA to create identities.
Questions:

  1. Are those routers in the video two separate ones? Or just one, split into two for the sake of visibility?
  2. Is it correct that a client and router must establish mTLS? And they use their certificates both signed by the same CA to trust each other and establish the connection?
  3. Is it correct that the IPs and DNS names included in the SAN of a server certificate are those of the server where the controller is running, so that peers (clients and routers) can verify it?
  4. Is it correct that the ziti sdk for applications establishes an outbound bidirectional connection with a router and uses a unix socket to let those ingress and egress bytes be handled at the user-space (business logic) application level?
  5. Is it possible to manually create each certificate, including identity? Consider the following diagram.

Could be either really. It was meant to be a logical representation. Since the router on the left looks to be connecting/forming a link to the router on the right:

I would presume this is two routers. The router on the left has a dialer but no listener so it'd be a router in private networking space. That router connects to the router on the right so the router on the right could be in the same private address space, or it could be in some other address spaces addressable from the router on the left. In your diagram that'd be the "Ziti Router Private" connecting to the "Ziti Router Public" (a line you don't have shown yet).

Mostly, yes, and in nearly all regular/normal setups this is the case, but it's not necessary for it to be the same CA; the router just needs to trust the certificate presented. So if the router was set up to have multiple trusted CAs, then it wouldn't have to be the same CA. To me, it's the 99+% use case/expectation that they'll be the same CA since that's how the quickstart works and frankly it's just easier. I don't know of anyone using > 1 CA but I'd expect it's possible and would work (I've never tested this either) :).
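To make the "same CA is not required, a trusted CA is" point concrete, here is a toy model of the trust decision. It is only a sketch: real TLS verification also checks signatures, validity dates and more, and all certificate and CA names below are invented for the example.

```python
def is_trusted(cert: str, issuers: dict, trusted_cas: set) -> bool:
    """Walk the issuer chain of `cert` until a trusted CA is found.

    `issuers` maps a certificate name to the name of the CA that signed it.
    Signature checks, expiry and revocation are deliberately left out.
    """
    seen = set()
    current = cert
    while current in issuers and current not in seen:
        seen.add(current)          # guards against self-signed loops
        current = issuers[current]
        if current in trusted_cas:
            return True
    return False


# Hypothetical chains: two identities signed by two unrelated CAs.
ISSUERS = {
    "client-identity": "edge-ca",
    "edge-ca": "root-ca",
    "root-ca": "root-ca",          # self-signed root
    "other-identity": "third-party-ca",
}

if __name__ == "__main__":
    print(is_trusted("client-identity", ISSUERS, {"edge-ca"}))
    print(is_trusted("other-identity", ISSUERS, {"edge-ca"}))
    print(is_trusted("other-identity", ISSUERS, {"edge-ca", "third-party-ca"}))
```

With only one CA in the trust set, the second identity is rejected; adding its CA to the set is all it takes for it to be accepted, which mirrors the "multiple trusted CAs" case described above.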

Yes. The IP/DNS entries in the SAN field are there explicitly so that clients connecting to that port can verify the endpoint they reach. It's not just the controller; that is true for the routers as well, and for anything else accepting TLS traffic.

Well, not exactly. I feel like maybe you're mixing up what a tunneler does and what the SDKs do. The SDKs do open an outbound connection to rendezvous servers, which are "edge-enabled" routers. Once that TCP connection is established, there's a bidirectional path available and the SDK just takes your payloads and writes them right to the stream (and reads them)... So for that part - yes.

Now, the other half of your question here sounds more like tunnelers. Tunnelers are purpose-built SDK apps that are able to instruct the OS through various mechanisms to have the OS send IP packets to the tunneler. The tunneler then takes the payloads and sends them over OpenZiti. I don't believe it's through "unix sockets" but to be fair I don't truly know for sure. On Linux we use IP routes and a TUN, we can use tproxy, and there is an eBPF implementation which I know very little about. I don't believe (but I don't know for sure) any of those use a 'unix socket'. Maybe someone else in the community knows with certainty? There's also Windows and macOS at play, and Android/iOS too... To me, what's important is that the tunnelers are all capable of getting IP packets sent to them, and the SDK reads those packets and sends the payloads over OpenZiti.
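The SDK half of this answer (dial out to a rendezvous point, then read and write on that one connection) can be illustrated with a toy relay. This is not the OpenZiti SDK or its wire protocol: it is plain TCP on loopback with no TLS and no identities, and toy_router, hosting_app and demo are made-up names. The only point is that the hosting side serves a request without ever listening on a port.

```python
import socket
import threading


def toy_router(listener: socket.socket) -> None:
    """Relay one request and one reply between two inbound connections."""
    host_conn, _ = listener.accept()    # the hosting app dialed out to us
    client_conn, _ = listener.accept()  # so did the client
    with host_conn, client_conn:
        host_conn.sendall(client_conn.recv(1024))  # client -> hosting app
        client_conn.sendall(host_conn.recv(1024))  # hosting app -> client


def hosting_app(router_addr, connected: threading.Event) -> None:
    """Serve one request without ever calling bind() or listen()."""
    with socket.create_connection(router_addr, timeout=5) as conn:
        connected.set()
        request = conn.recv(1024)
        conn.sendall(b"echo:" + request)


def demo() -> bytes:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # only the toy router listens
    listener.listen()
    listener.settimeout(5)
    router_addr = listener.getsockname()

    connected = threading.Event()
    workers = [
        threading.Thread(target=toy_router, args=(listener,)),
        threading.Thread(target=hosting_app, args=(router_addr, connected)),
    ]
    for worker in workers:
        worker.start()
    connected.wait(timeout=5)  # make sure the hosting app is accepted first

    with socket.create_connection(router_addr, timeout=5) as client:
        client.sendall(b"hello")
        reply = client.recv(1024)

    for worker in workers:
        worker.join()
    listener.close()
    return reply


if __name__ == "__main__":
    print(demo())
```

Both the client and the hosting app only make outbound connections; the single listening socket belongs to the relay, which plays the role of the edge-enabled router in this sketch.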

Eh -- for identities yes. For routers, maybe? I think so??? With the diagram note you added: "Doesn't possess any keys to any CAs" -- for identities, this means you will need to use 3rd party CAs since the controller won't be able to sign any identities on its own. That also means the controller won't be able to enroll routers either. I'm NOT sure if we support 3rd party CAs for routers yet. I don't think we do. I think @andrew.martinez needs to answer that one.


I see. Thanks for the answers. Now it's clearer to me.
My questions about all the CAs are about hardening the trust further, by using an HSM such as a Yubikey and explicitly signing new certs for each part of the infrastructure with this HSM.
I know that it is possible for authorizing endpoints as described in the docs, but I wondered if it is possible to go further by moving control of the CAs' private keys to the HSM, so that even if the controller were compromised, the attack surface would be a little smaller. I pretty much understand that if the controller node is hijacked you have many more problems than just possession of the CAs, but yeah, just curious.


Well. You know -- you could offline the key used to sign identities after getting your overlay and identities all set up, by removing it and "putting it back" when it's needed again. I never considered that, but as long as the controller isn't trying to sign any CSRs etc, I think that'd work, but I've never tried it.

I am pretty sure the router/controllers don't work with HSMs yet but that seems like a logical thing for a future feature enhancement.


So, is this correct?
The only thing I'm missing here: does 'Ziti Router Private' explicitly trust the cert of 'Ziti Router Public', or does it trust the 'Ziti Edge CA', so that a descendant cert is trusted?
And I'm a little unsure which cert from which CA goes on the controller's ':8440' and ':8441' ports.


When a router dials the link listener of another router or when the router connects to the control plane of the controller, the certificate the router presents needs to be trusted. That trust is configured through the identity.ca section of the config file.

For example, if you start a controller and two routers, you can use openssl to connect. Port 6262 or port 8440 are the ports that often represent the controller's control plane. Ports 10080 or 8442 are the ports that you'll find from the quickstarts for the fabric link listener port.

From Router1, connect to the control plane (6262/8440)

openssl s_client -connect ziti-controller:6262 \
    -cert /persistent/ziti-edge-router-1.cert \
    -key /persistent/ziti-edge-router-1.key \
    -CAfile /persistent/ziti-edge-router-1.cas < /dev/null

From Router1, connect to Router2's fabric link listener (10080/8442)

openssl s_client -connect ziti-edge-router-2:10080 \
    -cert /persistent/ziti-edge-router-1.cert \
    -key /persistent/ziti-edge-router-1.key \
    -CAfile /persistent/ziti-edge-router-1.cas < /dev/null

As for which cert goes where, that's actually where the quickstart's PKI is useful to explore. A controller has two main identity sections. One at the top/root level and another located in the web.name.identity section. The top identity block is for the control plane (port 8440) while the web.name.identity section overrides the root identity section and is applied to just the HTTP-based port/s (8441). You don't need to use both identity blocks, the quickstart does it this way to try to show you that you can have a separate PKI for the HTTP endpoints located in the 'web' section...
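The override rule described above could be sketched like this. The dictionary is a simplified stand-in for a controller config, not its real schema, and the web entry names and file names are placeholders invented for the example.

```python
from typing import Optional


def identity_for(config: dict, web_name: Optional[str] = None) -> dict:
    """Pick the identity block that applies to one listener.

    With no web_name the root identity (control plane) is returned.
    For a web entry, its own identity wins and the root one is the fallback.
    """
    root = config["identity"]
    if web_name is None:
        return root
    for entry in config.get("web", []):
        if entry["name"] == web_name:
            return entry.get("identity", root)
    raise KeyError(f"no web entry named {web_name!r}")


# Hypothetical, heavily simplified controller config.
controller_config = {
    "identity": {"server_cert": "ctrl-plane.cert", "ca": "ctrl-plane.cas"},
    "web": [
        {"name": "with-override",
         "identity": {"server_cert": "web.cert", "ca": "web.cas"}},
        {"name": "without-override"},
    ],
}

if __name__ == "__main__":
    print(identity_for(controller_config)["server_cert"])
    print(identity_for(controller_config, "with-override")["server_cert"])
    print(identity_for(controller_config, "without-override")["server_cert"])
```

The third lookup shows the case where only one identity block is used: a web entry without its own identity simply inherits the root one.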

Hope that all makes sense


Alrighty, seems like this looks better now:

Is there anything that I miss here?


Top right callout:
The "privateness" of an edge router to me, mainly refers to whether they have exposed ports. When you make a "private" edge router config, the edge WILL be enabled. This is because these edge routers are very often used as egress/exit nodes for traffic, which means they act as a tunneler, which means they need to have the edge functionality enabled.

On top of that, it's often useful to have one or more routers local so that local OpenZiti traffic doesn't need to go out over the internet and back, it can remain local, and thus will obviously have less latency.

So I think that's the only tweak I'd make.


So 'Ziti Router Private' just has the ':8442 (:3022 default)' port open to the private address space, and doesn't have the :10080 port open, so that other routers can't connect to it. Correct?
If so, what is the difference between an edge router and a fabric router? In the CLI they have distinct names, but essentially an edge router is just a superset of a fabric router. Above you said that these fabric routers 'usually will have edge functionality enabled', but what's the point of a router that doesn't? What if a router doesn't have any ports open (no edge, basically) and can only dial the other router?


I don't think I can answer this question, let me turn it around... A router config file made with --private will not have a link listener and will have no link listener port defined, so there will be no port open locally OR publicly for routers (even local routers) to connect TO. If you want a local mesh like that, you'd not use the --private flag, and you'd rely on your firewall/s to block that port, preventing other devices from trying to connect to that port... I hope that makes sense...

It's easy and complex and nuanced all at once.... It comes down to the router's config file. If the router has a link dialer or a link listener, then the router participates in the fabric and is a "fabric" router. This is probably every router. So, a slightly more refined way of thinking of this is that a "fabric" router that does NOT have edge functionality enabled is only considered a "fabric" router.

If the router has "edge" enabled services AND that router has link listeners/dialers defined, well then it (per the definition above) should be considered a "fabric" router. But really, an 'edge router' implies more functionality than just the fabric. So in general, if the edge functionality is enabled we call those "edge routers" to indicate that it's "maybe a fabric router, but definitely an edge router".

It's confusing terminology that makes sense the deeper you're into OpenZiti and it's one of the reasons you'll find me usually only using the term (or trying to) 'router' now-a-days.
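The naming rules from the last few paragraphs can be written down as a small decision function. This is just one reading of the loose terminology above, not an official classification; the function name and the "public"/"unlinked" labels are my own.

```python
def describe_router(link_listener: bool, link_dialer: bool, edge_enabled: bool) -> str:
    """Label a router from three yes/no facts about its config file."""
    if not (link_listener or link_dialer or edge_enabled):
        return "router with neither links nor edge"

    # Edge wins the naming: "maybe a fabric router, but definitely an edge router".
    kind = "edge router" if edge_enabled else "fabric router"

    if link_listener:
        reach = "public"    # other routers may dial in (label is an assumption)
    elif link_dialer:
        reach = "private"   # only dials out, like a config made with --private
    else:
        reach = "unlinked"  # takes no part in the fabric mesh
    return f"{reach} {kind}"


if __name__ == "__main__":
    print(describe_router(True, True, True))
    print(describe_router(False, True, True))
    print(describe_router(True, True, False))
```

Note that "public" here only means "has a link listener"; as discussed above, such a router can still sit in private address space behind a firewall.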

The ziti create config commands are not meant to cover every single permutation of router configuration available; there are just too many permutations for us to cover at this time. Instead they are intended to provide a foundational understanding of what these config files look like with some customization options, largely through environment variables. There are a few flags that influence some settings, but undoubtedly, we'll never cover all of the permutations...

As for the differences... I'd recommend you give them a try and see.

For example:

diff \
    <(ziti create config router edge --routerName test) \
    <(ziti create config router edge --routerName test --private)

returns:

<   listeners:
<     - binding:          transport
<       bind:             tls:0.0.0.0:10080
<       advertise:        tls:sg3u22:10080
<       options:
<         outQueueSize:   4
---
> #  listeners:
> #    - binding:          transport
> #      bind:             tls:0.0.0.0:10080
> #      advertise:        tls:sg3u22:10080
> #      options:
> #        outQueueSize:   4

Here you can see the "--private" flag simply added comments to the link listener section, thereby disabling link listeners.

Fabric configs have much more commented out:

diff \
    <(ziti create config router fabric --routerName test) \
    <(ziti create config router edge --routerName test --private)
18,23c18,23
<   listeners:
<     - binding:          transport
<       bind:             tls:0.0.0.0:10080
<       advertise:        tls:sg3u22:10080
<       options:
<         outQueueSize:   4
---
> #  listeners:
> #    - binding:          transport
> #      bind:             tls:0.0.0.0:10080
> #      advertise:        tls:sg3u22:10080
> #      options:
> #        outQueueSize:   4
25c25
< #listeners:
---
> listeners:
27,35c27,35
< #  - binding: edge
< #    address: tls:0.0.0.0:3022
< #    options:
< #      advertise: sg3u22:3022
< #      connectTimeoutMs: 5000
< #      getSessionTimeout: 60
< #  - binding: tunnel
< #    options:
< #      mode: host #tproxy|host
---
>   - binding: edge
>     address: tls:0.0.0.0:3022
>     options:
>       advertise: sg3u22:3022
>       connectTimeoutMs: 5000
>       getSessionTimeout: 60
>   - binding: tunnel
>     options:
>       mode: host #tproxy|host
38,47d37
< csr:
<   country: US
<   province: NC
<   locality: Charlotte
<   organization: NetFoundry
<   organizationalUnit: Ziti
<   sans:
<     dns:
<       - localhost
<       - sg3u22
49,50c39,52
<     ip:
<       - "127.0.0.1"
---
> edge:
>   csr:
>     country: US
>     province: NC
>     locality: Charlotte
>     organization: NetFoundry
>     organizationalUnit: Ziti
>     sans:
>       dns:
>         - localhost
>         - sg3u22
>
>       ip:
>         - "127.0.0.1"

Now it makes sense to me. It really does. I think this explanation has to make it into the docs, because I haven't seen an explanation as good as this one anywhere, apart from pieces in the old doc of the fabric repo I mentioned above, some videos, and bits of it mentioned implicitly in the docs.
I think the definition of Edge also has to make it into the docs.
Thank you, Clint!

edit:
and just in case the source of the diagram is needed


I don't understand what the client certificates that are created and added to the identity.cert and web.identity.cert sections are for.
Is it correct that identity.cert and web.identity.cert are ONLY used by the ziti CLI?
Like a path parsed from the config file, if provided, maybe?
In the video above where you talk about PKI you don't really go into detail about what client certs are used for, except for 'sometimes you need to present a certificate to a piece of equipment'. I tried searching both the documentation and the forums here and could not find the answer.

edit:
I found that in a recent video on Ziti TV, where you go step by step, you mention that identity.cert would be used to connect to another controller. So I presume it's kind of reserved for the upcoming HA feature, or for when the controller itself initiates a connection to some machine on the network?
But what about web.identity.cert, which is used for the Edge API? Is it the same type of thing?


I expect you are constraining your question to controllers only, because for routers that .cert field is used all the time when the router contacts the controller or other routers... Currently, for controllers, neither of those cert fields is ever used. The simple answer is that this is a reused config block, and although the fields aren't used at this time by the controller, they might be in the future (as you discovered; I read your 'edit' afterwards). But for now, yes, those fields are unused by controllers. Definitely used by routers though....

It's not used by the ziti CLI at all as far as I know. The ziti CLI uses information stored in $HOME/.config/ziti-config.json or you use the certs/keys specifically during ziti edge login.

I suspect this field is literally never used. I would expect when we support > 1 controller that the root identity section (the one I think of as controlling/configuring the 'fabric', not the 'edge') would be the one in use but to be 100% clear, I've not dug into how HA controllers work just yet myself, so it's only an educated guess on my part at this time...


Makes sense. Yes, it was about controller's config. Forgot to mention.
Thanks!!
