OpenSSL Certificate Creation Router Startup Error

I attempted to use OpenSSL to simulate the creation of PKI certificates. Currently, the controller can start normally, and ZAC is able to manage it as well. However, I created an AP-router, and after successfully registering it, an error occurred during startup. I'm not sure if it's due to an issue with the certificates, but if there were a problem with the certificates, why doesn't the controller report any errors?

Here is the error log when the router starts:

root@ip-10-111-0-5:/opt/zt# ztbin/ziti router run /opt/zt/ap-router.yaml
[   0.016]    INFO ziti/ziti/router.run: {revision=[0eec47ce3c80] build-date=[2024-10-02T12:59:41Z] routerId=[VefibBTkJx] configFile=[/opt/zt/ap-router.yaml] go-version=[go1.23.1] os=[linux] arch=[amd64] version=[v1.1.15]} starting ziti router
 ....
[   2.777]   ERROR ziti/router/env.(*networkControllers).connectToControllerWithBackoff.func2: {endpoint=[tls:zt.demo.org:8440] error=[error connecting ctrl (EOF)]} unable to connect controller
[   3.428]   ERROR ziti/router/env.(*networkControllers).connectToControllerWithBackoff.func2: {error=[error connecting ctrl (EOF)] endpoint=[tls:zt.demo.org:8440]} unable to connect controller
[   4.900]   ERROR ziti/router/env.(*networkControllers).connectToControllerWithBackoff.func2: {endpoint=[tls:zt.demo.org:8440] error=[error connecting ctrl (EOF)]} unable to connect controller
[   7.587]   ERROR ziti/router/env.(*networkControllers).connectToControllerWithBackoff.func2: {error=[error connecting ctrl (EOF)] endpoint=[tls:zt.demo.org:8440]} unable to connect controller
[   9.637]   ERROR ziti/router/env.(*networkControllers).connectToControllerWithBackoff.func2: {error=[error connecting ctrl (EOF)] endpoint=[tls:zt.demo.org:8440]} unable to connect controller
[  12.113]   ERROR ziti/router/env.(*networkControllers).connectToControllerWithBackoff.func2: {endpoint=[tls:zt.demo.org:8440] error=[error connecting ctrl (EOF)]} unable to connect controller
[  17.532]   ERROR ziti/router/env.(*networkControllers).connectToControllerWithBackoff.func2: {endpoint=[tls:zt.demo.org:8440] error=[error connecting ctrl (EOF)]} unable to connect controller
^C[  18.793]    INFO ziti/ziti/router.waitForShutdown: shutting down ziti router
[  18.794]    INFO transport/v2/tls.(*sharedListener).runAccept [tls:0.0.0.0:10080]: {error=[accept tcp [::]:10080: use of closed network connection]} listener closed, exiting
[  18.794]    INFO transport/v2/tls.(*sharedListener).runAccept [tls:0.0.0.0:10080]: exited
[  18.794]    INFO ziti/router/link.(*linkRegistryImpl).Shutdown: {linkCount=[0]} shutdown links in link registry
[  18.794]    INFO transport/v2/tls.(*sharedListener).runAccept [tls:0.0.0.0:8442]: {error=[accept tcp [::]:8442: use of closed network connection]} listener closed, exiting
[  18.794]    INFO transport/v2/tls.(*sharedListener).runAccept [tls:0.0.0.0:8442]: exited
[  18.794]   ERROR agent.(*handler).listen: {error=[accept unix /tmp/gops-agent.22579.sock: use of closed network connection]} error accepting gops connection, closing gops listener
[  18.794]   ERROR agent.(*handler).listen.func1: {error=[close unix /tmp/gops-agent.22579.sock: use of closed network connection]} error closing gops listener
[  18.794]   ERROR ziti/router/forwarder.(*Faulter).run: exited
[  18.794] WARNING ziti/router/forwarder.(*Scanner).run: exited
[  18.794]   ERROR ziti/router/xgress_edge.(*Acceptor).Run: error accepting (closed)
[  18.794] WARNING ziti/router/xgress_edge.(*Acceptor).Run: exiting
^C^C^C[  25.795]   ERROR ziti/router/env.(*networkControllers).connectToControllerWithBackoff.func2: {endpoint=[tls:zt.demo.org:8440] error=[error connecting ctrl (EOF)]} unable to connect controller

I ran a few quick connectivity checks against the controller:

root@ip-10-111-0-5:/opt/zt# telnet zt.demo.org 8440
Trying X.X.X.X...
Connected to zt.demo.org.
Escape character is '^]'.
^]
telnet> quit
root@ip-10-111-0-5:/opt/zt# telnet zt.demo.org 8441
Trying X.X.X.X...
Connected to zt.demo.org.
Escape character is '^]'.
^]
telnet> quit
Connection closed.
root@ip-10-111-0-5:/opt/zt# curl -k https://zt.demo.org:8441
{"data":{"apiVersions":{"edge":{"v1":{"apiBaseUrls":["https://zt.demo.org:8441/edge/client/v1"],"path":"/edge/client/v1"}},"edge-client":{"v1":{"apiBaseUrls":["https://zt.demo.org:8441/edge/client/v1"],"path":"/edge/client/v1"}},"edge-management":{"v1":{"apiBaseUrls":["https://zt.demo.org:8441/edge/management/v1"],"path":"/edge/management/v1"}}},"buildDate":"2024-10-02T12:59:41Z","capabilities":[],"revision":"0eec47ce3c80","runtimeVersion":"go1.23.1","version":"v1.1.15"},"meta":{}}


root@ip-10-111-0-5:/opt/zt# cat ap-router.yaml
v: 3
identity:
  cert:             "/opt/zt/ap-router.cert"
  server_cert:      "/opt/zt/ap-router.server.chain.cert"
  key:              "/opt/zt/ap-router.key"
  ca:               "/opt/zt/ap-router.cas"
  #alt_server_certs:
  #  - server_cert:  ""
  #    server_key:   ""

ctrl:
  endpoint:             tls:zt.demo.org:8440

link:
  dialers:
    - binding: transport
  listeners:
    - binding:          transport
      bind:             tls:0.0.0.0:10080
      advertise:        tls:zt.demo.org:10080
      options:
        outQueueSize:   4

listeners:
# bindings of edge and tunnel requires an "edge" section below
  - binding: edge
    address: tls:0.0.0.0:8442
    options:
      advertise: zt.demo.org:8442
      connectTimeoutMs: 5000
      getSessionTimeout: 60
  - binding: tunnel
    options:
      mode: host #tproxy|host



edge:
  csr:
    country: US
    province: NC
    locality: Charlotte
    organization: NetFoundry
    organizationalUnit: Ziti
    sans:
      dns:
        - localhost
        - zt.demo.org

      ip:
        - "127.0.0.1"
        - "::1"



#transport:
#  ws:
#    writeTimeout: 10
#    readTimeout: 5
#    idleTimeout: 120
#    pongTimeout: 60
#    pingInterval: 54
#    handshakeTimeout: 10
#    readBufferSize: 4096
#    writeBufferSize: 4096
#    enableCompression: true

forwarder:
  latencyProbeInterval: 0
  xgressDialQueueLength: 1000
  xgressDialWorkerCount: 128
  linkDialQueueLength: 1000
  linkDialWorkerCount: 32


root@ip-10-111-0-5:/opt/zt# cat zt.demo.org.yaml
v: 3
db:                     data/ctrl.db
identity:
  cert:        /opt/zt/pki/zt.demo.org-mid/certs/zt.demo.org-client.chain.pem
  server_cert: /opt/zt/pki/zt.demo.org-mid/certs/zt.demo.org-server.chain.pem
  key:         /opt/zt/pki/zt.demo.org-mid/keys/zt.demo.org-server.key
  ca:          /opt/zt/pki/cas.pem


ctrl:
  options:
    advertiseAddress: tls:zt.demo.org:8440
  listener:             tls:0.0.0.0:8440
healthChecks:
  boltCheck:
    interval: 30s
    timeout: 20s
    initialDelay: 30s
edge:
  api:
    sessionTimeout: 30m
    address: zt.demo.org:8441
  enrollment:
    signingCert:
      cert: /opt/zt/pki/signing.pem
      key:  /opt/zt/pki/zt.demo.org-signing-mid/keys/zt.demo.org-signing-mid.key
    edgeIdentity:
      duration: 180m
    edgeRouter:
      duration: 180m
web:
  - name: client-management
    bindPoints:
      - interface: 0.0.0.0:8441
        address: zt.demo.org:8441
    identity:
      ca:          /opt/zt/pki/zt.demo.org-edge-controller-root-ca/certs/zt.demo.org-edge-controller-root-ca.cert
      key:         /opt/zt/pki/zt.demo.org-edge-controller-mid/keys/zt.demo.org-server.key
      server_cert: /opt/zt/pki/zt.demo.org-edge-controller-mid/certs/zt.demo.org-server.chain.pem
      cert:        /opt/zt/pki/zt.demo.org-edge-controller-mid/certs/zt.demo.org-client.chain.pem

    options:
      readTimeout: 5000ms
      writeTimeout: 100000ms
      minTLSVersion: TLS1.2
      maxTLSVersion: TLS1.3
    apis:
      - binding: edge-management
        options: { }
      - binding: edge-client
        options: { }
      - binding: fabric
        options: { }

OpenSSL Certificate Creation Method:

Root Certificate

openssl genrsa -out "$CA_KEY" $KEY_SIZE || exit_on_error "Root CA key generation"
openssl req -new -x509 -days $DAYS -key "$CA_KEY" -out "$CA_CTR" -subj "$SUBJ" \
    -extensions v3_ca -config <(cat /etc/ssl/openssl.cnf <(printf "[v3_ca]\nkeyUsage=critical,cRLSign,keyCertSign\nbasicConstraints=critical,CA:TRUE")) || exit_on_error "Root CA certificate generation"

Grandparent Certificate and Intermediate Certificate

openssl genrsa -out "$CA_KEY" $KEY_SIZE || exit_on_error "CA key generation"
openssl req -new -key "$CA_KEY" -out "$CA_CSR" -subj "$SUBJ" || exit_on_error "CA CSR generation"

openssl x509 -req -in "$CA_CSR" -CA "$PARENT_CTR" -CAkey "$PARENT_KEY" -CAcreateserial -out "$CA_CTR" -days $DAYS \
    -extfile <(printf "keyUsage=critical,cRLSign,keyCertSign\nbasicConstraints=critical,CA:TRUE,pathlen:$PATH_LEN") || exit_on_error "CA certificate signing"

Server and Client Certificates (They Share a Key)

openssl req -new -key "$KEY" -out "$CSR" -subj "$SUBJ" || exit_on_error "CSR generation for $TYPE"
openssl x509 -req -in "$CSR" -CA "$MID_CTR" -CAkey "$MID_KEY" -CAcreateserial -out "$CTR" -days $DAYS \
    -extfile <(printf "keyUsage=critical,digitalSignature,keyEncipherment\nextendedKeyUsage=serverAuth$ALTNAME") || exit_on_error "Certificate signing for $TYPE"

Chain Structure
client/server -- mid -- gp -- root-ca
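
For anyone following along, here is a minimal sketch of checking a chain like this offline before involving the controller or router at all. It builds a throwaway leaf -- mid -- gp -- root-ca chain in a temp directory and asks `openssl verify` whether each leaf is acceptable as a TLS server and as a TLS client. All file names and subjects are hypothetical, and the sketch is only an assumption about what might be worth ruling out, not a confirmed diagnosis of the EOF error. The signing command above, as written, sets only `extendedKeyUsage=serverAuth`, so the sketch compares such a leaf with one that also carries `clientAuth`.

```shell
#!/bin/sh
# Throwaway PKI mirroring the chain above: leaf -- mid -- gp -- root-ca.
# Every name and path here is hypothetical; nothing touches a real PKI.
set -eu
dir=$(mktemp -d)
cd "$dir"

# Minimal request config so the sketch does not depend on /etc/ssl/openssl.cnf.
cat > req.cnf <<'EOF'
[req]
distinguished_name = dn
prompt = no
x509_extensions = v3_ca
[dn]
CN = placeholder
[v3_ca]
basicConstraints = critical,CA:TRUE
keyUsage = critical,cRLSign,keyCertSign
EOF
printf 'keyUsage=critical,cRLSign,keyCertSign\nbasicConstraints=critical,CA:TRUE\n' > ca.ext

# Self-signed root CA.
openssl req -x509 -newkey rsa:2048 -nodes -keyout root.key -out root.cert \
    -days 1 -config req.cnf -subj "/CN=demo-root" 2>/dev/null

# sign_ca <name> <parent>: issue an intermediate CA signed by <parent>.
sign_ca() {
    openssl req -new -newkey rsa:2048 -nodes -keyout "$1.key" -out "$1.csr" \
        -config req.cnf -subj "/CN=demo-$1" 2>/dev/null
    openssl x509 -req -in "$1.csr" -CA "$2.cert" -CAkey "$2.key" -CAcreateserial \
        -out "$1.cert" -days 1 -extfile ca.ext 2>/dev/null
}
sign_ca gp root
sign_ca mid gp

# sign_leaf <name> <extendedKeyUsage>: issue a leaf from mid, reusing leaf.key.
sign_leaf() {
    printf 'keyUsage=critical,digitalSignature,keyEncipherment\nextendedKeyUsage=%s\n' "$2" > "$1.ext"
    openssl req -new -key leaf.key -out "$1.csr" \
        -config req.cnf -subj "/CN=demo-$1" 2>/dev/null
    openssl x509 -req -in "$1.csr" -CA mid.cert -CAkey mid.key -CAcreateserial \
        -out "$1.cert" -days 1 -extfile "$1.ext" 2>/dev/null
}
openssl genrsa -out leaf.key 2048 2>/dev/null
sign_leaf server-only serverAuth
sign_leaf both serverAuth,clientAuth

# check <purpose> <cert>: verify through mid and gp up to the root.
cat gp.cert mid.cert > intermediates.pem
check() {
    if openssl verify -CAfile root.cert -untrusted intermediates.pem \
        -purpose "$1" "$2" >/dev/null 2>&1; then echo ok; else echo FAIL; fi
}
echo "server-only as sslserver: $(check sslserver server-only.cert)"
echo "server-only as sslclient: $(check sslclient server-only.cert)"
echo "both as sslclient:        $(check sslclient both.cert)"
```

To try this against real files, the same `openssl verify -CAfile ... -untrusted ... -purpose ...` call can be pointed at the actual root, intermediates, and leaf. A FAIL for `sslclient` on a cert used in a `cert:` field would be one thing to look at, though other causes (for example a CA missing from the `ca:` bundle) are just as plausible.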

Hi @toadzhou, have you seen the GitHub repo that shows exactly what to do and how, along with the corresponding video?

It's originally from forum member @nenkoru; you can find the original here: nenkoru/openziti_manual_pki (Bootstrap PKI for OpenZiti manually) on GitHub. I forked it and recently gave someone an example of how to set up one public and two private routers. See my PR if interested: "working 3 router setup" by dovholuknf, Pull Request #2 on nenkoru/openziti_manual_pki.

Here's the video where I walk through and do all the steps

I think between those resources you'll get things working.

If you HAVE seen / used those -- let me know. I didn't look too hard into what you did because this is a complex topic and takes a lot of time to scrutinize... I was hoping maybe you hadn't seen them and they'll answer your question for you? :slight_smile:

Okay, I'll take some time to understand it first. Thank you for your guidance, much appreciated!

Whoops, that's the wrong video. I meant to include this one! :slight_smile: But of course, both are worth watching :slight_smile:

Thank you! I've finished watching the tutorial video, and the explanation was really excellent.

I encountered some errors during the experiment. Since it was just a test, I did not make any changes and simply followed the steps one by one. The error messages are as follows:

root@ip-10-111-0-5:/opt/zt1/openziti_manual_pki# ziti version
v1.1.15
root@ip-10-111-0-5:/opt/zt1/openziti_manual_pki# ziti controller edge init controller_config.yaml -u "admin" -p "admin"
panic: could not generate default trust domain: error generating default trust domain from root CA: no root CA detected after chain assembly from the root identity server cert and ca bundle

goroutine 1 [running]:
github.com/openziti/ziti/controller/config.LoadConfig({0x7ffcc4d687a7?, 0x1577d5b?})
        github.com/openziti/ziti/controller/config/config.go:383 +0x3354
github.com/openziti/ziti/controller/subcmd.configureController({0x7ffcc4d687a7, 0x16}, {0x43617a0, 0x5e32900})
        github.com/openziti/ziti/controller/subcmd/init.go:147 +0x45
github.com/openziti/ziti/controller/subcmd.NewEdgeInitializeCmd.func2(0xc000aa4600?, {0xc00039e280?, 0x4?, 0x3bfcb7c?})
        github.com/openziti/ziti/controller/subcmd/init.go:76 +0x4b
github.com/spf13/cobra.(*Command).execute(0xc000aaa608, {0xc00039e190, 0x5, 0x5})
        github.com/spf13/cobra@v1.8.1/command.go:989 +0xa91
github.com/spf13/cobra.(*Command).ExecuteC(0x5cc00e0)
        github.com/spf13/cobra@v1.8.1/command.go:1117 +0x3ff
github.com/spf13/cobra.(*Command).Execute(...)
        github.com/spf13/cobra@v1.8.1/command.go:1041
github.com/openziti/ziti/ziti/cmd.Execute()
        github.com/openziti/ziti/ziti/cmd/cmd.go:83 +0x1a
main.main()
        github.com/openziti/ziti/ziti/main.go:53 +0xf

After reviewing the video again, I found that the issue was related to the version. The latest version of Ziti throws this error, but when I used version v1.0.0 as shown in the video, everything worked fine. I suspect that some certificate validation rules were changed in the newer version.

root@ip-10-111-0-5:/opt/zt1/openziti_manual_pki# ./ziti version
v1.0.0
root@ip-10-111-0-5:/opt/zt1/openziti_manual_pki# ./ziti controller edge init controller_config.yaml -u "admin" -p "admin"
[   0.014]    INFO ziti/controller/db.RunMigrations.(*migrationManager).Migrate.func1: Migrated edge datastore from 0 to 36
[   0.014]    INFO ziti/controller/db.RunMigrations.(*migrationManager).Migrate.func1: edge datastore is up to date at version 36
[   0.022]    INFO ziti/common/metrics.ConfigureGoroutinesPoolMetrics.GoroutinesPoolMetricsConfigF.func1.1: {idleTime=[30s] poolType=[pool.router.messaging] minWorkers=[0] maxWorkers=[100] maxQueueSize=[100]} starting goroutine pool
[   0.023]    INFO ziti/controller/network.(*Network).showOptions: network = {
  "CreateCircuitRetries": 2,
  "CycleSeconds": 60,
  "EnableLegacyLinkMgmt": false,
  "InitialLinkLatency": 65000000000,
  "IntervalAgeThreshold": 0,
  "MetricsReportInterval": 60000000000,
  "MinRouterCost": 10,
  "PendingLinkTimeout": 10000000000,
  "RouteTimeout": 10000000000,
  "RouterConnectChurnLimit": 60000000000,
  "RouterComm": {
    "QueueSize": 100,
    "MaxWorkers": 100
  },
  "Smart": {
    "RerouteFraction": 0.02,
    "RerouteCap": 4,
    "MinCostDelta": 15
  }
}
[   0.023]    INFO ziti/controller.(*Controller).showOptions: ctrl = {
  "OutQueueSize": 4,
  "MaxQueuedConnects": 1,
  "MaxOutstandingConnects": 16,
  "ConnectTimeout": 5000000000,
  "DelayRxStart": false,
  "WriteTimeout": 0,
  "NewListener": null,
  "AdvertiseAddress": {},
  "RouterHeartbeatOptions": {
    "sendInterval": 10000000000,
    "checkInterval": 1000000000,
    "closeUnresponsiveTimeout": 30000000000
  },
  "PeerHeartbeatOptions": {
    "sendInterval": 10000000000,
    "checkInterval": 1000000000,
    "closeUnresponsiveTimeout": 30000000000
  }
}
[   0.799]    INFO ziti/controller/server.NewController: edge controller instance id: cm2lmnczo0000g1lzya3pua0t
[   0.799]    INFO ziti/controller/server.(*Controller).Initialize: initializing edge
[   0.804]    INFO ziti/controller/internal/policy.NewSessionEnforcer: {sessionTimeout=[30m0s] frequency=[5s]} session enforcer configured
[   0.820]    INFO ziti/controller/server.(*Controller).Shutdown: edge controller: shutting down...
[   0.820]    INFO ziti/controller/server.(*Controller).Shutdown: edge controller: stopped
[   0.820]    INFO ziti/controller/server.(*Controller).Shutdown: fabric controller: shutting down...
[   0.820]    INFO ziti/controller/server.(*Controller).Shutdown: fabric controller: stopped
[   0.820]    INFO ziti/controller/server.(*Controller).Shutdown: shutdown complete
[   0.820]    INFO ziti/controller/subcmd.NewEdgeInitializeCmd.func2: Ziti Edge initialization complete

Oh, interesting. Yeah, that was a relatively recent change. You can fix it in two ways. You can add a configuration option for the trust domain in the config file (the value is expected to be basically a URL; see the changelog for more detail). Or you can configure the controller with a client cert chain instead of a single cert file: make a chain file that contains the leaf cert together with its CA certs and use it in your identity block. That allows the controller to determine the value from the PKI itself.

Hope that helps and I'm glad the video and repo helped!

I only tried one of the two methods you provided, and I didn’t quite understand how to use the other method. After adding "trustDomain: zt.demo.org" in the controller configuration file, the controller was able to run normally. Subsequently, I created a router, which also started successfully, but I encountered a minor error: "system resolver test failed: failed to resolve ziti-tunnel.resolver.test: lookup ziti-tunnel.resolver.test on 127.0.0.53:53: no such host". I’m not sure if this will affect usage.

It has no effect whatsoever in most cases.

When the router starts, regardless of the mode it starts in (an issue I am sure we will fix one day), it will attempt to probe the OS as to whether or not the router can work in an 'intercepting tunneler mode'. If it can, and if you use the tproxy mode, you can use the router as a tunneler. I would assert most people do not run the router in tproxy mode; instead they run it in host mode. When running in host mode, the router shouldn't be doing this probe (because it's not configured as an intercepting, tunneling router).

Got it, I have found a related historical discussion about this error here: https://openziti.discourse.group/t/ziti-tunnel-resolver-test/2934. Although it doesn’t affect my usage, it helped clear up some of my doubts.

The method you mentioned might still be a bit unclear to me. Could you please provide me with some hints? Thank you!

In the main identity block, make sure you use chains for both the cert and the server cert. Right now you probably have:

identity:
  # Used for *TLS*!
  cert: ./pki/end_certs/openziti_network_components_client.cert
  server_cert: ./pki/end_certs/openziti_network_components-server.chain.pem
  key: ./pki/end_certs/openziti_network_components_certs.key
  ca: ./pki/cas/openziti_network_components_cas.pem

Or something quite similar. Notice how the server_cert field points to a chain file while the cert field points to a single cert? I believe the client cert needs to be configured with a chain now. This is a recent change. OpenZiti quickstarts and deployments have been adapted, but older material (like the repo) hasn't been updated.

I updated my fork with the proper steps. You can run them all again (it's quite fast to run all the commands): see BOOTSTRAP_PKI.md on the main branch of dovholuknf/openziti_manual_pki on GitHub.

Or you can simply:

cat ./pki/end_certs/openziti_network_components_client.cert \
    ./pki/cas/openziti_network_components_ica.cert \
    ./pki/cas/openziti_ica.cert \
    ./pki/root_ca.cert > ./pki/end_certs/openziti_network_components-client.chain.pem

then update your identity block with:

  cert: ./pki/end_certs/openziti_network_components-client.chain.pem

After that it'll initialize and you should be good to go. Hope that helps

Got it, thank you very much.

I compared it with the configuration in the quickstart, and indeed it also uses the client.chain.pem certificate chain.

I reviewed your fork and created openziti_network_components-client.chain.pem:

cat ./pki/end_certs/openziti_network_components_client.cert \
    ./pki/cas/openziti_network_components_ica.cert \
    ./pki/cas/openziti_ica.cert \
    ./pki/root_ca.cert > ./pki/end_certs/openziti_network_components-client.chain.pem

and replaced the single certificate openziti_network_components_client.cert in the controller configuration with the certificate chain openziti_network_components-client.chain.pem.
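
Before pointing the controller at a freshly assembled chain file, it may help to sanity-check it. The sketch below is a minimal, hedged example. It generates a throwaway root and client cert in a temp directory (all names are hypothetical), concatenates them into a chain, and then defines three small helpers: one lists the subject and issuer of each cert in the file, one counts the certs, and one checks that the first cert in the chain matches a given private key. The helper names are made up for illustration and are not part of OpenZiti or OpenSSL.

```shell
#!/bin/sh
# Throwaway files only; every name here is hypothetical.
set -eu
dir=$(mktemp -d)
cd "$dir"

# Minimal request config so the sketch does not depend on /etc/ssl/openssl.cnf.
cat > req.cnf <<'EOF'
[req]
distinguished_name = dn
prompt = no
x509_extensions = v3_ca
[dn]
CN = placeholder
[v3_ca]
basicConstraints = critical,CA:TRUE
keyUsage = critical,cRLSign,keyCertSign
EOF

# A demo root and a client leaf signed by it, just to have something to inspect.
openssl req -x509 -newkey rsa:2048 -nodes -keyout root.key -out root.cert \
    -days 1 -config req.cnf -subj "/CN=demo-root" 2>/dev/null
openssl req -new -newkey rsa:2048 -nodes -keyout client.key -out client.csr \
    -config req.cnf -subj "/CN=demo-client" 2>/dev/null
openssl x509 -req -in client.csr -CA root.cert -CAkey root.key -CAcreateserial \
    -out client.cert -days 1 2>/dev/null

# Leaf first, then the CA(s), as in the cat command above.
cat client.cert root.cert > client.chain.pem

# list_chain <file>: print subject and issuer of every cert in the bundle.
list_chain() {
    openssl crl2pkcs7 -nocrl -certfile "$1" | openssl pkcs7 -print_certs -noout
}

# count_certs <file>: number of PEM certificates in the file.
count_certs() {
    grep -c 'BEGIN CERTIFICATE' "$1"
}

# key_matches <chain> <key>: succeeds if the first cert in <chain>
# carries the public half of <key>.
key_matches() {
    cert_pub=$(openssl x509 -in "$1" -noout -pubkey | openssl sha256)
    key_pub=$(openssl pkey -in "$2" -pubout | openssl sha256)
    [ "$cert_pub" = "$key_pub" ]
}

list_chain client.chain.pem
echo "certs in chain: $(count_certs client.chain.pem)"
if key_matches client.chain.pem client.key; then
    echo "first cert matches client.key"
else
    echo "first cert does NOT match client.key"
fi
```

With real files, the same helpers can be pointed at the chain and key referenced in the `identity` block. The `openssl x509` call only reads the first certificate in a file, which is why the leaf has to come first for the key comparison to be meaningful.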