Certificate signed by unknown authority

I have two servers. On the first I have a controller with two subdomains (certificates generated by Let's Encrypt): one for edge, one for the controller.

On the second server I have a router that connects to the controller. The edge-router enrollment works fine, but the router then fails to connect to the controller:

ziti-edge-router-1  | [  14.365]   ERROR ziti/router/env.(*networkControllers).connectToControllerWithBackoff.func2: {endpoint=[tls:c1.example.com:443] error=[error connecting ctrl (tls: failed to verify certificate: x509: certificate signed by unknown authority)]} unable to connect controller

Here are the checks I did:

$ openssl s_client e1.example.com:443
subject=CN = e1.example.com
issuer=C = US, O = Let's Encrypt, CN = E5
---
SSL handshake has read 2435 bytes and written 407 bytes
Verification: OK
---
$ openssl s_client c1.example.com:443
subject=CN = c1.example.com
issuer=C = US, O = Let's Encrypt, CN = E6
---
SSL handshake has read 2431 bytes and written 407 bytes
Verification: OK
---
$ openssl s_client r1.example.com:443
subject=C = US, ST = NC, L = Charlotte, O = NetFoundry, OU = Ziti, CN = s7.jsfRanv
issuer=C = US, L = Charlotte, O = NetFoundry, OU = ADV-DEV, CN = ziti-signing-intermediate
---
SSL handshake has read 2354 bytes and written 407 bytes
Verification error: unable to verify the first certificate
---

Thanks!

What you'll want to do is use the identity section of the router config, and the key/cert/ca files listed there, to verify the connection from the router's point of view. For example, if you run head on your router config file:

head ~/.ziti/quickstart/ip-172-31-47-200/ip-172-31-47-200-edge-router.yaml
v: 3

identity:
  cert:             "/home/ubuntu/.ziti/quickstart/ip-172-31-47-200/ip-172-31-47-200-edge-router.cert"
  server_cert:      "/home/ubuntu/.ziti/quickstart/ip-172-31-47-200/ip-172-31-47-200-edge-router.server.chain.cert"
  key:              "/home/ubuntu/.ziti/quickstart/ip-172-31-47-200/ip-172-31-47-200-edge-router.key"
  ca:               "/home/ubuntu/.ziti/quickstart/ip-172-31-47-200/ip-172-31-47-200-edge-router.cas"
  #alt_server_certs:
  #  - server_cert:  ""
  #    server_key:   ""

then use openssl:

openssl s_client -connect ip-172-31-47-200:8440 \
  -key /home/ubuntu/.ziti/quickstart/ip-172-31-47-200/ip-172-31-47-200-edge-router.key \
  -cert /home/ubuntu/.ziti/quickstart/ip-172-31-47-200/ip-172-31-47-200-edge-router.cert \
  -CAfile /home/ubuntu/.ziti/quickstart/ip-172-31-47-200/ip-172-31-47-200-edge-router.cas

You need to see a Verify return code of 0:

    Verify return code: 0 (ok)
    Extended master secret: no
    Max Early Data: 0
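
A minimal sketch for checking that result in a script rather than by eye. The function name verify_ok is made up for illustration. It only looks for the "Verify return code: 0 (ok)" line that openssl s_client prints.

```shell
# Hypothetical helper: reads `openssl s_client` output on stdin and
# succeeds only if it contains "Verify return code: 0 (ok)".
verify_ok() {
  grep -q 'Verify return code: 0 (ok)'
}

# Example use (same command as above, with stdin closed so s_client exits):
#   openssl s_client -connect ip-172-31-47-200:8440 -CAfile /path/to/router.cas </dev/null 2>/dev/null \
#     | verify_ok && echo "chain verifies against the router's CA bundle"
```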

Also, the endpoint in the router config file is what you'd use in that openssl command. For example:

grep endpoint -A1 ~/.ziti/quickstart/ip-172-31-47-200/ip-172-31-47-200-edge-router.yaml
  endpoint:             tls:ip-172-31-47-200:8440
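
A small sketch of pulling that endpoint out of the config so it can be handed to openssl directly. The helper name ctrl_endpoint is made up for illustration, and it assumes the endpoint line has the tls:host:port form shown above.

```shell
# Hypothetical helper: print host:port from the first
# "endpoint: tls:host:port" line of a router config file.
ctrl_endpoint() {
  sed -n 's/^[[:space:]]*endpoint:[[:space:]]*tls:\([^[:space:]]*\).*$/\1/p' "$1" | head -n 1
}

# Example use (quickstart paths from above; the .cas file sits next to the .yaml):
#   cfg=~/.ziti/quickstart/ip-172-31-47-200/ip-172-31-47-200-edge-router.yaml
#   openssl s_client -connect "$(ctrl_endpoint "$cfg")" -CAfile "${cfg%.yaml}.cas" </dev/null
```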

Are you just trying to figure out why/where something went wrong? Has this "always worked" and stopped?

I'm not sure exactly what help you're looking for. :slight_smile:

My router could connect to the controller before; now it can't because of that error.

Ok. Thanks. Has anything changed recently? Was the controller or router upgraded? How old is the controller? Could its certs have changed? Did this start just today? Are there any log messages that might be helpful? Are the files listed in that identity block all "old", or did they change (judging by their timestamps)?

This is a fresh reinstall (I'm testing a deploy script) of both the router and the controller.

ZITI_IMAGE=openziti/quickstart
ZITI_VERSION=1.1.5

Is it your "deploy script" or one that we have provided? It seems that something, somewhere has gone wrong in that deployment. I can't debug what/why/how. It sorta sounds to me like the deploy script is not functioning properly. I would look at the files, check the logs, and test your connectivity using the openssl command I provided.

If it's one that we provided, can you provide the exact set of steps that have happened? What you have described to me is not a problem anyone else has experienced in the past.

The deploy script basically just creates identities and services; that's why I was so confused that the router couldn't connect to the controller.

Here's the controller output:

ziti-controller-1                 | [619993.612]   ERROR transport/v2/tls.(*sharedListener).processConn [tls:0.0.0.0:80]: {remote=[172.18.0.2:32860] error=[remote error: tls: bad certificate]} handshake failed

Here's the openssl output:

    Verify return code: 0 (ok)
    Extended master secret: no
    Max Early Data: 0
---
read R BLOCK

It is a copy/paste of the OpenZiti quickstart, except that I changed some .env vars. I pointed these vars at the Let's Encrypt paths:

ZITI_PKI_EDGE_CERT
ZITI_PKI_EDGE_SERVER_CERT
ZITI_PKI_EDGE_KEY
ZITI_PKI_EDGE_CA
ZITI_PKI_CTRL_SERVER_CERT
ZITI_PKI_CTRL_KEY
ZITI_PKI_CTRL_CA
ZITI_PKI_CTRL_CERT

More context:

NOTE: the /live directory is mounted from the host's Let's Encrypt directory.
Here's my ziti-controller.yaml:

ziti@821fb5ff8f85:/persistent$ cat ./ziti-controller.yaml 
v: 3

#trace:
#  path: "e1.example.com.trace"

#profile:
#  memory:
#    path: ctrl.memprof



db:                     "/persistent/db/ctrl.db"
# uncomment and configure to enable HA
# raft:
#   dataDir:         "/persistent/raft"
#   minClusterSize:  1


identity:
  cert:        "/ziti-keys/live/c1.example.com/cert.pem"
  server_cert: "/ziti-keys/live/c1.example.com/fullchain.pem"
  key:         "/ziti-keys/live/c1.example.com/privkey.pem"
  ca:          "/ziti-keys/live/c1.example.com/chain.pem"
  #alt_server_certs:
  #  - server_cert:  ""
  #    server_key:   ""

# Network Configuration
#
# Configure how the controller will establish and manage the overlay network, and routing operations on top of
# the network.
#
#network:

  # routeTimeoutSeconds controls the number of seconds the controller will wait for a route attempt to succeed.
  #routeTimeoutSeconds:  10

  # createCircuitRetries controls the number of retries that will be attempted to create a path (and terminate it)
  # for new circuits.
  #createCircuitRetries: 2  

  # pendingLinkTimeoutSeconds controls how long we'll wait before creating a new link between routers where
  # there isn't an established link, but a link request has been sent
  #pendingLinkTimeoutSeconds: 10

  # Defines the period that the controller re-evaluates the performance of all of the circuits
  # running on the network.
  #
  #cycleSeconds:         15
  
  # Sets router minimum cost. Defaults to 10
  #minRouterCost: 10

  # Sets how often a new control channel connection can take over for a router with an existing control channel connection
  # Defaults to 1 minute
  #routerConnectChurnLimit: 1m

  # Sets the latency of link when it's first created. Will be overwritten as soon as latency from the link is actually
  # reported from the routers. Defaults to 65 seconds.
  #initialLinkLatency: 65s
  
  #smart:
    #
    # Defines the fractional upper limit of underperforming circuits that are candidates to be re-routed. If 
    # smart routing detects 100 circuits that are underperforming, and `smart.rerouteFraction` is set to `0.02`,
    # then the upper limit of circuits that will be re-routed in this `cycleSeconds` period will be limited to 
    # 2 (2% of 100). 
    #
    #rerouteFraction:    0.02
    # 
    # Defines the hard upper limit of underperforming circuits that are candidates to be re-routed. If smart 
    # routing detects 100 circuits that are underperforming, and `smart.rerouteCap` is set to `1`, and 
    # `smart.rerouteFraction` is set to `0.02`, then the upper limit of circuits that will be re-routed in this 
    # `cycleSeconds` period will be limited to 1.
    #
    #rerouteCap:         4  

# the endpoint that routers will connect to the controller over.
ctrl:
  options:
    advertiseAddress: tls:e1.example.com:80
  # (optional) settings
  # set the maximum number of connect requests that are buffered and waiting to be acknowledged (1 to 5000, default 1)
  #maxQueuedConnects:      1
  # the maximum number of connects that have  begun hello synchronization (1 to 1000, default 16)
  #maxOutstandingConnects: 16
  # the number of milliseconds to wait before a hello synchronization fails and closes the connection (30ms to 60000ms, default: 5000ms)
  #connectTimeoutMs:       5000
  listener:             tls:0.0.0.0:80

#metrics:
#  influxdb:
#    url:                http://localhost:8086
#    database:           ziti

# xctrl_example
#
#example:
#  enabled:              false
#  delay:                5s

healthChecks:
  boltCheck:
    # How often to try entering a bolt read tx. Defaults to 30 seconds
    interval: 30s
    # When to time out the check. Defaults to 20 seconds
    timeout: 20s
    # How long to wait before starting the check. Defaults to 30 seconds
    initialDelay: 30s

# By having an 'edge' section defined, the ziti-controller will attempt to parse the edge configuration. Removing this
# section, commenting out, or altering the name of the section will cause the edge to not run.
edge:
  # This section represents the configuration of the Edge API that is served over HTTPS
  api:
    #(optional, default 90s) Alters how frequently heartbeat and last activity values are persisted
    # activityUpdateInterval: 90s
    #(optional, default 250) The number of API Sessions updated for last activity per transaction
    # activityUpdateBatchSize: 250
    # sessionTimeout - optional, default 30m
    # The number of minutes before an Edge API session will time out. Timeouts are reset by
    # API requests and connections that are maintained to Edge Routers
    sessionTimeout: 30m
    # address - required
    # The default address (host:port) to use for enrollment for the Client API. This value must match one of the addresses
    # defined in this Controller.WebListener.'s bindPoints.
    address: e1.example.com:443
  # This section is used to define option that are used during enrollment of Edge Routers, Ziti Edge Identities.
  enrollment:
    # signingCert - required
    # A Ziti Identity configuration section that specifically makes use of the cert and key fields to define
    # a signing certificate from the PKI that the Ziti environment is using to sign certificates. The signingCert.cert
    # will be added to the /.well-known CA store that is used to bootstrap trust with the Ziti Controller.
    signingCert:
      cert: /persistent/pki/signing.pem
      key:  /persistent/pki/ziti-signing-intermediate/keys/ziti-signing-intermediate.key
    # edgeIdentity - optional
    # A section for identity enrollment specific settings
    edgeIdentity:
      # duration - optional, default 180m
      # The length of time that a Ziti Edge Identity enrollment should remain valid. After
      # this duration, the enrollment will expire and no longer be usable.
      duration: 100800m
    # edgeRouter - Optional
    # A section for edge router enrollment specific settings.
    edgeRouter:
      # duration - optional, default 180m
      # The length of time that a Ziti Edge Router enrollment should remain valid. After
      # this duration, the enrollment will expire and no longer be usable.
      duration: 100800m

# web
# Defines webListeners that will be hosted by the controller. Each webListener can host many APIs and be bound to many
# bind points.
web:
  # name - required
  # Provides a name for this listener, used for logging output. Not required to be unique, but is highly suggested.
  - name: client-management
    # bindPoints - required
    # One or more bind points are required. A bind point specifies an interface (interface:port string) that defines
    # where on the host machine the webListener will listen and the address (host:port) that should be used to
    # publicly address the webListener(i.e. mydomain.com, localhost, 127.0.0.1). This public address may be used for
    # incoming address resolution as well as used in responses in the API.
    bindPoints:
      #interface - required
      # A host:port string on which network interface to listen on. 0.0.0.0 will listen on all interfaces
      - interface: 0.0.0.0:443
        # address - required
        # The public address that external incoming requests will be able to resolve. Used in request processing and
        # response content that requires full host:port/path addresses.
        address: e1.example.com:443
    # identity - optional
    # Allows the webListener to have a specific identity instead of defaulting to the root 'identity' section.
    identity:
      ca:          "/ziti-keys/live/e1.example.com/chain.pem"
      key:         "/ziti-keys/live/e1.example.com/privkey.pem"
      server_cert: "/ziti-keys/live/e1.example.com/fullchain.pem"
      cert:        "/ziti-keys/live/e1.example.com/cert.pem"
      #alt_server_certs:
      #- server_cert: ""
      #  server_key:  ""
      
    # options - optional
    # Allows the specification of webListener level options - mainly dealing with HTTP/TLS settings. These options are
    # used for all http servers started by the current webListener.
    options:
      # idleTimeoutMs - optional, default 5000ms
      # The maximum amount of idle time in milliseconds allowed for pipelined HTTP requests. Setting this too high
      # can cause resources on the host to be consumed as clients remain connected and idle. Lowering this value
      # will cause clients to reconnect on subsequent HTTPs requests.
      idleTimeout: 5000ms  #http timeouts, new
      # readTimeoutMs - optional, default 5000ms
      # The maximum amount of time in milliseconds http servers will wait to read the first incoming requests. A higher
      # value risks consuming resources on the host with clients that are acting bad faith or suffering from high latency
      # or packet loss. A lower value can risk losing connections to high latency/packet loss clients.
      readTimeout: 5000ms
      # writeTimeoutMs - optional, default 100000ms
      # The total maximum time in milliseconds that the http server will wait for a single requests to be received and
      # responded too. A higher value can allow long-running requests to consume resources on the host. A lower value
      # can risk ending requests before the server has a chance to respond.
      writeTimeout: 100000ms
      # minTLSVersion - optional, default TLS1.2
      # The minimum version of TSL to support
      minTLSVersion: TLS1.2
      # maxTLSVersion - optional, default TLS1.3
      # The maximum version of TSL to support
      maxTLSVersion: TLS1.3
    # apis - required
    # Allows one or more APIs to be bound to this webListener
    apis:
      # binding - required
      # Specifies an API to bind to this webListener. Built-in APIs are
      #   - edge-management
      #   - edge-client
      #   - fabric-management
      - binding: edge-management
        # options - arg optional/required
        # This section is used to define values that are specified by the API they are associated with.
        # These settings are per API. The example below is for the 'edge-api' and contains both optional values and
        # required values.
        options: { }
      - binding: edge-client
        options: { }
      - binding: fabric
        options: { }
      #- binding: zac
      #  options:
      #    location: /ziti-console
      #    indexFile: index.html

Here's my router's r1.example.com.yaml:

ziti@5ed98e6d33d8:/persistent$ cat r1.example.com.yaml
v: 3

identity:
  cert:             "/persistent/r1.example.com.cert"
  server_cert:      "/persistent/r1.example.com.server.chain.cert"
  key:              "/persistent/r1.example.com.key"
  ca:               "/persistent/r1.example.com.cas"
  #alt_server_certs:
  #  - server_cert:  ""
  #    server_key:   ""

ctrl:
  endpoint:             tls:c1.example.com:443

link:
  dialers:
    - binding: transport
  listeners:
    - binding:          transport
      bind:             tls:0.0.0.0:10080
      advertise:        tls:r1.example.com:10080
      options:
        outQueueSize:   4

listeners:
# bindings of edge and tunnel requires an "edge" section below
  - binding: edge
    address: tls:0.0.0.0:443
    options:
      advertise: r1.example.com:443
      connectTimeoutMs: 5000
      getSessionTimeout: 60
  - binding: tunnel
    options:
      mode: host #tproxy|host


edge:
  csr:
    country: US
    province: NC
    locality: Charlotte
    organization: NetFoundry
    organizationalUnit: Ziti
    sans:
      dns:
        - localhost
        - r1.example.com
        - 5ed98e6d33d8
      ip:
        - "127.0.0.1"


#transport:
#  ws:
#    writeTimeout: 10
#    readTimeout: 5
#    idleTimeout: 120
#    pongTimeout: 60
#    pingInterval: 54
#    handshakeTimeout: 10
#    readBufferSize: 4096
#    writeBufferSize: 4096
#    enableCompression: true
#    server_cert: /persistent/r1.example.com.server.chain.cert
#    key: /persistent/r1.example.com.key
#alt_server_certs:
#  - server_cert:  ""
#    server_key:   ""
forwarder:
  latencyProbeInterval: 0
  xgressDialQueueLength: 1000
  xgressDialWorkerCount: 128
  linkDialQueueLength: 1000
  linkDialWorkerCount: 32

Also a question: in 3 months, when my Let's Encrypt certs need to be renewed, will all my identities' .json and .jwt files also expire?

@TheLumberjack Any ideas?

Managing the PKI is a very delicate thing. Looking at what you're doing, I think you are much better off letting OpenZiti manage its own PKI. Instead, you should look at using alt_server_certs for the controller (and possibly for the ws-enabled routers if you end up using BrowZer). It'll be much easier. Then you only need to roll those alt server certs, and the PKI is well-known to the overlay itself.
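
A rough, untested sketch of that layout for a controller identity block. It assumes the four main fields point back at the files the quickstart generated (the angle-bracket values are placeholders, not real paths) and that only the alt_server_certs entry uses the Let's Encrypt files mounted under /ziti-keys/live, following the commented-out alt_server_certs shape already present in the config above.

```yaml
identity:
  # Placeholders: keep whatever the quickstart generated for the Ziti-managed PKI.
  cert:        "<ziti-pki-client-cert>"
  server_cert: "<ziti-pki-server-chain>"
  key:         "<ziti-pki-server-key>"
  ca:          "<ziti-pki-ca-bundle>"
  # Publicly trusted certificate offered in addition to the Ziti-issued one.
  alt_server_certs:
    - server_cert: "/ziti-keys/live/e1.example.com/fullchain.pem"
      server_key:  "/ziti-keys/live/e1.example.com/privkey.pem"
```

With this shape, a Let's Encrypt renewal should only touch the two files under alt_server_certs, while the certificates the routers use to verify the controller stay under the overlay's own PKI.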

Without debugging what you've done myself, all I can say is what I've said before: somehow you have a mismatch of certificates. I don't know how or where it went wrong, but it has. If you can give me a clear set of steps that reproduces the problem, I'll try them out. Without steps to reproduce the issue, I honestly don't know how better to help here, nor what else to tell you.

I would still suggest you let OpenZiti maintain its own PKI. You can still have verifiable certificates for the controller API if you desire. The only reasons I can think of for that sort of setup would be if you want some other API to access the controller (and don't want to retrieve/use the .well-known/est/cacerts bundle) or if you're using the ZAC.
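
For reference, a hedged sketch of retrieving that bundle. The function name fetch_ziti_cas is made up for illustration, and the decoding steps assume the endpoint returns a base64-encoded PKCS#7 bundle, as the EST cacerts operation does. The -k flag skips TLS verification, since the whole point of this bootstrap step is that the CA is not yet trusted locally.

```shell
# Hypothetical helper: download the controller's CA bundle and print it as PEM.
# $1 is the controller's host:port, e.g. e1.example.com:443
fetch_ziti_cas() {
  curl -sk "https://$1/.well-known/est/cacerts" \
    | openssl base64 -d \
    | openssl pkcs7 -inform DER -outform PEM -print_certs
}

# Example use:
#   fetch_ziti_cas e1.example.com:443 > ziti-cas.pem
#   openssl s_client -connect c1.example.com:443 -CAfile ziti-cas.pem </dev/null
```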

Hello,

This resolved the problem.