SSO with Ziti Desktop Fails – TLS Handshake Issue and Certificate Configuration

Hi,

I'm using Controller v1.7.2 and the Ziti Desktop Client for macOS v2.52.

I'm trying to connect my identity using the "with URL" feature and SSO, but unfortunately the attempt fails with the following error: "TLS Handshake Failed: Remote Error 'tls: bad certificate'".

SSO appears to be functional: I can access ZAC, and the ziti ops verify command returns a login success.

To address the certificate issue, I obtained a certificate from Amazon (a publicly trusted CA) and placed it in the alt_server_certs section under web. However, when I access ZAC in the browser, it still presents the default self-signed certificate (NetFoundry's). If I make the Amazon certificate primary and the self-signed one alternate, the browser correctly displays the Amazon certificate. But then the routers start returning errors and can't connect to the controller (a mismatch with the /.well-known signingCert CA store).

I’ve seen in forum discussions that using alt_server_certs is generally recommended. However, after reading the documentation in depth and discussing with GPT, wouldn’t it be better to separate the API bindings (and their associated certificates)? For example:

edge:
  api:
    address: edge.my-domain.com:443  # must match a web.bindPoints.address
  enrollment:
    signingCert:
      cert: pki/intermediate/certs/intermediate.cert
      key:  pki/intermediate/keys/intermediate.key
web:
  # Public ZAC (Admin UI) on a different name (Amazon cert)
  - name: management-public
    bindPoints:
      - interface: 0.0.0.0:443
        address: ctrl.my-domain.com:443     # public FQDN for management
    identity:
      server_cert: "/ziti-cert/fullchain.pem"
      key:         "/ziti-cert/privkey.pem"
    apis:
      - binding: edge-management      # based on the "Important" note about listeners in the documentation
        options: {}
      - binding: zac
        options: { location: /ziti-console, indexFile: index.html }
      - binding: fabric
        options: {}        

  # Public Edge API for tunnelers/SDKs (Amazon cert)
  - name: edge-public
    bindPoints:
      - interface: 0.0.0.0:443
        address: edge.my-domain.com:443  # public FQDN for clients
    identity:
      server_cert: "/ziti-cert/edge_fullchain.pem"   # Amazon/public
      key:         "/ziti-cert/edge_privkey.pem"
    apis:
      - binding: edge-client 
        options: {}    

      - binding: edge-oidc    
        options: {}               

  # Internal Edge API using Ziti PKI 
  - name: edge-internal
    bindPoints:
      - interface: 0.0.0.0:8441
        address: ziti-controller:8441
    identity:
      ca:          "pki/root/certs/root.cert"
      server_cert: "pki/intermediate/certs/server.chain.pem"
      key:         "pki/intermediate/keys/server.key"
      cert:        "pki/intermediate/certs/client.chain.pem"
    apis:
      - binding: edge-client
        options: {}

I think this provides better flexibility.
My questions are:

  1. Where can I find more information about the available APIs and their options settings?

  2. Is it possible to bind the same API (e.g., edge-client) to multiple listeners?

  3. Can I place the management-client, ZAC and fabric APIs in a private network to reduce the attack surface? Are they required to be accessible to routers, SDKs, clients, etc.?

  4. What is the purpose of the identity section at the top level of the configuration file? Is it just the default TLS identity unless overridden by another section (such as in web)?

Thanks in advance for your help.

Thanks.

Add By Url

As to what your problem is with add by URL, I would bet that it's one of two things. Either you have a cert from a third party (e.g., Let's Encrypt) whose FQDN overlaps the FQDN of the private PKI that was established when you created your overlay, or you are using a URL served by the private PKI when "adding by URL".

You cannot overlap the SANs across the sets of certificates. If you do, you'll end up in a situation where the runtime cannot figure out which cert is meant to be used, so it'll just pick one. That shows up as randomly working one time and randomly failing another. This is an exceptionally common problem that many people hit. I really want to change the controller's code to detect this and log a huge warning (or panic, or something) that tells people why the config is "possibly a problem".

As for add by URL, you must use a URL that is already trusted by the OS. It doesn't matter whether you add your CA to your OS trust store or use one that's already trusted, but it must be trusted. When you open it in a browser, you must not get the scary "Your connection is not private" type of error; if you do, you can't use add by URL.

The Questions:

1. I don't think we have any extra documentation about the API "options". I don't think there's much there for you to explore, but if I'm wrong, hopefully someone will correct me.

2. Yep. It's been demonstrated numerous times in the forums and on YouTube (if you can't find them, let me know and I'll dig them up). It's usually called something like "splitting the APIs". You just need to add a second API binding, make sure it's on a different interface/port, and it'll be fine. I routinely do this: I keep the edge-oidc and edge-client APIs public and move fabric, edge-management and zac to a private IP (often 127.0.0.1). (Soon you'll also be able to bind these to a Ziti service if you want.)

3. Yes, but I think the previous answer already covers this.

4. "Yes," but this is a little complicated... The top block is used as the default block. It's the identity configuration that the overlay network uses for itself. It's important to understand that "the overlay network" is not the same as "the APIs": the overlay itself is controller <--> router and router <--> router communication, as opposed to the listening servers the controller operates.

If no other "identity" section appears in the config file, that's the block used for everything. Other sections of the config let you add additional (optional) identity configuration for the edge API bindings if you want. That's why you'll find a separate identity section under the web bindings.

There's another, easy-to-miss PKI section that covers the PKI that identities are signed from. You'll find it under the edge->enrollment->signingCert section.
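To make the answers above concrete, here is a sketch of a split layout. It assumes the same v3 config format and PKI paths used in this thread. The hostname ctrl.my-domain.com and the listener names are hypothetical, and the public hostname must be in the SANs of server.chain.pem. The public listener carries only edge-client and edge-oidc, and the management APIs and ZAC are bound to 127.0.0.1. Neither web entry declares its own identity, so both fall back to the top-level block. This is a fragment to illustrate the structure, not a complete controller config.

```yaml
# Default identity: used by the overlay (controller <-> router) and by every
# web listener below, because none of them declares its own identity block.
identity:
  ca:          "pki/root/certs/root.cert"
  cert:        "pki/ctrl1-usw1/certs/client.chain.pem"
  server_cert: "pki/ctrl1-usw1/certs/server.chain.pem"
  key:         "pki/ctrl1-usw1/keys/server.key"

edge:
  api:
    address: ctrl.my-domain.com:443        # hypothetical; matches the public bind point below
  enrollment:
    # Separate, easy-to-miss PKI: the CA that enrolled identities are signed from.
    signingCert:
      cert: pki/ctrl1-usw1/certs/ctrl1-usw1.cert
      key:  pki/ctrl1-usw1/keys/ctrl1-usw1.key

web:
  # Public listener: only what clients/tunnelers need.
  - name: client-public                    # hypothetical name
    bindPoints:
      - interface: 0.0.0.0:443
        address: ctrl.my-domain.com:443
    apis:
      - binding: edge-client
        options: {}
      - binding: edge-oidc
        options: {}

  # Private listener: management APIs and ZAC, reachable only from the host itself.
  - name: mgmt-private                     # hypothetical name
    bindPoints:
      - interface: 127.0.0.1:8443
        address: 127.0.0.1:8443
    apis:
      - binding: edge-management
        options: {}
      - binding: fabric
        options: {}
      - binding: zac
        options:
          location: /ziti-console
          indexFile: index.html
```

Because the two entries use different interface/port pairs, each API is exposed only where it is listed.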

Hi @TheLumberjack

Thanks for your reply; it was very helpful.

Unfortunately, it is still not working for me, and I keep getting:
tls: failed to verify certificate: x509: certificate signed by unknown authority.

I believe I followed all the steps correctly and made sure to use different SANs. However, I noticed that the issue only occurs during enrollment (pre-enrollment works perfectly for identities, though not so much for routers). When using the Ziti CLI (v1.6.9, the latest), I get the same error unless I pass the --ca flag with Amazon's root CA.

Failed:

ziti edge enroll --jwt ~/Downloads/test.jwt --out ~/Downloads/Test_enrolled.json --verbose
DEBUG   jwt to parse: eyJhb<jwt long token>cjxa_fPj4gNBhzOkJh4l5Ju
INFO    generating 4096 bit RSA key
DEBUG   fetching certificates from server
Usage:
  ziti edge enroll path/to/jwt [flags]

Flags:
      --ca string         Additional trusted certificates
  -c, --cert string       The certificate to present when establishing a connection.
  -h, --help              help for enroll
  -n, --idname string     Names the identity. Ignored if not 3rd party auto enrollment
  -j, --jwt string        Enrollment token (JWT file). Required
  -k, --key string        The key to use with the certificate. Optionally specify the engine to use. supported engines: [parsec]
  -a, --keyAlg RSA|EC     Crypto algorithm to use when generating private key (default RSA)
  -o, --out string        Output configuration file.
  -p, --password string   Password for updb enrollment, prompted if not provided and necessary
      --rm                Remove the JWT on success
  -u, --username string   Username for updb enrollment, prompted if not provided and necessary
  -v, --verbose           Enable verbose logging

failed to enroll: Post "https://my-dmin:443/edge/client/v1/enroll?method=ott&token=fc40426e-267d-4b4f-b8a5-3892f90d7759": tls: failed to verify certificate: x509: certificate signed by unknown authority

Successfully:
ziti edge enroll --jwt ~/Downloads/Test.jwt --out ~/Downloads/Test_enrolled.json --ca ~/Downloads/Amazon_Root_CA_4.pem --verbose
DEBUG jwt to parse: eyJhbGciOiJF<jwt long token>yMx2JT2LK0yH
INFO generating 4096 bit RSA key
DEBUG adding certificates from the provided ca override file
DEBUG fetching certificates from server
INFO enrolled successfully. identity file written to: ~/Downloads/Test_enrolled.json

Could it be related to the crypto library used by the Ziti CLI not trusting Amazon?

By the way, it crashes the macOS desktop client :sweat_smile:

Update: I tried with Let's Encrypt and got the same results, so it's unlikely to be related to Amazon. Could it be a CA bundle issue?

I tested both Ziti's generated PKI with alt_server_certs and my own PKI (created with the ziti pki commands) with alt_server_certs, and the results were the same.

I don't think it matters which PKI I use (since it didn't work even with the bootstrap-generated PKI), but here are the commands I used to create the PKI on my second attempt:

ziti pki create ca --trust-domain my-domain.com --pki-root ./pki --ca-file root --ca-name 'my domain ziti root'

ziti pki create intermediate --pki-root ./pki --ca-name 'my domain ziti root' --intermediate-file ctrl1-usw1 --intermediate-name 'Signing CA for ctrl1-usw1'

ziti pki create server --pki-root ./pki --ca-name ctrl1-usw1 --dns "localhost,ctrl1-usw1-dev" --ip "127.0.0.1,::1" --server-name ctrl1-usw1 --spiffe-id 'controller/ctrl1-usw1'

ziti pki create client --pki-root ./pki --ca-name ctrl1-usw1 --client-name ctrl1-usw1 --spiffe-id 'controller/ctrl1-usw1'

I'm also sharing the controller configuration. Could you take a look and point out any errors, please?

v: 3

db: "/ziti-controller/bbolt.db"

identity:
  # Ziti PKI identity (used by ctrl:8441 and by any internal listeners)
  ca:          "pki/root/certs/root.cert"
  cert:        "pki/ctrl1-usw1/certs/client.chain.pem"
  server_cert: "pki/ctrl1-usw1/certs/server.chain.pem"
  key:         "pki/ctrl1-usw1/keys/server.key"


# Routers use this control-plane listener after enrollment
ctrl:
  options:
    # MUST be a DNS name present in the Ziti ctrl server cert SANs
    # the endpoint that routers will connect to (router->controller)
    advertiseAddress: tls:ctrl1-usw1-dev:8441
  listener: tls:0.0.0.0:8441

healthChecks:
  boltCheck:
    interval: 30s
    timeout: 20s
    initialDelay: 30s

edge:
  api:
    # the endpoint that clients connect to (client->controller)
    # MUST match a web.bindPoints.address below that serves edge-client
    address: ctrl1-usw1.my-domain.com:443 # Amazon's public FQDN
    sessionTimeout: 30m
  enrollment:
    signingCert:
      cert: pki/ctrl1-usw1/certs/ctrl1-usw1.cert  # must be issued by the CA in identity.ca (root)
      key:  pki/ctrl1-usw1/keys/ctrl1-usw1.key
    edgeIdentity:
      duration: 180m
    edgeRouter:
      duration: 180m

web:
  # Public HTTPS listener for Edge API (clients/tunnelers) and ZAC
  # Endpoint admins use to manage the controller (admins->controller)
  - name: public-mgmt
    bindPoints:
      - interface: 0.0.0.0:8443
        address: ctrl1-usw1.my-domain.com:8443 # Amazon's public FQDN

    # Override identity here to use Amazon (public) certificate
    identity:
      ca:          "pki/root/certs/root.cert"
      server_cert: "pki/ctrl1-usw1/certs/server.chain.pem"
      key:         "pki/ctrl1-usw1/keys/server.key"
      cert:        "pki/ctrl1-usw1/certs/client.chain.pem"

      alt_server_certs:
        - server_cert: "/ziti-pub-certs/fullchain.pem" # Amazon's public FQDN
          server_key: "/ziti-pub-certs/privkey.pem" # Amazon's public FQDN
    options:
      idleTimeout: 5000ms
      readTimeout: 5000ms
      writeTimeout: 100000ms
      minTLSVersion: TLS1.2
      maxTLSVersion: TLS1.3

    apis:
      - binding: edge-management
        options: {}
      - binding: fabric
        options: {}
      - binding: zac
        options:
          location: /ziti-console
          indexFile: index.html

  # Clients authenticate to OpenZiti controllers here
  - name: edge-public
    bindPoints:
      - interface: 0.0.0.0:443
        address: ctrl1-usw1.my-domain.com:443 # public FQDN for clients

    identity:
      ca:          "pki/root/certs/root.cert"
      server_cert: "pki/ctrl1-usw1/certs/server.chain.pem"
      key:         "pki/ctrl1-usw1/keys/server.key"
      cert:        "pki/ctrl1-usw1/certs/client.chain.pem"

      alt_server_certs:
        - server_cert: "/ziti-pub-certs/fullchain.pem" # Amazon's public FQDN
          server_key: "/ziti-pub-certs/privkey.pem" # Amazon's public FQDN
    apis:
      - binding: edge-client
        options: {}
      - binding: edge-oidc
        options: {}

@qrkourier I saw in other posts that you're something of a certificate guru. Could you take a look? :innocent:

Thanks in advance

DM me here on Discourse with your two actual URLs and I'll have a look from my side. We need to focus on one particular issue, and the issue I'm focused on is "adding an identity by URL".

For this to work, the URL supplied must be trusted by the OS, as I stated before. Let's Encrypt, ZeroSSL, or any provider that is already trusted by the machine will work. If you use any URL that presents the PKI generated by Ziti, it is going to fail unless you have added that CA to your OS certificate store. Always.

Your config is fine. Personally, I would just move the alt certs into the topmost identity block (usually the one around lines 20-25).
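One way to read that suggestion is sketched below, using the paths from the config above. This is an assumption about the intended layout, not a verified config. The idea is to move alt_server_certs into the top-level identity block and drop the per-listener identity overrides. Every listener then inherits the same primary (private PKI) and alternate (publicly trusted) certificates.

```yaml
# Top-level (default) identity: private PKI as the primary server cert,
# publicly trusted cert as an alternate for its own, non-overlapping FQDN.
identity:
  ca:          "pki/root/certs/root.cert"
  cert:        "pki/ctrl1-usw1/certs/client.chain.pem"
  server_cert: "pki/ctrl1-usw1/certs/server.chain.pem"
  key:         "pki/ctrl1-usw1/keys/server.key"
  alt_server_certs:
    - server_cert: "/ziti-pub-certs/fullchain.pem"
      server_key:  "/ziti-pub-certs/privkey.pem"

# The web entries then omit their own identity: blocks and inherit this one.
```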

I talked with @NullZiti privately, and @ekoby was able to spot the problem. The network JWT contained URLs for the public PKI because the controller was using the public FQDN as its advertised address. While understandable, this isn't a correct configuration: the OpenZiti controller config file must 'advertise' an FQDN that uses the private PKI.

I encouraged @NullZiti to start over and use two FQDNs: one for the private PKI, which is used in the config files, and one for the alt cert, likely used for ZAC only. :slight_smile:
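A sketch of that two-FQDN layout, as I understand the fix, uses two hypothetical names:

- ctrl.internal.my-domain.com is listed in the SANs of the Ziti-generated server certificate (for example through the --dns value passed to ziti pki create server).
- zac.my-domain.com is covered only by the publicly trusted alt cert, so the two sets of SANs never overlap.

Every address the controller advertises uses the private name, so the network JWT points at URLs backed by the private PKI. The public name is just what you would type into a browser to reach ZAC. This assumes the listener picks the alt cert based on the hostname the client requests (SNI).

```yaml
# identity: keep the private-PKI server cert as primary; the publicly trusted
# cert (SAN: zac.my-domain.com only) goes in alt_server_certs, as suggested above.

ctrl:
  options:
    advertiseAddress: tls:ctrl.internal.my-domain.com:8441   # private-PKI FQDN (routers)
  listener: tls:0.0.0.0:8441

edge:
  api:
    address: ctrl.internal.my-domain.com:443                # private-PKI FQDN (ends up in the network JWT)

web:
  - name: edge-public
    bindPoints:
      - interface: 0.0.0.0:443
        address: ctrl.internal.my-domain.com:443            # same private-PKI FQDN
    apis:
      - binding: edge-client
        options: {}
      - binding: edge-oidc
        options: {}
      # edge-management, fabric and zac: on this listener or on a private one,
      # depending on how much you want to expose
```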

Thanks a lot @TheLumberjack!
I can confirm it is working!