Hello
I have three controllers (version 1.5.4), all of which work independently of each other. When I try to run ziti agent cluster add, I get the following error:
cluster add failed: id not supplied and unable to retrieve [ tls failed to verify cert : x509: cert signed by unknown authority
This has to do with PKI. I created all the certs on one of my Ziti VMs, tarred the pki folder, and copied it to the others. I then modified the config files to point to the correct certs. To create the certs, I followed the instructions on the Ziti website.
I've done this process before with minimal headaches, but today it doesn't want to work.
Note: When I try to use the admin web page, I get "tls: first record does not look like tls", so there is definitely something wrong with how I did PKI.
So I figured out what is going on, but I'm not quite sure how to fix it. When I start the controller with ziti controller run and the config file, it works fine.
When I start it through the systemd service file, it does not work. The reason is that the systemd unit is trying to use intermediate certs, which I find strange since that is nowhere in my config.
You're certain that the systemd unit is running the exact same command, right? My guess is that your config file is using relative paths and the CWD (current working dir) is different when run via systemd.
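To illustrate the CWD point, here is a minimal sketch showing that the same relative path names a different file depending on the directory the process starts in. The two temporary directories are hypothetical stand-ins (one for the directory where the controller is started by hand, one for the unit's working directory), and resolve_from is a helper made up for this example, not part of ziti or systemd.

```shell
#!/usr/bin/env bash
# Sketch: one relative path, two working directories, two different files.

# resolve_from DIR PATH
# Print the absolute location PATH refers to for a process whose CWD is DIR.
resolve_from() {
  (cd "$1" && printf '%s/%s\n' "$PWD" "${2#./}")
}

# Hypothetical stand-ins: "shell" for the directory where the controller is
# started by hand, "unit" for the working directory of the systemd unit.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/shell" "$demo_root/unit"

# A relative path of the kind a config file might contain.
rel=./pki/ctrl1/certs/server.chain.pem
from_shell=$(resolve_from "$demo_root/shell" "$rel")
from_unit=$(resolve_from "$demo_root/unit" "$rel")

echo "started by hand: $from_shell"
echo "started by unit: $from_unit"
```

If the pki directory exists only under one of the two locations, the other start method will either fail to find the files or pick up a different set of certs.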
I am certain. I will get the config for you.
I did notice I am getting a lot of errors:
err=[not handler for requested protocols] handshake failed
The errors occur on the connection between two Ziti VMs.
Sorry about the delay. Below are the systemd file and the config file, as well as the exact errors I am seeing. Just to note, I can get to the admin web consoles of both controllers when running ziti controller run config.yml manually, but when running manually I cannot get the two controllers to communicate. SELinux and firewalld are both disabled.
Systemd:
[Unit]
Description=OpenZiti Controller
After=network-online.target
[Service]
Type=simple
# manage the user and permissions for the service automatically
DynamicUser=yes
# this env file configures the service, including whether or not to perform bootstrapping
EnvironmentFile=/opt/openziti/etc/controller/service.env
# relative to /var/lib
StateDirectory=ziti-controller
WorkingDirectory=/var/lib/ziti-controller
ReadOnlyPaths=/opt/openziti/share/console
ExecStartPre=/opt/openziti/etc/controller/entrypoint.bash check config.yml
ExecStart=/opt/openziti/bin/ziti controller run config.yml ${ZITI_ARGS}
Restart=always
RestartSec=3
LimitNOFILE=65535
UMask=0007
[Install]
WantedBy=multi-user.target
Controller1:
v: 3
cluster:
  dataDir: "/var/lib/private/ziti-controller/raft"
identity:
  cert: ./pki/ctrl1/certs/server.chain.pem
  key: ./pki/ctrl1/keys/server.key
  ca: ./pki/ctrl1/certs/ctrl1.chain.pem
ctrl:
  listener: tls:0.0.0.0:6262
  options:
    advertiseAddress: tls:Ziti01.5G.MIL:6262
events:
  jsonLogger:
    subscriptions:
      - type: connect
      - type: cluster
    handler:
      type: file
      format: json
      path: /tmp/ziti-events.log
edge:
  api:
    address: Ziti01.5G.MIL:1280
  enrollment:
    signingCert:
      cert: pki/ctrl1/certs/ctrl1.cert
      key: pki/ctrl1/keys/ctrl1.key
    edgeIdentity:
      duration: 5m
    edgeRouter:
      duration: 5m
web:
  - name: all-apis-localhost
    bindPoints:
      - interface: 0.0.0.0:1280
        address: Ziti01.5G.MIL:1280
    options:
      minTLSVersion: TLS1.2
      maxTLSVersion: TLS1.3
    apis:
      - binding: fabric
      - binding: edge-management
      - binding: edge-client
      - binding: edge-oidc
      - binding: zac
        options:
          location: /opt/openziti/share/console
          indexFile: index.html
Error when trying to connect the two controllers (not via systemd), controller one:
ERROR transport/v2/tls.(*sharedListener).processConn [tls:0.0.0.0:1280]: {remote=[162.178.0.22:60438] error=[not handler for requested protocols [ziti-ctrl]]} handshake failed
Error when trying to connect the two controllers (not via systemd), controller two:
ERROR ziti/controller/raft/mesh.(*impl).Dial: {address=[tls:Ziti01.5G.MIL:1280] error=[error dialing peer tls:Ziti01.5G.MIL:1280: remote error: tls: internal error]} unable to get or connect raft peer channel
Systemd Error Log:
Starting OpenZiti Controller...
Apr 14 08:03:39 Ziti01 entrypoint.bash[1437913]: realpath: missing operand
Apr 14 08:03:39 Ziti01 entrypoint.bash[1437913]: Try 'realpath --help' for more information.
Apr 14 08:03:39 Ziti01 entrypoint.bash[1437917]: realpath: missing operand
Apr 14 08:03:39 Ziti01 entrypoint.bash[1437917]: Try 'realpath --help' for more information.
Apr 14 08:03:39 Ziti01 entrypoint.bash[1437903]: ERROR: database file '' is not writable
Apr 14 08:03:39 Ziti01 entrypoint.bash[1437903]: Provide a configuration in '/var/lib/private/ziti-controller' or generate with:
Apr 14 08:03:39 Ziti01 entrypoint.bash[1437903]: * Set vars in'/opt/openziti/etc/controller/bootstrap.env'
Apr 14 08:03:39 Ziti01 entrypoint.bash[1437903]: * Run '/opt/openziti/etc/controller/bootstrap.bash'
Apr 14 08:03:39 Ziti01 entrypoint.bash[1437903]: * Run 'systemctl enable --now ziti-controller.service'
Apr 14 08:03:39 Ziti01 entrypoint.bash[1437903]: WARN: set VERBOSE=1 or DEBUG=1 for more output
@TheLumberjack Any ideas why I may be seeing this?
Running openssl s_client -connect, I am getting "key values mismatch".
What would you recommend when creating certs for three separate VMs?
This is what I ran (values have been modified for sharing). Once the certs were created on the first VM, I backed up the whole pki directory and moved it:
# Create the trust root, a self-signed CA
ziti pki create ca --trust-domain 5G.MIL --pki-root ./pki --ca-file ca --ca-name 'HA Trust Root'
# Create the controller 1 intermediate/signing cert
ziti pki create intermediate --pki-root ./pki --ca-name ca --intermediate-file ctrl1 --intermediate-name 'Controller One Signing Cert'
# Create the controller 1 server cert
ziti pki create server --pki-root ./pki --ca-name ctrl1 --dns Ziti01.5G.MIL --ip 192.168.0.1 --server-name ctrl1 --spiffe-id 'controller/ctrl1'
# Create the controller 1 client cert
ziti pki create client --pki-root ./pki --ca-name ctrl1 --client-name ctrl1 --spiffe-id 'controller/ctrl1'
# Create the controller 2 intermediate/signing cert
ziti pki create intermediate --pki-root ./pki --ca-name ca --intermediate-file ctrl2 --intermediate-name 'Controller Two Signing Cert'
# Create the controller 2 server cert
ziti pki create server --pki-root ./pki --ca-name ctrl2 --dns Ziti02.5G.MIL --ip 192.168.0.2 --server-name ctrl2 --spiffe-id 'controller/ctrl2'
# Create the controller 2 client cert
ziti pki create client --pki-root ./pki --ca-name ctrl2 --client-name ctrl2 --spiffe-id 'controller/ctrl2'
# Create the controller 3 intermediate/signing cert
ziti pki create intermediate --pki-root ./pki --ca-name ca --intermediate-file ctrl3 --intermediate-name 'Controller Three Signing Cert'
# Create the controller 3 server cert
ziti pki create server --pki-root ./pki --ca-name ctrl3 --dns Ziti03.5G.MIL --ip 192.168.0.3 --server-name ctrl3 --spiffe-id 'controller/ctrl3'
# Create the controller 3 client cert
ziti pki create client --pki-root ./pki --ca-name ctrl3 --client-name ctrl3 --spiffe-id 'controller/ctrl3'
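Since "key values mismatch" usually points at a certificate and private key that do not belong together, one way to check a pair after copying the pki directory around is to compare the public key embedded in the certificate with the public key derived from the private key. This is only a sketch: cert_matches_key is a helper name made up for this example, and it assumes the leaf certificate is the first entry in a chain file, because openssl x509 reads only the first certificate it finds.

```shell
#!/usr/bin/env bash
# Sketch: check whether a certificate and a private key form a pair.

# cert_matches_key CERT KEY
# Returns 0 when the public keys are identical, 1 when they differ,
# and 2 when either file could not be read by openssl.
# Assumption: if CERT is a chain file, the leaf certificate comes first.
cert_matches_key() {
  local cert_pub key_pub
  cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 2
  key_pub=$(openssl pkey -in "$2" -pubout) || return 2
  [ "$cert_pub" = "$key_pub" ]
}

# Only run the check when a certificate and a key are given as arguments.
if [ "$#" -eq 2 ]; then
  status=0
  cert_matches_key "$1" "$2" || status=$?
  case "$status" in
    0) echo "certificate and key match" ;;
    1) echo "certificate and key do NOT match" ;;
    *) echo "could not read certificate or key" ;;
  esac
fi
```

Saved under a hypothetical name such as check-pair.sh, it could be run on each VM as: bash check-pair.sh pki/ctrl1/certs/server.chain.pem pki/ctrl1/keys/server.key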
ZAC doesn't use mTLS, so it makes sense why ZAC would appear to work when the controllers don't...
Doing what you did does seem to me to be the correct type of flow you'd need to follow. How are you running the controller? Are you directly running it or running it from a systemd unit?
I would recommend you just run the ziti cli directly and see what happens. If it still fails, I would double check the config file is correct and referencing the proper paths/pki locations. I see the locations are relative, maybe even make them absolute and very clear?
Can you provide the exact command you're running? Are you using the PKI from one to connect to another?
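As a rough aid for the suggestion about absolute paths, the sketch below lists every cert, key, and ca entry in a config file whose value does not start with a slash. The function name list_relative_paths is made up for this example, and the match is a plain text scan of lines shaped like "key: value" rather than a real YAML parse, so treat the output as a hint only.

```shell
#!/usr/bin/env bash
# Sketch: flag cert/key/ca values in a config file that are relative paths.

# list_relative_paths CONFIG
# Print "LINE: KEY VALUE" for each cert:, key:, or ca: entry whose value
# does not begin with "/". This is a text scan, not a YAML parser.
list_relative_paths() {
  awk '
    ($1 == "cert:" || $1 == "key:" || $1 == "ca:") && NF >= 2 {
      value = $2
      gsub(/"/, "", value)            # drop surrounding double quotes
      if (substr(value, 1, 1) != "/")
        print NR ": " $1 " " value
    }
  ' "$1"
}

# Only scan when an existing file is given as the first argument.
if [ -f "${1:-}" ]; then
  list_relative_paths "$1"
fi
```

Each reported entry is resolved against whatever directory the controller process starts in, so rewriting those values as absolute paths removes that dependency.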
I have been running via
ziti controller run config.yml
The exact command I ran was
openssl s_client -connect Ziti01.5G.MIL:6262 --cert "pki/ctrl2/certs/server.chain.pem" --key "pki/ctrl2/keys/server.key" --CAfile "pki/ctrl2/certs/ctrl2.chain.pem"
I realized I didn't have Ziti01 started. After starting it, I get "Verify return code 0" on Ziti02,
and on Ziti01 I get the error: tls bad record mac, handshake failed.
On reboot, I get i/o timeouts when running the openssl connect command:
error receiving hello from address i/o timeout
could not clear connection deadline
So you get "Verify return code 0" from ziti02 --> ziti01, but from ziti01 to ziti02 it fails using openssl s_client. Do I have that correct?
And you're certain the running ziti01 controller is using the paths you use when using openssl? Please forgive me for asking this question repeatedly, but it's literally the only thing that I can think of that might be wrong or incorrect.
I don't know what the last message means with respect to the reboot. When this problem has happened to me in the past, I've always just had the wrong certs somehow... I'll have to think about this more. I don't know what/where/how it's gone wrong.
Hey @TheLumberjack, I pulled in our system admin and he noticed the VM CPU usage had jumped to 200 percent and software was crashing. He did some work on the host, and now everything seems to be working: the controllers have synced up. I honestly have no clue why that happened.
I do still have the issue with starting from systemd. I will verify the working directory today and try to get that working.