OpenZiti doesn't work anymore

For a couple of days I have been trying to figure out why my OpenZiti network has stopped working. It had been working for the last 3 months without problems.

So, any hints on what to do or test?

I have a single controller on a VPS, 2 public routers and 2 private ones, all on version 1.4.3.
As a test I have rebuilt one public router (ozrb1) and re-enrolled my ZDEW.

Now I see the following errors on the controller when the ZDEW tries to connect:

[11752.160]   ERROR transport/v2/tls.(*sharedListener).processConn [tls:0.0.0.0:443]: {remote=[95.217.xxx.yy:49132] error=[EOF]} handshake failed
[11762.303]   ERROR transport/v2/tls.(*sharedListener).processConn [tls:0.0.0.0:443]: {remote=[95.217.xxx.yy:49188] error=[EOF]} handshake failed
[11772.347]   ERROR transport/v2/tls.(*sharedListener).processConn [tls:0.0.0.0:443]: {remote=[95.217.xxx.yy:49098] error=[EOF]} handshake failed

and in the ZDEW logs I see "reason=no controller available, cannot create circuit":

[2025-04-08T12:06:19.175Z]   DEBUG tunnel-cbs:ziti_tunnel_cbs.c:354 ziti_sdk_c_dial() service[odoo2-service] app_data_json[178]='{"connType":null,"dst_protocol":"tcp","dst_hostname":"shop.domain.com","dst_ip":"100.64.0.15","dst_port":"443","src_protocol":"tcp","src_ip":"100.64.0.1","src_port":"54798"}'
[2025-04-08T12:06:19.175Z]   DEBUG ziti-sdk:connect.c:432 connect_get_service_cb() conn[0.370/NsaaeNeJ/Connecting](odoo2-service) got service[odoo2-service] id[5DMAEDTItSDhyJm7cfKeCX]
[2025-04-08T12:06:19.175Z]   DEBUG ziti-sdk:posture.c:216 ziti_send_posture_data() ztx[0] posture checks must_send set to TRUE, new_session_id[FALSE], must_send_every_time[TRUE], new_controller_instance[FALSE]
[2025-04-08T12:06:19.175Z]   DEBUG ziti-sdk:connect.c:553 process_connect() conn[0.370/NsaaeNeJ/Connecting](odoo2-service) starting Dial connection for service[odoo2-service] with session[cm98gfixv0a21p6bp9kt36rv3]
[2025-04-08T12:06:19.175Z]   DEBUG ziti-sdk:connect.c:409 ziti_connect() conn[0.370/NsaaeNeJ/Connecting](odoo2-service) selected ch[ozri1@tls://192.168.110.2:443] for best latency(3 ms)
[2025-04-08T12:06:19.175Z]   DEBUG ziti-sdk:channel.c:245 ziti_channel_add_receiver() ch[2] added receiver[370]
[2025-04-08T12:06:19.179Z]   ERROR ziti-sdk:connect.c:1073 connect_reply_cb() conn[0.369/CnkvdnU3/Connecting](odoo2-service) failed to connect, reason=no controller available, cannot create circuit
[2025-04-08T12:06:19.179Z]   ERROR tunnel-cbs:ziti_tunnel_cbs.c:103 on_ziti_connect() ziti dial failed: connection is closed
[2025-04-08T12:06:19.179Z]   DEBUG ziti-sdk:connect.c:896 flush_to_client() conn[0.369/CnkvdnU3/Closed](odoo2-service) no data_cb: can't flush, 0 bytes 

But when I connect from my WSL to the controller, I cannot find the problem...

timo@TIMO-P14s:~$ ziti ops verify traffic
WARNING no prefix and mode [] is not 'both'. default prefix of 2025-04-08-1520 will be used
Using controller url: https://ozc1.domain.com:8443/edge/management/v1 from identity 'default' in config file: /home/timo/.config/ziti/ziti-cli.json
Using username: admin from identity 'default' in config file: /home/timo/.config/ziti/ziti-cli.json
Enter password:
RESTY 2025/04/08 15:20:34 ERROR Post "https://ozc1.domain.com:8443/edge/management/v1/authenticate?method=password": context deadline exceeded (Client.Timeout exceeded while awaiting headers), Attempt 1
Token: 5400e89b-2deb-43ea-a8e0-b178b482f027
Saving identity 'default' to /home/timo/.config/ziti/ziti-cli.json
INFO    generating P-384 EC key
INFO    generating P-384 EC key
INFO    waiting 10s for terminator for service: 2025-04-08-1520.traffic
INFO    successfully bound service: 2025-04-08-1520.traffic.

INFO    Server is listening for a connection and will exit when one is received.
INFO    new service session                           session token=e2a15177-07f6-47a1-a236-df68073db92e
INFO    found terminator for service: 2025-04-08-1520.traffic
INFO    found service named: 2025-04-08-1520.traffic
INFO    Server has accepted a connection and will exit soon.
INFO    successfully dialed service: 2025-04-08-1520.traffic.
INFO    traffic test successfully detected
INFO    Server complete. exiting
INFO    client complete
timo@TIMO-P14s:~$ ziti ops verify network
INFO    All requested checks passed.

timo@TIMO-P14s:~$ ziti fabric list routers
╭────────────┬───────┬────────┬──────┬──────────────┬──────────┬───────────────────────┬──────────────────────────────────╮
│ ID         │ NAME  │ ONLINE │ COST │ NO TRAVERSAL │ DISABLED │ VERSION               │ LISTENERS                        │
├────────────┼───────┼────────┼──────┼──────────────┼──────────┼───────────────────────┼──────────────────────────────────┤
│ LCL6JKhPby │ ozrb1 │ true   │    0 │ false        │ false    │ v1.4.3 on linux/arm64 │ 1: tls:ozrb1.domain.com:80       │
│ OxLPlvdtE  │ ozrb2 │ true   │    0 │ false        │ false    │ v1.4.3 on linux/amd64 │ 1: tls:ozrb2.domain.com:80       │
│ SwgS1vd1ap │ ozri2 │ true   │    0 │ false        │ false    │ v1.4.3 on linux/amd64 │ 1: tls:192.168.110.3:80          │
│ ePZkDbQx74 │ ozri1 │ true   │    0 │ false        │ false    │ v1.4.3 on linux/amd64 │ 1: tls:192.168.110.2:80          │
╰────────────┴───────┴────────┴──────┴──────────────┴──────────┴───────────────────────┴──────────────────────────────────╯
results: 1-4 of 4
timo@TIMO-P14s:~$ ziti fabric list links
╭────────────────────────┬────────┬──────────┬─────────────┬─────────────┬─────────────┬───────────┬────────┬───────────╮
│ ID                     │ DIALER │ ACCEPTOR │ STATIC COST │ SRC LATENCY │ DST LATENCY │ STATE     │ STATUS │ FULL COST │
├────────────────────────┼────────┼──────────┼─────────────┼─────────────┼─────────────┼───────────┼────────┼───────────┤
│ 2e5BUsoVS3UU3mKWkkesH6 │ ozrb2  │ ozrb1    │           1 │      10.2ms │       9.9ms │ Connected │     up │        20 │
│ 2fCNnQ1U72CQpB8QUWwKBk │ ozri2  │ ozrb1    │           1 │      44.3ms │      38.0ms │ Connected │     up │        83 │
│ 4kJjEabpc2Zli1Hzt0kzde │ ozri2  │ ozri1    │           1 │       2.9ms │       2.9ms │ Connected │     up │         5 │
│ 7AmubYgUWOY2ReorlyhvEO │ ozri1  │ ozrb1    │           1 │       9.0ms │       8.9ms │ Connected │     up │        17 │
│ E7gubU4chTiyu2rczYr6S  │ ozri2  │ ozrb2    │           1 │      44.9ms │      44.6ms │ Connected │     up │        89 │
│ Yg075yxcpYLdKzg84jb7c  │ ozri1  │ ozrb2    │           1 │      14.2ms │      14.2ms │ Connected │     up │        29 │
╰────────────────────────┴────────┴──────────┴─────────────┴─────────────┴─────────────┴───────────┴────────┴───────────╯
results: 1-6 of 6
timo@TIMO-P14s:~$ ziti fabric list circuits
╭───────────┬───────────────────────────┬───────────────┬────────────────────────┬─────────────────────┬─────────╮
│ ID        │ CLIENT                    │ SERVICE       │ TERMINATOR             │ CREATEDAT           │ PATH    │
├───────────┼───────────────────────────┼───────────────┼────────────────────────┼─────────────────────┼─────────┤
│ 4j0JNzGHT │ cm93tu1ki1nhtlvbpr7ye2tu1 │ zabbix-agents │ 1mOTqgmqtl0zeeIAvzVqiX │ 2025-04-08 08:29:58 │ r/ozrb2 │
│ 4n6bNAlrT │ cm93tu1ki1nhtlvbpr7ye2tu1 │ zabbix-agents │ 1mOTqgmqtl0zeeIAvzVqiX │ 2025-04-08 08:31:54 │ r/ozrb2 │
│ 4uc5TAGrN │ cm93tu1ki1nhtlvbpr7ye2tu1 │ zabbix-agents │ 1mOTqgmqtl0zeeIAvzVqiX │ 2025-04-08 08:31:34 │ r/ozrb2 │
│ 63vJTzGHT │ cm93tu1ki1nhtlvbpr7ye2tu1 │ zabbix-agents │ 1mOTqgmqtl0zeeIAvzVqiX │ 2025-04-08 08:29:59 │ r/ozrb2 │
│ 6PsaTAGrN │ cm93tu1ki1nhtlvbpr7ye2tu1 │ zabbix-agents │ 1mOTqgmqtl0zeeIAvzVqiX │ 2025-04-08 08:31:59 │ r/ozrb2 │
│ 8kQvTzGHN │ cm93tu2lo1nhwlvbpuzblxkeh │ wazuh-agents  │ bH3ejxYqycDJGf2EUtTSz  │ 2025-04-08 08:33:27 │ r/ozrb1 │
│ FiGJTzlrN │ cm93tu1ki1nhtlvbpr7ye2tu1 │ zabbix-agents │ 1mOTqgmqtl0zeeIAvzVqiX │ 2025-04-08 08:29:50 │ r/ozrb2 │
│ HBpGEAlHN │ cm93tu2lo1nhwlvbpuzblxkeh │ wazuh-agents  │ 7iEdOXZIH5g1pIkk8yxdgW │ 2025-04-08 11:23:32 │ r/ozrb2 │
│ IZycTzGHN │ cm93tu1ki1nhtlvbpr7ye2tu1 │ zabbix-agents │ 1mOTqgmqtl0zeeIAvzVqiX │ 2025-04-08 08:31:19 │ r/ozrb2 │
│ IjncNAGHT │ cm93tu1ki1nhtlvbpr7ye2tu1 │ zabbix-agents │ 1mOTqgmqtl0zeeIAvzVqiX │ 2025-04-08 08:31:23 │ r/ozrb2 │
╰───────────┴───────────────────────────┴───────────────┴────────────────────────┴─────────────────────┴─────────╯
results: 1-10 of 31
timo@TIMO-P14s:~$ ziti fabric list terminators
╭────────────────────────┬──────────────────────┬────────┬─────────┬────────────────────────┬──────────┬──────┬────────────┬──────────────┬────────────╮
│ ID                     │ SERVICE              │ ROUTER │ BINDING │ ADDRESS                │ INSTANCE │ COST │ PRECEDENCE │ DYNAMIC COST │ HOST ID    │
├────────────────────────┼──────────────────────┼────────┼─────────┼────────────────────────┼──────────┼──────┼────────────┼──────────────┼────────────┤
│ 10q2wjKsW30Ao9ShqUEPM6 │ srv-backrest         │ ozri2  │ edge    │ 10q2wjKsW30Ao9ShqUEPM6 │          │    0 │ default    │            0 │ ClhTL4eeM6 │
│ 1BW5fNEa0A4v5hjRRMzFTS │ ad-services          │ ozrb2  │ edge    │ 1BW5fNEa0A4v5hjRRMzFTS │          │    0 │ default    │            0 │ q2OjYbQx7  │
│ 1MSVIPSX2dPkS4YBUFh2CI │ srv-jump2            │ ozrb2  │ edge    │ 1MSVIPSX2dPkS4YBUFh2CI │          │    0 │ default    │            0 │ YqRkhAFf8R │
│ 1Oal1BfsuSkGtcFbg0ijIJ │ ad-services          │ ozri2  │ edge    │ 1Oal1BfsuSkGtcFbg0ijIJ │          │    0 │ default    │            0 │ q2OjYbQx7  │
│ 1RG5Hhmk7tcpf2XTr0rXrG │ odoo2-service        │ ozrb2  │ edge    │ 1RG5Hhmk7tcpf2XTr0rXrG │          │    0 │ default    │            0 │ xhRbXQIha  │
│ 1Sq6UhOY5WmPlcuLW12YDt │ homeassistant        │ ozrb2  │ edge    │ 1Sq6UhOY5WmPlcuLW12YDt │          │    0 │ default    │            0 │ 9-X7OTd1ap │
│ 1ZCZ2XeGtpjlHsQQoyuRjm │ kalenteri-app        │ ozri2  │ edge    │ 1ZCZ2XeGtpjlHsQQoyuRjm │          │    0 │ default    │            0 │ 6HUAGjHeM  │
│ 1u4ma5em62aupSYohdFUSA │ rproxy-proxy-forward │ ozri2  │ edge    │ 1u4ma5em62aupSYohdFUSA │          │    0 │ default    │            0 │ yN5L7Sd1a  │
│ 21REG7PQd90Magy4K2I4RA │ rproxy-luna2-forward │ ozrb1  │ edge    │ 21REG7PQd90Magy4K2I4RA │          │    0 │ default    │            0 │ HcoL4Kf7s  │
│ 29eN2CdTILBMZRTHY5Uhvp │ rproxy-odoo2-forward │ ozrb2  │ edge    │ 29eN2CdTILBMZRTHY5Uhvp │          │    0 │ default    │            0 │ xhRbXQIha  │
╰────────────────────────┴──────────────────────┴────────┴─────────┴────────────────────────┴──────────┴──────┴────────────┴──────────────┴────────────╯
results: 1-10 of 43
timo@TIMO-P14s:~$

Hi @timnis, sorry you're suddenly having a problem, but it definitely seems bizarre that it would just stop. The ZDEW did have an update yesterday to the beta and 'latest' streams (not stable). Do you know if you updated it?

Are you willing to send me an identity to test with? I'd only need to enroll it, it doesn't require access to any services. If so, DM me or email clint at openziti.org with the jwt and I can have a peek.

Looks like you ran ziti ops verify traffic from WSL, right? That would rule out a MITM type of problem...

One other thing you can try, to see if it's related to your identity file, takes a bit of effort but it'll help.

If you use WSL, here are the steps. If you see missing pem prefix, open the identity file, add pem: to each of the ca, cert, and key fields, and start over...

# -p so the script can be re-run if you need to start over
mkdir -p /tmp/test

# name without .json -- used more than once below
IDENTITY_FILE_NAME="forwarder"
CTRL_ADDRESS="ec2-3-18-113-172.us-east-2.compute.amazonaws.com:8441"
cp "/mnt/c/Windows/System32/config/systemprofile/AppData/Roaming/NetFoundry/${IDENTITY_FILE_NAME}.json" /tmp/test

cd /tmp/test
ziti ops unwrap "/tmp/test/${IDENTITY_FILE_NAME}.json"
chmod 700 /tmp/test/*

# connect to the controller with the unwrapped identity as the client cert
openssl s_client \
  -connect "${CTRL_ADDRESS}" \
  -CAfile "/tmp/test/${IDENTITY_FILE_NAME}.ca" \
  -cert "/tmp/test/${IDENTITY_FILE_NAME}.cert" \
  -key "/tmp/test/${IDENTITY_FILE_NAME}.key" 2>&1 </dev/null \
  | grep Verify

This will unwrap your identity and use it with openssl. You should see:

Verify return code: 0 (ok)
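
If the check fails, the number in that line says why. Here is a minimal sketch of pulling it out of the openssl output, assuming the same s_client invocation as in the steps above. verify_code is a hypothetical helper name, not something ziti or openssl provides:

```shell
#!/usr/bin/env bash
# verify_code (hypothetical helper): reads `openssl s_client` output on stdin
# and prints the first numeric "Verify return code" it finds.
# 0 means the certificate chain verified; any other number is an OpenSSL
# verification error code. Prints nothing if the line is absent.
verify_code() {
  sed -n 's/.*Verify return code: \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage (commented out because it needs the unwrapped identity files):
# code=$(openssl s_client -connect "${CTRL_ADDRESS}" \
#   -CAfile "/tmp/test/${IDENTITY_FILE_NAME}.ca" \
#   -cert "/tmp/test/${IDENTITY_FILE_NAME}.cert" \
#   -key "/tmp/test/${IDENTITY_FILE_NAME}.key" 2>&1 </dev/null | verify_code)
# [ "$code" = "0" ] && echo "TLS verify ok" || echo "TLS verify failed: ${code:-no result}"
```

An empty result usually means the connection never got far enough to verify anything, which is worth telling apart from a non-zero code.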

Just DM'd you an identity for testing.

I tested this, and "pem:" was missing from the identity file. After adding it, the script returns ok.

From what I have tested, I have the same problem with the Android client and with Linux ziti-edge-tunnels as well.
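
For anyone who would rather not edit the identity file by hand, the pem: fix could be scripted roughly like this. It is only a sketch: it assumes the ca, cert and key values start directly with a -----BEGIN header when the prefix is missing, and add_pem_prefix is a hypothetical helper name. Run it on a copy of the identity file, not the original.

```shell
#!/usr/bin/env bash
# add_pem_prefix (hypothetical helper): reads identity JSON on stdin and writes
# it to stdout with "pem:" inserted in front of any ca/cert/key value that
# begins directly with a PEM header. Values that already start with "pem:"
# (or anything else) are left untouched.
add_pem_prefix() {
  sed -E 's/("(ca|cert|key)"[[:space:]]*:[[:space:]]*")(-----BEGIN)/\1pem:\3/g'
}

# Usage (commented out; paths follow the steps earlier in the thread):
# add_pem_prefix < "/tmp/test/${IDENTITY_FILE_NAME}.json" \
#   > "/tmp/test/${IDENTITY_FILE_NAME}.fixed.json"
```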

BTW, in a non-HA setup, should "ziti fabric list controllers" return something?

timo@TIMO-P14s:~$ ziti fabric list controllers
╭────┬──────┬────────────────┬─────────────────────────────────────────────╮
│ ID │ NAME │ LAST CONNECTED │ ISONLINE (ONLY VALID IF CLUSTER HAS LEADER) │
├────┼──────┼────────────────┼─────────────────────────────────────────────┤
╰────┴──────┴────────────────┴─────────────────────────────────────────────╯
results: none

The latest ZDEW is installed on my laptop.

Yes, I ran ziti ops verify traffic from WSL

This is all very strange. I was able to enroll the test identity fine. Thanks for sending it over...

From the ZDEW, can you email/dm me a 'feedback' set of logs too? Everything looks fine from my side so far. Also, what version of ZDEW are you running?

You stated you're also having the same problem with linux/android. Are all of these devices in the same network or are they on many different machines? I can't think of any reason why the ZDEW would report "controller not available" if the unwrap/openssl command returns ok (and the verify traffic succeeds).

This is truly bizarre.

As for ziti fabric list controllers in a non-HA setup, I would think it would make sense for it to return itself. I'll mention it to Paul to see if it's intentional. Thanks for noticing that.

Finally had time to work on this problem...

Today I noticed that my OpenZiti network was working again, on all devices. Yesterday I didn't have time to do anything.

So I really don't have any clue what the problem was. A few days back it didn't work at all, and I tried ZDEW, ZET and ZME on multiple networks. Today I haven't done anything and it all works :man_shrugging:

My VPS is running on Hetzner, so maybe there was some problem there, who knows...

Thanks for following up. This DOES actually sound like a problem where the VPS is rate-limited in some way. It'd also explain why I was able to connect without issue. When the cloud provider decides the VM has used too many "resources", you will often get some exceptionally difficult issue to track down that you don't normally think about. My guess is that's what happened here.

I've seen this myself using smaller instances when "doing a lot". Sometimes it's transferring too much data, but usually it's consuming too much CPU too quickly. For example, performing a build on a small VPS sometimes triggers the CPU limiter, then everything comes to a halt and it looks like nothing is working...
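
If it happens again, one thing that may be worth watching on the VPS is CPU steal time, which is a common sign that the hypervisor is holding the VM back. This is a rough sketch, not a definitive test: it assumes a Linux guest with /proc/stat, steal_pct is a hypothetical helper name, and whether Hetzner's limits actually show up as steal is an assumption.

```shell
#!/usr/bin/env bash
# steal_pct (hypothetical helper): takes two "cpu ..." summary lines from
# /proc/stat and prints the percentage of CPU time that was "steal" between
# the two samples. Fields after "cpu" are:
#   user nice system idle iowait irq softirq steal guest guest_nice
# guest time is already counted inside user, so only the first 8 are summed.
steal_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    { total = 0; for (i = 2; i <= 9; i++) total += $i; steal = $9 }
    NR == 1 { t0 = total; s0 = steal }
    NR == 2 {
      d = total - t0
      if (d > 0) printf "%.1f\n", 100 * (steal - s0) / d
      else print "0.0"
    }'
}

# Usage (commented out; only meaningful when run on the VPS itself):
# a=$(grep '^cpu ' /proc/stat); sleep 5; b=$(grep '^cpu ' /proc/stat)
# echo "steal over 5s: $(steal_pct "$a" "$b")%"
```

A steal value that stays high while the controller is misbehaving would support the throttling guess. A value near zero would not rule out other kinds of provider limits, such as network ones.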

That's my best guess. But, glad it's back working for you. Cheers

Maybe I need to start building HA now :slight_smile: