Ziti identities are failing after upgrade of edge tunneler

After a system upgrade, my Ziti identities are failing as shown below.

I need to fix this ASAP and would appreciate some help from the team.

ziti-edge-tunnel  version
v1.2.1
 ziti-edge-tunnel.service - Ziti Edge Tunnel
     Loaded: loaded (/lib/systemd/system/ziti-edge-tunnel.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2024-10-13 00:32:17 UTC; 2min 23s ago
    Process: 3561 ExecStartPre=/opt/openziti/bin/ziti-edge-tunnel.sh (code=exited, status=0/SUCCESS)
   Main PID: 3562 (ziti-edge-tunne)
      Tasks: 6 (limit: 9130)
     Memory: 5.5M
        CPU: 28ms
     CGroup: /system.slice/ziti-edge-tunnel.service
             └─3562 /opt/openziti/bin/ziti-edge-tunnel run --verbose=2 --dns-ip-range=100.64.0.1/10 --identity-dir=/opt/openziti/etc/identities

Oct 13 00:32:17 machine1 systemd[1]: Starting Ziti Edge Tunnel...
Oct 13 00:32:17 machine1 ziti-edge-tunnel.sh[3561]: NOTICE: no new JWT files in /opt/openziti/etc/identities/*.jwt
Oct 13 00:32:17 machine1 systemd[1]: Started Ziti Edge Tunnel.
Oct 13 00:32:17 machine1 ziti-edge-tunnel[3562]: About to run tunnel service... ziti-edge-tunnel
Oct 13 00:32:17 machine1 ziti-edge-tunnel[3562]: (3562)[        0.016]    WARN ziti-sdk:model_support.c:202 model_parse() json parse error: expected comment
Oct 13 00:32:17 machine1 ziti-edge-tunnel[3562]: (3562)[        0.016]    WARN tunnel-cbs:ziti_tunnel_ctrl.c:982 on_ziti_event() ziti_ctx controller connections failed: ziti context is disabled
Oct 13 00:32:17 machine1 ziti-edge-tunnel[3562]: (3562)[        0.016]   ERROR ziti-edge-tunnel:ziti-edge-tunnel.c:1235 on_event() ztx[/opt/openziti/etc/identities/machine1.json] failed to connect>

I think I found the issue:
my ziti-controller is still 0.34.
When I upgrade the ziti-edge-tunnel to the latest version it throws this error. It works up to 1.1.5; when I upgrade to 1.2.0 it throws the same error.
Does it not work with an older ziti-controller?

That's good to know. Thanks for diagnosing that it seems to have been fine through 1.1.5. I believe we expect newer tunnelers to work with older controllers. We do our best to make sure old controllers work with new tunnelers and new controllers work with old tunnelers, but when upgrades happen, behavior and assumptions change and it's hard to test everything.

Are there any logs in the controller or in the router that might narrow down the issue? Looking at the changelogs (Releases · openziti/ziti · GitHub), 1.1.6 does state that from that release forward the trustDomain needs to be configured. I recall the way this works with newer versions of ziti requires a change to the PKI. I don't know who will be around to help out until probably Tuesday, but if I can find someone to comment further, I will.

In the meantime, it seems like staying with 1.1.5 is prudent.

I have now upgraded my controller to the latest version and tried v1.2.0 of the edge tunneler, but it is still failing. It works with 1.1.5.

So what should be specified in trustDomain?
For example, my controller domain is
controller.xxx.ai
Would my trustDomain then be xxx.ai?

Will it fail if I don't specify that?

I believe the controller won't start. The controller tries to look at your PKI and if the server is configured with a chain, it will walk the chain looking for the root CA and use/inspect that cert for the trustDomain (I'm not an expert on this; it's relatively new stuff).

Adding the trustDomain explicitly is, I believe, a workaround for situations where the PKI is incomplete.

Per the release notes:

The trustDomain configuration field takes a string that must be URI hostname compatible (see: spiffe/standards/SPIFFE-ID.md at main · spiffe/spiffe · GitHub).
If this value is not defined, a trust domain will be generated from the root CA certificate of the controller

So yes, something like: https://xxx.ai.

OK, so trustDomain = xxx.ai
should work?

As I stated, I'm not an expert with this setting as it's pretty new. I don't know all there is to know about it. I would expect it to work.

Ok let me try it and see if that works

Sorry, that still has not fixed my issue. Do I need to regenerate the identity?

When I upgrade ziti-edge-tunnel to 1.2.0 I get the error below; it works with 1.1.5.
My ziti-controller has been upgraded to 1.1.9.

failed to connect to controller due to ziti context is disabled

I don't have enough information, nor expertise with this particular error to help you. We'll need other people to take a look at this to see if there's anything that sticks out.

I don't suppose you have steps to reproduce the problem we can use to experience what you are? If not, we'll just have to take an older 0.34 controller with an old identity and upgrade it to see if we can replicate the issue. It's a long weekend in the US so I wouldn't expect any of that to happen until at least Tuesday, just to set expectations.

Can you provide the controller logs, router logs and tunneler logs when "the problem" manifests?

OK, my current workaround is to downgrade the edge tunneler to 1.1.5, which works. When I upgrade to 1.2.0, the edge tunneler logs show:

Oct 13 02:31:41 machine-2 systemd[1]: Starting Ziti Edge Tunnel...
Oct 13 02:31:41 machine-2 ziti-edge-tunnel.sh[33210]: NOTICE: no new JWT files in /opt/openziti/etc/identities/*.jwt
Oct 13 02:31:41 machine-2 systemd[1]: Started Ziti Edge Tunnel.
Oct 13 02:31:41 machine-2 ziti-edge-tunnel[33211]: About to run tunnel service... ziti-edge-tunnel
Oct 13 02:31:41 machine-2 ziti-edge-tunnel[33211]: (33211)[        0.010]    WARN ziti-sdk:model_support.c:202 model_parse() json parse error: expected comment
Oct 13 02:31:41 machine2 ziti-edge-tunnel[33211]: (33211)[        0.011]    WARN tunnel-cbs:ziti_tunnel_ctrl.c:982 on_ziti_event() ziti_ctx controller connections failed: ziti context is disabled
Oct 13 02:31:41 machine-2 ziti-edge-tunnel[33211]: (33211)[        0.011]   ERROR ziti-edge-tunnel:ziti-edge-tunnel.c:1235 on_event() ztx[/opt/openziti/etc/identities/machine-2.json] failed to connect to controller due to ziti context is disabled>

I don't see any logs on the controller. I think it is not reachable, which is why it is failing? I'm not sure how to check the other logs.

I suspect a bug is making identities inactive (locally disabled). Let's assess the status of the identities you've added to the tunneler's identity dir.

First, please install the latest stable release: 1.2.2 and proceed with troubleshooting if the problem persists.

# report the running status of all identities
ziti-edge-tunnel tunnel_status | jq '.Data.Identities[]|{Identifier: .Identifier, Active: .Active}'
# report the saved status of all identities
jq '.Identities[]|{Identifier: .Identifier, Active: .Active}' /var/lib/ziti/config.json

If necessary, enable any identity that is shown as Active: false.

# set an identifier "Active: true"
ziti-edge-tunnel on_off_identity --identity /opt/openziti/etc/identities/machine1.json --onoff true
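
When several identities are affected, the check and the fix above can be combined. The following is only a minimal sketch: the function names are hypothetical, it assumes the saved config has the `.Identities[].Identifier` / `.Active` shape queried above, and it assumes the `Identifier` value is the identity file path that `--identity` accepts. It prints the commands instead of running them so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helpers, not part of ziti-edge-tunnel. Requires jq.

# Print the Identifier of every identity saved with Active == false.
# $1: path to the tunneler's saved config (e.g. /var/lib/ziti/config.json)
list_inactive_identities() {
  jq -r '.Identities[] | select(.Active == false) | .Identifier' "$1"
}

# Print (do not execute) one enable command per inactive identity.
# Assumption: the Identifier is usable as the --identity argument.
print_enable_commands() {
  list_inactive_identities "$1" | while IFS= read -r identifier; do
    printf 'ziti-edge-tunnel on_off_identity --identity %s --onoff true\n' "$identifier"
  done
}
```

For example, `print_enable_commands /var/lib/ziti/config.json` would list the commands to run by hand, one per identity that is currently saved as inactive.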

Yeah, this trick worked, but do I need to do it manually every time?

 ziti-edge-tunnel.service - Ziti Edge Tunnel
     Loaded: loaded (/lib/systemd/system/ziti-edge-tunnel.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2024-10-16 02:33:04 UTC; 5s ago
    Process: 291636 ExecStartPre=/opt/openziti/bin/ziti-edge-tunnel.sh (code=exited, status=0/SUCCESS)
   Main PID: 291637 (ziti-edge-tunne)
      Tasks: 6 (limit: 9130)
     Memory: 6.6M
        CPU: 67ms
     CGroup: /system.slice/ziti-edge-tunnel.service
             └─291637 /opt/openziti/bin/ziti-edge-tunnel run --verbose=2 --dns-ip-range=100.64.0.1/10 --identity-dir=/opt/openziti/etc/identities

Oct 16 02:33:04 machine2 systemd[1]: Starting Ziti Edge Tunnel...
Oct 16 02:33:04 machine2 ziti-edge-tunnel.sh[291636]: NOTICE: no new JWT files in /opt/openziti/etc/identities/*.jwt
Oct 16 02:33:04 machine2 systemd[1]: Started Ziti Edge Tunnel.
Oct 16 02:33:04 machine2 ziti-edge-tunnel[291637]: About to run tunnel service... ziti-edge-tunnel
Oct 16 02:33:04 machine2 ziti-edge-tunnel[291637]: (291637)[ 0.019] WARN ziti-sdk:model_support.c:202 model_parse() json parse error: expected comment

I still get this warning:
WARN ziti-sdk:model_support.c:202 model_parse() json parse error: expected comment

No. At some point in the past your identity was marked inactive. With 1.2+, the tunneler is now respecting the setting in the config file. It's unknown at this time how or when your identity would have been set to active=false but you shouldn't ever have this issue again unless you set it to false by running the same command again.

This warning relates to loading the identity from a file, as opposed to from a string. At this time, it is entirely expected. First the C SDK tries to parse the supplied string as an identity; when that fails (this warning), the C SDK tries to load the identity as a file.
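
The string-then-file fallback described above can be illustrated with a small shell sketch. This is not the C SDK's code: `load_identity` is a hypothetical helper, and jq stands in for the SDK's JSON parser. It only shows why a parse failure on the first attempt is harmless when the argument is a path.

```shell
#!/bin/sh
# Hypothetical illustration of "try as inline JSON, then fall back to a file".
# $1: either an inline JSON document or a path to a JSON file. Requires jq.
load_identity() {
  if printf '%s' "$1" | jq . >/dev/null 2>&1; then
    # First attempt succeeded: the argument itself is JSON.
    printf '%s' "$1" | jq -c .
  else
    # First attempt failed (the point where the tunneler logs its warning):
    # treat the argument as a file path instead.
    jq -c . "$1"
  fi
}
```

Passing a path such as `/opt/openziti/etc/identities/machine1.json` fails the first parse, because a path is not valid JSON, and then succeeds through the file branch.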

Thanks. OK, let me check and change the status to true, and then I will upgrade so that I can connect to the instance after the restart.