Ziti desktop edge + controller/network version compatibility?

Hi! Today I went to use a resource I have on a small self-managed ziti network from my macOS laptop like I have a couple times a week for a couple of years (thanks for helping me set it up, you know who you are!). But, Desktop Edge doesn't show me my services like it usually does. It shows me as "Status: Connected", but with a red icon next to my client name in the left panel, and no services showing, instead of the 4 that usually show.

I think I remember taking the app store update for Ziti Desktop Edge the other day (running 2.44 (522) now), and pretty sure I haven't used it since. But I also updated to macOS 14.7 at roughly the same time.

I know I'm a terrible person for doing this, but... here's the kicker: I haven't updated the controller / router since I installed it using the quickstart non-docker method. A while ago. So it / they're at v0.27.2. Yup :hangs-head-in-shame:.

Any chance there's an incompatibility between that (old) router and desktop edge? This has been rock solid for the single user (me) plus 2 occasional users over the past couple of years.

Why haven't I upgraded? A combination of laziness and lack of confidence in the upgrade process. I took a look around this morning and didn't find "upgrade" instructions for the "quickstart no docker" method. And I don't see instructions on how to make a docker install use the data from a quickstart. And I'm not sure if I have to upgrade to intermediate versions as stepping stones. Got pointers to docs that provide guidance? I'm definitely capable of muddling my way through this, but clear guidance > educated guesswork. So I figured I'd ask around first before upgrading RIGHT NOW to get work done, or finding another temporary fix.

Thanks!

Jason

Hi @woodwardjd, nice to see you back in the forums! :slight_smile: Yes there's a chance. It's something that was just discovered, and I'm not actively working on it myself so I don't know the details. I did see a message indicating an update of the overlay network is likely to fix the issue. I'm not well-versed enough with the ZDEM versions to know if you're affected but I definitely think it's worthwhile to update the overlay.

The upgrade process is pretty straightforward. For the quickstart, it's really easy. You can just stop the controller/router and copy the directory. This has the benefit of capturing the database as well as your config files and your PKI... So I'd recommend you do that "here and there" (or you know, routinely... :slight_smile: )

You can also back up the database itself by running ziti edge db snapshot at any time while the controller is running, if you don't want to stop the overlay network.
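The stop-and-copy backup described above could be sketched roughly like this. It is only a sketch: the function name backup_quickstart, the systemd unit names, and the example paths are placeholders, not anything the quickstart guarantees.

```shell
#!/bin/sh
# Sketch only: copy a quickstart directory to a timestamped backup.
# Usage: backup_quickstart SRC_DIR DEST_PARENT   (prints the backup path)
backup_quickstart() {
    src=$1
    dest_parent=$2
    if [ ! -d "$src" ]; then
        echo "no such directory: $src" >&2
        return 1
    fi
    dest="$dest_parent/$(basename "$src")-backup-$(date +%Y%m%d-%H%M%S)"
    if [ -e "$dest" ]; then
        echo "backup already exists: $dest" >&2
        return 1
    fi
    mkdir -p "$dest_parent" || return 1
    # -R copies the whole tree; -p keeps modes and timestamps, which matters
    # for private keys under the PKI directory.
    cp -Rp "$src" "$dest" || return 1
    printf '%s\n' "$dest"
}

# Assumed wiring (unit names and paths are placeholders, adjust to your install):
#   sudo systemctl stop ziti-router ziti-controller
#   backup_quickstart "$HOME/.ziti/quickstart/myhost" "$HOME/ziti-backups"
#   sudo systemctl start ziti-controller ziti-router
```

Copying with the processes stopped means the database file is not being written to mid-copy, which is why the stop comes first in the commented wiring.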

Once backed up, just download a new binary on top of the existing quickstart path. The version will be mismatched: the binary will sit in the "0.27.2" folder but will actually be version 1.1.15 (or whatever), and in exchange you won't need to change the systemd files. OR you can pull a new binary, update the systemd unit files, and restart. The overlay should migrate itself and you'll be back up and running.

Hope that helps,
-Clint


Thanks! Will give that a shot first thing tomorrow, and report back.

FYI the latest build in the App Store (2.45) as of minutes ago resolves an issue that would prevent you from hosting services when you’re connecting to an older controller. Your issue sounds a little different though, so I’m not 100% sure that it will get you back in the game, but it’s easy enough to try so I thought I’d mention it.

latest build in the App Store (2.45) as of minutes ago

Good call, thanks, but no dice. On to the router/controller (and then probably console) upgrade!

@TheLumberjack I had to do a little more work because 0.27 was before the unification into a single executable in 0.29. While I was editing the service files for that, I also changed them to reference .../ziti-bin/current/ziti, where current is a symlink to a directory, say, ziti-v1.1.15. So upgrading will be easy in the future. I will say that was pretty easy ("just replace an executable"), but I didn't have good confidence that was the only thing that was needed.
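For anyone wanting the same layout, the "current" symlink switch can be sketched as below. The function name switch_current and the example directory names are made up for illustration; only the .../ziti-bin/current/ziti convention comes from the post above.

```shell
#!/bin/sh
# Sketch only: point BIN_ROOT/current at a versioned directory holding ziti.
# Usage: switch_current BIN_ROOT VERSION_DIR
switch_current() {
    bin_root=$1
    version_dir=$2
    if [ ! -f "$bin_root/$version_dir/ziti" ]; then
        echo "no ziti binary in $bin_root/$version_dir" >&2
        return 1
    fi
    # -s makes a symbolic link, -f replaces an existing link, and -n stops ln
    # from following the old "current" link into its target directory.
    ln -sfn "$version_dir" "$bin_root/current"
}

# Example (placeholder path): switch_current /opt/ziti-bin ziti-v1.1.15
# The systemd units keep pointing at .../current/ziti and only need a restart.
```

The link target is relative, so the whole ziti-bin directory can be moved without breaking it.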

All that being said... still the same issue :frowning: I even upgraded the web console to poke around (looks very nice!).

Gonna dig into logging, server side, to try to figure out what's going on.

The web console Identity Management -> my identity -> Visualizer shows connection between my identity (macOS ZDE) to the router as "red" "errored" and the popup shows apiSession No routerConnection No.

ZDE shows "connected" with no services. It also shows Controller Version of v0.27.2 still. I've confirmed via ps and inspection of the command line that it's using the v1.1.15 executable (and the new single-command style).

More when I have it!

I tried creating a new Identity. When I clicked "Enroll" in ZDE (after selecting the JWT) it tells me "Unable to enroll woodwardjd-walnut-2.jwt CONTROLLER_UNAVAILABLE".

Controller process is running. No logs indicate it's crashed and been restarted by systemd.

Continuing to dig.

Found some logs (hey, they're in syslog like they should be, unlike most of the applications I deal with on a daily basis which put their logs in some random place).

When ZDE is open and is in "Connecting..." /OR/ "Connected" state, lines like this come up every few seconds, with the client port changing (that remote IP is my ISP's carrier-grade NAT outlet IPv4 address)

Oct 19 13:23:15 my-server ziti[457428]: {"_context":"tls:0.0.0.0:8441","error":"local error: tls: bad record MAC","file":"github.com/openziti/transport/v2@v2.0.146/tls/listener.go:257","func":"github.com/openziti/transport/v2/tls.(*sharedListener).processConn","level":"error","msg":"handshake failed","remote":"8.47.100.15:25566","time":"2024-10-19T13:23:15.194Z"}

Going back in the syslog I see these started when I upgraded the router/controller executable.

... BUT ... there were similar ones prior to the router/controller executable upgrade, but AFTER the macOS 14.7 upgrade / ZDE upgrade

Oct 18 12:17:06 my-server ziti-controller[3637757]: {"level":"info","msg":"http: TLS handshake error from 8.47.100.15:25398: local error: tls: bad record MAC","time":"2024-10-18T12:17:06.821Z"}

Prior to that there were periodic "read connection timed out" errors, going back "forever" (or at least a couple of weeks, during which "everything worked fine"), but none of these MAC errors

Oct 17 11:15:19 my-server ziti-router[3637772]: {"_context":"ch{edge}-\u003eu{classic}-\u003ei{gj2B}","file":"github.com/openziti/channel/v2@v2.0.25/impl.go:320","func":"github.com/openziti/channel/v2.(*channelImpl).rxer","level":"error","msg":"rx error (read tcp my.server.ip.address:8442-\u003e8.47.100.15:25598: read: connection timed out)","time":"2024-10-17T11:15:19.191Z"}
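In case it helps anyone retrace this, here is a rough sketch of how the MAC errors can be tallied per day to see when they started. It assumes the classic syslog timestamp prefix shown in the lines above ("Oct 19 13:23:15 ..."); the function name bad_mac_by_day is made up, and /var/log/syslog is only the usual Debian/Ubuntu location.

```shell
#!/bin/sh
# Sketch only: count "tls: bad record MAC" log lines per day.
# Reads syslog-formatted lines on stdin, prints "Mon DD COUNT" per day.
bad_mac_by_day() {
    awk '
        # index() does a plain substring match, so no regex escaping is needed.
        index($0, "tls: bad record MAC") { count[$1 " " $2]++ }
        END { for (day in count) print day, count[day] }
    ' | sort
}

# Example (path is an assumption): bad_mac_by_day < /var/log/syslog
```

The sort is plain lexical, so it keeps days within one month in order but will not order across month boundaries.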

wow. seems like forever ago that we did that! :slight_smile:

bad record MAC/tls errors always make me think the PKI has changed in some way. Are you comfortable sharing the external url of the controller? I'll poke at it - or if you'd rather learn I can show you how to diagnose these issues.

First thing I'd do is run ziti ops verify-network.

ziti ops verify-traffic --help
A tool to verify traffic can flow over the overlay properly. You must be authenticated to use this tool.

Usage:
  ziti ops verify-traffic [flags]

Flags:
      --allow-multiple-servers   Whether to allows the same server multiple times.
      --cleanup                  Whether to perform cleanup.
  -h, --help                     help for verify-traffic
      --host string              the controller host
  -m, --mode string              [optional, default 'both'] The mode to perform: server, client, both.
  -p, --password string          password to use for authenticating to the Ziti Edge Controller, if -u is supplied and -p is not, a value will be prompted for
      --port string              the controller port
  -x, --prefix string            [optional] The prefix to apply to generated objects, necessary when not using the 'both' role.
  -u, --username string          username to use for authenticating to the Ziti Edge Controller
      --verbose                  Show additional output.

That will ensure things are working properly and the problem is somewhere outside the overlay. Can you try that, now that you're on a newer ziti build?

You'll see output like this on success:

$ ziti ops verify-network
INFO    All requested checks passed.
$ ziti ops verify-traffic --host localhost --port 8441 --username admin
WARNING no prefix and mode [] is not 'both'. default prefix of 2024-10-19-1529 will be used
INFO    connecting with user admin to https://localhost:8441
INFO    generating P-384 EC key
INFO    generating P-384 EC key
INFO    waiting 10s for terminator for service: 2024-10-19-1529.verify-traffic
INFO    successfully bound service: 2024-10-19-1529.verify-traffic.

INFO    Server is listening for a connection and will exit when one is received.
INFO    new service session                           session token=bf053611-935e-4056-9e9f-8f8a3298ebca
INFO    found terminator for service: 2024-10-19-1529.verify-traffic
INFO    found service named: 2024-10-19-1529.verify-traffic
INFO    Server has accepted a connection and will exit soon.
INFO    successfully dialed service: 2024-10-19-1529.verify-traffic.
INFO    verify-traffic test successfully detected
INFO    Server complete. exiting
INFO    client complete

Great, that ensures the network works properly. You made a new identity and it didn't succeed with the Mac client. That surprised me. Are you open to trying with the ziti-edge-tunnel binary instead, just to rule that out? Are you open to sending me a JWT with access to no services which I could try on my side?

Partner tested w/ macOS 14.6.1 and ZDE 2.44 (522) against current config (1.1.15) yields same behavior (no services) and server-side logs. (using an old identity that hasn't been used in 6 months)

I haven't made any changes to anything ziti-related prior to yesterday (zde) and this morning (server process versions). Certainly nothing related to PKI. Checking timestamps... Yup, just turning over the 1-year-limit certs back in January.

Yes and yes :slight_smile: Gonna need the specific guidance tho. (Also check the local community slack for private-ish info :slight_smile: )

Download from https://github.com/openziti/ziti-tunnel-sdk-c/releases/download/v1.2.3/ziti-edge-tunnel-Darwin_arm64.zip

Make an identity/jwt and try to run ziti-edge-tunnel enroll with -j and -i set accordingly

 sudo ./ziti-edge-tunnel enroll -j the-jwt.jwt -i the-identity.json
(44556)[        0.000]    INFO ziti-sdk:utils.c:198 ziti_log_set_level() set log level: root=3/INFO
(44556)[        0.000]    INFO ziti-sdk:utils.c:169 ziti_log_init() Ziti C SDK version 1.1.3 @gf713dc6(HEAD) starting at (2024-10-19T15:49:18.946)
(44556)[        0.000]    INFO ziti-sdk:ziti_enroll.c:91 ziti_enroll() Ziti C SDK version 1.1.3 @gf713dc6(HEAD) starting enrollment at (2024-10-19T15:49:18.947)
(44556)[        0.000]    INFO ziti-sdk:ziti_ctrl.c:593 ziti_ctrl_init() ctrl[(null):] using https://the-server.com:8441
(44556)[        0.000]    INFO ziti-sdk:ziti_ctrl.c:593 ziti_ctrl_init() ctrl[(null):] using https://the-server.com:8441
(44556)[        0.000]    WARN ziti-sdk:ziti_ctrl.c:180 ctrl_resp_cb() ctrl[the-server.com:8441] request failed: -53(software caused connection abort)
(44556)[        0.000]    WARN ziti-sdk:ziti_ctrl.c:319 internal_version_cb() ctrl[the-server.com:8441] CONTROLLER_UNAVAILABLE(software caused connection abort)
(44556)[        0.000]    WARN ziti-sdk:ziti_ctrl.c:180 ctrl_resp_cb() ctrl[the-server.com:8441] request failed: -53(software caused connection abort)
(44556)[        0.000]    INFO ziti-sdk:ziti_ctrl.c:183 ctrl_resp_cb() ctrl[the-server.com:8441] attempting to switch endpoint
(44556)[        0.000]    WARN ziti-sdk:ziti_ctrl.c:566 ctrl_next_ep() ctrl[the-server.com:8441] no controllers are online
(44556)[        0.000]   ERROR ziti-sdk:ziti_enroll.c:263 enroll_cb() failed to enroll with controller: https://the-server.com:8441 CONTROLLER_UNAVAILABLE (software caused connection abort)
(44556)[        0.000]   ERROR ziti-edge-tunnel:ziti-edge-tunnel.c:2251 enroll_cb() enrollment failed: CONTROLLER_UNAVAILABLE(-3)

Thanks for trying that and confirming it. Oh, it just occurs to me that you will probably need to modify one more thing about your controller. We moved to openssl, and it requires the controller config to change.

See if that resolves the enrollment issue. The ca bundle needs to contain only root cas in these clients. I think that's what's happening.
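Until the exact command turns up, a rough way to look at what is in a bundle is to split it into individual certificates and compare each subject with its issuer. This is only a sketch: the function names are made up, cas.pem is a placeholder for whatever file your controller config points at, and "subject equals issuer" is a heuristic for spotting a root, not a full chain validation.

```shell
#!/bin/sh
# Sketch only: split a PEM bundle and flag certificates that are not roots.

# split_bundle BUNDLE OUTDIR -> writes OUTDIR/cert-NNN.pem, prints the count.
split_bundle() {
    awk -v dir="$2" '
        /-----BEGIN CERTIFICATE-----/ { n++; out = sprintf("%s/cert-%03d.pem", dir, n) }
        out != "" { print > out }
        /-----END CERTIFICATE-----/ { close(out); out = "" }
        END { print n + 0 }
    ' "$1"
}

# classify_cert FILE -> prints "root" or "non-root", a tab, then the subject.
# Heuristic: a root CA is self-issued, so its subject and issuer match.
classify_cert() {
    subject=$(openssl x509 -in "$1" -noout -subject | sed 's/^subject= *//')
    issuer=$(openssl x509 -in "$1" -noout -issuer | sed 's/^issuer= *//')
    if [ "$subject" = "$issuer" ]; then kind=root; else kind=non-root; fi
    printf '%s\t%s\n' "$kind" "$subject"
}

# Example (cas.pem is a placeholder for your controller CA bundle):
#   dir=$(mktemp -d)
#   split_bundle cas.pem "$dir"
#   for f in "$dir"/cert-*.pem; do classify_cert "$f"; done
```

Any line starting with "non-root" would be an intermediate or leaf certificate, which is the kind of entry the newer clients reportedly object to.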

I'll try to dig up the command that recreates the bundle in a few

I did the referenced ziti-controller run => ziti controller run update to the systemd config earlier today, and the controller is running 1.1.15. I don't see anything in there about the config of the controller (say, in the yaml), only how to start it.

(also, thanks for the pointer to the upgrade docs. I couldn't for the life of me find them through google or an in-page search (command-f). Turns out it is very findable through the search box in the upper right, and I didn't notice that).

When he gets time I'm sure @TheLumberjack will chime in here with more details, but this ended up being an incompatibility with the certificate bundles, certificate chaining, and the contents of the identity files that enrollment creates on clients. Backing out to a version of ziti-edge-tunnel prior to these changes in the C SDK (v1.1.3 in the macOS case) got me back on my feet until we could re-bundle certificates server-side and re-enroll new identities (to get the right certs into the identities stored on the client). Since Ziti Desktop Edge is only available in the Mac App Store, I couldn't revert to a previous version of that. Many thanks to Clint for the help!