Howdy folks - I’ve been a programmer and general admin for a long time, including network programming for parts of it, but I’m new to OpenZiti and zero-trust in general, so bear with me if I get some of the terminology slightly wrong (or very wrong).
I have the following set up, all using the open source components (I’m not using any paid NetFoundry products). I set up the first part (db service tunneling) under the guidance of an OpenZiti expert, then added the next part (ssh tunneling) myself.
- macOS Ziti Desktop Edge 2.24 (458) (2 machines)
- 2 device identities, 1 user identity (default admin), 1 router identity
- one edge router, installed on a DigitalOcean “CPU optimized” machine (2 vCPU, 4 GB RAM) which runs a few other small services, all “non production” stuff not getting much traffic
- controller v0.27.2 and ZAC 2.5.1, running on the same DO machine
- 3 Services: a DB on another machine on the DO private network, SSH on another machine on the DO private network, and SSH on the “localhost” of the edge router machine
- “magic” DNS, so I can ssh and psql to Ziti-only hostnames from my macOS machines
I use all that to connect from my laptop (one of the two device identity machines) to the database on the private network (postgres), and to the two SSH servers (one of which is also available on the public internet, but I go through the tunnel).
Here’s what I’ve run into: frequently (but NOT ALWAYS) the SSH sessions hang. No more text appears on my screen no matter what I type, and Control-C does not exit. These feel like the cases where the SSH client would normally, eventually, tell you the server connection was severed and exit, but over the Ziti tunnel they just hang forever (or at least 24 hours in one instance). This SEEMS TO happen when pushing a lot of data across the tunnel (like rsync, or big builds outputting a lot of text). SOMETIMES I will see it spit out a few thousand characters, pause for a moment, spit out another few thousand, and then either go back to normal or hang indefinitely. ONCE I was able to get it to spit out some extra output by sending some keystrokes, almost as if it was waiting for data to be transmitted before it would receive more. But that was just one time :shrug:.
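In case it helps anyone reproduce the "pushing a lot of data" symptom, here is a rough sketch of a stall checker I could pipe SSH output through. It is only an illustration under my own assumptions: the script name `stall_check.py`, the function `find_stalls`, and the 5-second threshold are all made up for this post, not anything from OpenZiti. It only reports pauses that eventually resume; if the stream hangs forever, the script just blocks along with it.

```python
import io
import sys
import time


def find_stalls(stream: io.BufferedIOBase, threshold_s=5.0,
                chunk_size=65536, clock=time.monotonic):
    """Read stream to EOF and return (total_bytes, stalls).

    stalls is a list of (byte_offset, gap_seconds) for every wait between
    reads that lasted at least threshold_s (hypothetical default: 5 s).
    """
    total = 0
    stalls = []
    last = clock()
    while True:
        # read1 returns as soon as any data is available (or b"" at EOF),
        # so the measured gap reflects how long the sender went quiet.
        chunk = stream.read1(chunk_size)
        now = clock()
        if now - last >= threshold_s:
            stalls.append((total, now - last))
        if not chunk:
            break
        total += len(chunk)
        last = now
    return total, stalls


def main():
    total, stalls = find_stalls(sys.stdin.buffer)
    print(f"received {total} bytes, {len(stalls)} stall(s)", file=sys.stderr)
    for offset, gap in stalls:
        print(f"  paused {gap:.1f}s after byte {offset}", file=sys.stderr)


# Only run when data is piped in, not when started from an interactive terminal.
if __name__ == "__main__" and not sys.stdin.isatty():
    main()
```

Usage would be something like `ssh my-ziti-host 'head -c 200000000 /dev/urandom' | python3 stall_check.py`, where `my-ziti-host` is a placeholder for one of the Ziti-only hostnames. Running the same command against the public address would give a baseline to compare with.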
When this happens, I can immediately open up a new ssh-over-ziti and start over what I was doing, which works fine until that one hangs. OR, I can immediately SSH over the public internet (to the one with that open), and everything works fine through that indefinitely, including overnight just sitting there.
During these occurrences, I do not get any “disconnect / reconnect” macOS notification from Ziti Desktop Edge, as I do on the very rare occasions when Spectrum (my cable internet provider) goes down for a moment.
Regarding the postgres connection, neither I nor the other person who uses the postgres tunnel has had any problems (she doesn’t use the SSH tunnel).
I have not restarted the DO server, or any of the OpenZiti services running on it, since setting it up a couple weeks ago. I prefer not doing this until explicitly directed to do so as part of a troubleshooting process, because for this stuff it’s not really acceptable to “just restart it once in a while”, in my opinion, at least at first until I find out that’s the only option.
Ring any bells? Where do I start troubleshooting this? I’ve taken a cursory look for log files, admittedly not an exhaustive one, and have not come up with anything.
Thanks y’all for your time and help!
Jason