TCP conn limit is 512 -- ZET not allowing sufficient conns

Hello, we increased usage of a cluster using a ZET gateway running ziti-edge-tunnel 0.20.22 and have run into a TCP limit.

Logs are:

May 17 16:43:54 zt ziti-edge-tunnel[1818730]: (1818730)[  5874572.692]   ERROR tunnel-sdk:tunnel_tcp.c:76 new_tcp_pcb() tcp_new failed
May 17 16:43:54 zt ziti-edge-tunnel[1818730]: (1818730)[  5874572.692]   ERROR tunnel-sdk:tunnel_tcp.c:384 recv_tcp() failed to allocate tcp pcb - TCP connection limit is 512

I’ll keep looking through docs/code, but at least at a quick glance, it seems like this limit is hard coded and not user configurable?

Thanks,
John

Hi @johnalotoski.

Yeah, it’s definitely hard-coded at this time. My understanding is that those connections should be getting reclaimed/reused, but if you have “lots” of connections, as you noticed, you’ll get to 512 eventually. Or perhaps you just have a bunch of long-lasting connections? Then, of course, you will hit that 512 limit faster… If you feel like sharing which scenario you’re encountering, that would be useful and helpful feedback for us, thanks.

There are probably two short-term options that aren’t ‘great’ but are possible. You could pull/build a release of your own that has 512 set to “more”, but you might just end up hitting the next limit. You could also consider moving to a ziti-router instead of ziti-edge-tunnel; it works the same, but you’d have to learn how to deploy that instead. If you’re using ziti-edge-tunnel in a ‘host only’ type mode, it’s basically a drop-in replacement, but if you’re also using it for intercept functionality, there are some differences.

Longer term, we will look to make this allocation dynamic… @scareything, do you have any additional thoughts to share/add?

Hi @TheLumberjack, thanks for the feedback.

Hmm... The scenario is: network A has a ziti-edge-tunnel gateway to machines on network B where there is no single ziti gateway, but rather those machines each have ziti-edge-tunnel installed for "host model" usage.

Machines on Network A <---> Network A ZT Gateway <---> Host A machine with Ziti
                                                 <---> Host B machine with Ziti
                                                 <---> Host C machine with Ziti

Considerations:

  1. Machines on network A and B run a mesh protocol where machines probe each other for liveness/availability mostly through UDP, but also using TCP as a fallback protocol.

  2. Host machines in network B are servers routinely under heavy load.

Trigger:

We re-scaled our network A size larger (a few dozen more machines), which seems to have triggered hitting this TCP limit today.

Impact:

We noticed this issue immediately after scaling when a monitoring server in network A started reporting failed scrapes to target machines in network B, due to the monitoring server being denied the connection through the ziti network overlay.

Thoughts:

  • At face value, it would appear that more mesh probes between networks A and B, due to the increased network A cluster size, were responsible for hitting the limit. However, looking at ziti-edge-tunnel logs on machines in network B, there are only one or two mesh probes coming in per second, so that doesn't seem to be the sole cause even though it may have been the trigger. TCP activity on the network A machines and on the gateway itself doesn't reach 512 connections either.

  • Network B machines are busy servers. While the TCP conn load from ziti itself (from network A) is small on these network B machines, these servers do have constant, significant TCP load from external sources. For example:

# netstat -an | awk '/tcp/ {print $6}' | sort | uniq -c
    226 CLOSE_WAIT
     41 ESTABLISHED
     17 FIN_WAIT2
     47 LISTEN
   3503 TIME_WAIT
  • However, with the TIME_WAIT count alone being far above 512, these network B servers in ziti host mode should have been causing these errors on the ziti gateway for a long time. Looking further at the networking metrics on the network B machines, though, I can see a perfect time correlation between the used IPv4 socket count and the start of these ziti gateway errors, with the used IPv4 socket count now above 512:
# cat /proc/net/sockstat
sockets: used 684
TCP: inuse 294 orphan 0 tw 2414 alloc 308 mem 229
UDP: inuse 96 mem 6
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0
  • In looking at the metrics history for network B machines, we have been just barely below the 512 IPv4 socket use count for weeks, and increasing network A size today pushed us above that threshold number on network B machines.

  • So if I'm interpreting this correctly, it seems that really busy servers won't be able to utilize ziti traffic, as the remote end of the ziti tunnel will get TCP limit errors right away due to traffic unrelated to ziti. It seems kind of odd to me that the machine that logs the TCP limit errors appears not to be the machine actually hitting the TCP limit.

  • Initially, I was thinking maybe I could reduce the OS time parameter for TIME_WAIT to reduce that count, but since it looks like the threshold is tied to socket use count, not sure that will help.

  • Regarding use of a ziti-router at the gateway: does the ziti router have a higher default tcp limit, or none at all? We are using the gateway for intercept functionality as well, and I think the last time I looked at using tproxy router mode, it presented some problems for us in the way it wanted to enforce how we would use DNS on the machine. Let's see if I can find that thread... Yup, here -- not sure if these prior concerns are still valid in the latest version of the router:

The tproxy option for a ziti-router server tests for primary resolver access to DNS and fails if it doesn’t have it. Why does it need this? The client tunneler documentation here describes ziti-tunnel having similar requirement to be the primary resolver but also mentions that:

“… The nameserver will only answer queries for which it is authoritative i.e. OpenZiti Services’ domain names, and so you will also need a secondary, recursive resolver.”

In the case of ziti-tunnel which I haven’t tried, I assume the secondary recursive resolver can be set with the --resolver flag. However, I don’t see any similar option in the ziti-router command line or config options when running in tproxy mode. Did I miss seeing how to set that config somewhere for ziti-router?

  • So, not sure we can readily switch to the router as a tproxy. Probably a custom build with a higher limit will be the easiest, but between nix and nixos peculiarities and not having built the ziti product before (we've been just leveraging the release binaries so far), not sure if we'll run into some rabbit holes there. I suppose worst case we can revert to wireguard for a short time if we need to.

You may be aware of this but I’ll just point out that there are really two TCP connections for each connection that is intercepted and proxied over ziti. The first connection is between the intercepted client application and the intercepting ziti-edge-tunnel. This connection is handled by LWIP and can’t be seen with operating system metrics. The other connection is between the hosting ziti-edge-tunnel and the application server. This connection uses the operating system’s TCP/IP stack and is therefore visible with operating system tools:

app client  <-- TCP -->  `ziti-edge-tunnel`  <-- TLS --> Ziti Router(s) <-- TLS --> `ziti-edge-tunnel`  <-- TCP --> app server

If I’m understanding you correctly, the TCP connection limit that you are concerned with is on the ziti gateway. Furthermore, it sounds like this (single) gateway is intercepting connections from multiple clients, which would make it more likely to experience connection exhaustion.

I might have been concerned about a connection leak in the intercepting ziti-edge-tunnel, but based on what you’re saying about the timing of these occurrences I think you are now simply asking your ziti gateway to carry more than 512 connections at a time.

The connection structures in LWIP are allocated at compile/link time for efficiency, but we have talked about allocating a chunk of memory (that could be a configurable size) to use when initializing the LWIP structures, as @TheLumberjack mentioned. We don’t have that today though. I created a branch from our v0.20.22 tag that increases the tcp connection limit to 1024.
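
(For context, this cap comes from LWIP's compile-time memory pools, which are sized by options in lwipopts.h. The defines below are standard LWIP option names shown purely to illustrate the kind of change involved; the exact option(s) the branch touches may differ, so treat this as a sketch rather than the actual diff:)

    /* Illustrative lwipopts.h overrides only -- standard LWIP pool-size options
     * that cap concurrent TCP state. Check the branch source for what actually
     * changed; these names/values are not guaranteed to match it. */
    #define MEMP_NUM_TCP_PCB        1024   /* simultaneously active TCP connections (e.g. 512 -> 1024) */
    #define MEMP_NUM_TCP_PCB_LISTEN 16     /* listening TCP PCBs */
    #define MEMP_NUM_TCP_SEG        2048   /* simultaneously queued TCP segments */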

If you have the ability to build this branch, could you give it a try? If you aren’t set up for building ziti-edge-tunnel, we can figure out some way to get you a build, perhaps from our GitHub build action.

Hi @scareything, thank you!

No, I hadn’t really thought through the double TCP connection for interception, although that makes sense.

Yes, the machine which is throwing TCP conn limit errors is the ziti-edge-tunnel service on the network A gateway.

So based on your comments, it makes sense then that:

  • The gateway machine that is logging the TCP limit in the ZET service is really the machine with the problem.
  • Not readily finding OS metrics that indicate usage over 512 TCP conns on the gateway machine doesn’t mean it hasn’t hit the limit; some of those conns aren’t visible in OS metrics.
  • The network B host machines’ metrics climbing above 512 sockets in use at the same time the gateway ZET started alerting on the TCP limit was coincidental; the TCP limit is not tied to the network B machines.

Feature suggestion: add the current number of consumed TCP conns and any other similar parameters with hard limits to the tunnel_status CLI output since a user can’t obtain this from OS metrics.

Thank you very much for the branch… I was able to pull a CI artifact and try it. After a few minutes, it is also reporting hitting a 1024 TCP limit… Current socket usage and TCP conn info (which may not be helpful based on comments above) is:

# cat /proc/net/sockstat
sockets: used 382
TCP: inuse 39 orphan 0 tw 26 alloc 59 mem 9
UDP: inuse 146 mem 8
UDPLITE: inuse 0
RAW: inuse 1
FRAG: inuse 0 memory 0

# netstat -tn | awk '/tcp/ {print $6}' | sort | uniq -c
     34 ESTABLISHED
     18 SYN_RECV
     25 TIME_WAIT

So maybe we have a misconfig or a leak for some reason, although we haven’t touched the setup or config in a few months.

Looking further at the ziti-router logs (the router is co-located on the same server as the ZET service that is reporting the TCP limit), I’m seeing a lot of the following errors, each with a different circuit ID:

{"_context":"{c/ZyajblpHn|@/Qwbk}\u003cTerminator\u003e","circuitId":"ZyajblpHn","error":"cannot forward payload, no destination for circuit=ZyajblpHn src=Qwbk dst=Nq05","file":"github.com/openziti/fabric@v0.22.24/router/handler_xgress/receive.go:35","func":"github.com/openziti/fabric/router/handler_xgress.(*receiveHandler).HandleXgressReceive","level":"error","msg":"unable to forward payload","origin":1,"seq":1,"time":"2023-05-17T22:46:51.144Z","uuid":"invalid-uuid-size-of-0-bytes"}

Then, every few to several minutes, there is a large block of log output with the following message, each line with a different circuitId:

{"circuitId":"SW95VBnFi","ctrlId":"$FQDN","file":"github.com/openziti/fabric@v0.22.24/router/forwarder/scanner.go:85","func":"github.com/openziti/fabric/router/forwarder.(*Scanner).scan","idleThreshold":60000000000,"idleTime":140640000000,"level":"warning","msg":"circuit exceeds idle threshold","time":"2023-05-17T22:18:45.459Z"}

Ziti router and controller in this set up are running 0.27.5.

I’m not sure how to debug the above router errors, so any suggestions are welcome. If I restart the ZET service so that it is not hitting the connection limits, then for at least the first few minutes all the connections which should be transiting between networks A and B appear to be working as far as I can tell, yet the router log is still emitting the above messages. Restarting the ZET, router, and controller services doesn’t change the behavior.

Putting the connection counts and limits in the metrics events is a great idea! I’ll write up an issue for that.

Wow, I was expecting that doubling the connection capacity would buy you at least a little more time. Can you please share the following?

  • DEBUG (-v3) logs from the gateway ZET spanning the process starting until at least the first “failed to allocate tcp pcb” error
    • please issue a “dump” command to the ZET before capturing the logs (via ziti-edge-tunnel dump)
  • number of clients behind your ZET gateway. How many connections do you estimate e.g. per-minute, and roughly how long do they last?
  • output from ziti cli commands:
    • ziti edge list identities
    • ziti edge list terminators
    • ziti edge list ers

There may be enough complexity with diagnosing this to make a web meeting worthwhile. Let me know if you’d like to go that route and we can set one up for next week or when you’re available.

Thanks,
-Shawn

Thank you Shawn! Logs and info sent. Yes, if the logs don’t point out something obvious a meeting would be much appreciated, thank you for the offer!

Ok, thanks for the logs, @johnalotoski! I’m looking at the debug logs from your 0.20.22 tunneler. The log spans 415 seconds and includes the process starting up as well as the first TCP connection limit errors. Thanks (again) for that! Being able to see the full span really helps.

The first thing I wanted to learn was how many TCP connections are being intercepted, and how long they survive. We can see when connections are being intercepted by looking for messages like this one:

DEBUG tunnel-sdk:tunnel_tcp.c:405 recv_tcp() intercepted address[tcp:DST_IP:DST_PORT] client[tcp:SRC_IP:SRC_PORT] service[SERVICE_NAME]

And we can see the connections being closed with these messages:

DEBUG tunnel-sdk:ziti_tunnel.c:435 ziti_tunneler_close() closing connection: client[tcp:SRC_IP:SRC_PORT] service[SERVICE_NAME]

So looking at the number of each of these messages in your log, the math works out perfectly for exhausting the 512-slot connection pool:

% grep "intercepted address\[tcp:" ziti-edge-tunnel-v0.20.22-debug.log | wc -l                       
 530
% grep "closing connection: client\[t" ziti-edge-tunnel-v0.20.22-debug.log | wc -l
  18

Actually there’s a little more to the story than just the “intercepted” and “closing” log messages. Here’s the full happy path sequence as seen at the intercepting tunneler when the server/hosting side of the connection closes first:

  1. tunneler receives SYN for intercepted address:

    DEBUG tunnel-sdk:tunnel_tcp.c:405 recv_tcp() intercepted address[tcp:##.##.##.##:9999] client[tcp:##.##.##.##:42566] service[######]

  2. tunneler initiates connection over ziti network:

    DEBUG tunnel-cbs:ziti_tunnel_cbs.c:358 ziti_sdk_c_dial() service[######] app_data_json[144]='{"connType":null,"dst_protocol":"tcp","dst_ip":"##.##.##.##","dst_port":"9999","src_protocol":"tcp","src_ip":"##.##.##.##","src_port":"42566"}'

  3. ziti connection established:

    DEBUG tunnel-sdk:ziti_tunnel.c:221 ziti_tunneler_dial_completed() ziti dial succeeded: client[tcp:##.##.##.##:42566] service[######]

  4. tunneler sends SYN/ACK to client:

    DEBUG tunnel-sdk:tunnel_tcp.c:59 on_accept() on_accept: 0

  5. end-to-end connection established (data transfer logs are not seen at DEBUG log level)

  6. hosting tunneler closed ziti connection:

    DEBUG tunnel-cbs:ziti_tunnel_cbs.c:126 on_ziti_data() ziti connection sent EOF (ziti_eof=0, tnlr_eof=0)

  7. tunneler sends FIN to client:

    DEBUG tunnel-sdk:ziti_tunnel.c:466 ziti_tunneler_close_write() closing write connection: client[tcp:##.##.##.##:42566] service[######]
    DEBUG tunnel-sdk:tunnel_tcp.c:223 tunneler_tcp_close_write() closing write 0x55d44375fda8, state=4(ESTABLISHED) flags=0x100
    DEBUG tunnel-sdk:tunnel_tcp.c:229 tunneler_tcp_close_write() closed write 0x55d44375fda8, state=5(FIN_WAIT_1) flags=0x120

  8. client sends FIN/ACK:

    DEBUG tunnel-sdk:ziti_tunnel.c:435 ziti_tunneler_close() closing connection: client[tcp:##.##.##.##:42566] service[######]

Notice that the client’s ip:port is the common thread between all of these messages.

So after some scratching around I spotted a connection that was not closed (the lack of a “closing connection” message with a client address that was previously intercepted was what tipped me off). It’s actually the first TCP connection that the tunneler intercepted:

[        0.153]   DEBUG tunnel-sdk:tunnel_tcp.c:405 recv_tcp() intercepted address[tcp:##.##.##.##:8301] client[tcp:##.##.##.##:55026] service[######]
[        0.153]   DEBUG tunnel-cbs:ziti_tunnel_cbs.c:358 ziti_sdk_c_dial() service[######] app_data_json[145]='{"connType":null,"dst_protocol":"tcp","dst_ip":"##.##.##.##","dst_port":"8301","src_protocol":"tcp","src_ip":"##.##.##.##","src_port":"55026"}'
...
[        0.228]   DEBUG tunnel-sdk:ziti_tunnel.c:221 ziti_tunneler_dial_completed() ziti dial succeeded: client[tcp:##.##.##.##:55026] service[######]
...
[       10.220]   DEBUG tunnel-cbs:ziti_tunnel_cbs.c:126 on_ziti_data() ziti connection sent EOF (ziti_eof=0, tnlr_eof=0)
[       10.220]   DEBUG tunnel-sdk:ziti_tunnel.c:466 ziti_tunneler_close_write() closing write connection: client[tcp:##.##.##.##:55026] service[######]
[       10.220]   DEBUG tunnel-sdk:tunnel_tcp.c:223 tunneler_tcp_close_write() closing write 0x55d443760668, state=3(SYN_RCVD) flags=0x900
[       10.220]   DEBUG tunnel-sdk:tunnel_tcp.c:229 tunneler_tcp_close_write() closed write 0x55d443760668, state=5(FIN_WAIT_1) flags=0x920

We never see the “closing connection” message because the client never sends a FIN, but it shouldn’t be sending FIN at this point because it’s still waiting for the tunneler to send SYN/ACK. The tunneler only sends SYN/ACK when the ziti connection is established, but that never happens for this connection. Something must have happened at the hosting tunneler that prevented the connection to tcp port 8301 from succeeding. TCP connections to the other ports in your configs seem to be closing successfully, btw.

The logs from that hosting tunneler will be informative for you in getting your service working optimally, but this information is enough to reveal a bug in the intercepting tunneler. It should be sending RST to the client when the ziti connection fails before the TCP connection is established.

I’ll work on a fix for that, hopefully ready by tomorrow.


Awesome! Great info/analysis, thank you!


Actually, I missed a detail: the ziti connection that I drilled into did in fact have an established ziti link:

[ 0.228] DEBUG tunnel-sdk:ziti_tunnel.c:221 ziti_tunneler_dial_completed() ziti dial succeeded: client[tcp:##.##.##.##:55026] service[######]

So the tunneler should have sent SYN/ACK to the client, and the client should have sent ACK in response. Somehow one or the other of those things didn't happen, but it's all the same - the intercepting tunneler needs to cut the TCP connection loose when it gets into this situation.


Ok :+1:.

Do these logs and this issue you’ve identified in ZET look likely to explain the large number (hundreds to thousands) of fabric circuit error messages (see above for one line examples) in the router logs? Or is that a separate issue?

I don’t think the fabric errors are related to the ZET issues. They may be another thing to explore once we sort out the ZET issue which I think has two factors:

  1. Something is preventing the TCP 3-way handshake between ZET and your clients that are connecting to the service on tcp port 8301 from completing. Clients connecting to other ports don’t have this issue. The best I can say right now is that it’s one of the following:

    • ZET isn’t sending the SYN/ACK
    • The client isn’t receiving the SYN/ACK for some reason (routing/firewall rules maybe?)
    • The client isn’t sending the final ACK that completes the 3-way handshake
    • ZET isn’t receiving the final ACK from the client

    A packet capture at ZET and also at one of the hosts with a client that connects to tcp :8301 would shed some light. This tcpdump command stops after writing 200 tcp packets that have a src or dst port of 8301 to a file. The packets are also displayed on the terminal as they are captured, so you can see it working (or not) in real time instead of exercising your patience:

     sudo tcpdump -U -c 200 -w - 'tcp port 8301' | tee "tcp8301-from-$(hostname).pcap" | tcpdump -nr -
    
  2. ZET is leaking TCP connections that are not yet established when the associated ziti connection closes. A commit that I pushed to the branch that you previously built should fix this. Can you pull and build this branch again, and drop the binary onto your gateway? While this should allow ZET to survive, I expect your clients will still have trouble connecting to tcp :8301 as described above.

I’ve asked a colleague who’s more familiar with the fabric about the errors in the router logs and at least at first glance he’s not alarmed by them. I’ll bring it up again and mention the frequency. Can you beam the router logs to me using your preferred transport method?


I just realized that you had previously grabbed the ziti-edge-tunnel binary from our GitHub action, instead of building it. Sorry for the mixup. You can get the build with the leak fixes here.


All good, thank you very much Shawn!

Quick update on testing and investigation on your suggestions:

  1. The updated build does prevent hitting the 1024 TCP limit: I could see many connection terminations being issued in the ZET log, and the TCP limit was never hit! Example:

    ERROR tunnel-sdk:tunnel_tcp.c:176 on_tcp_client_err() client=tcp:10.16.72.203:39932 err=-13, terminating connection

  2. Debugging further with tcpdump, I tracked part of the problem down: during the cluster scaling that precipitated hitting the ZET tcp limit, new subnets were added, but these were not updated in the ZET service allowedAddresses, and so ZET was properly denying those new subnet-addressed clients access to the clients on the other side of the gateway. As soon as I updated the service config to also allow the new subnets, 99% of those terminating connection messages stopped, with just a few intermittently remaining here and there.
  3. Ziti router logs are still continually logging cannot forward payload, no destination for circuit (a few per second), and ziti fabric list circuits persists with a list output of ~800 lines… But I can still see at least one issue on our end which may be related… So I’ll try to get the remaining issue I’m aware of on our end resolved first and then follow up on whether the router cannot-forward-payload messages still persist. It may be a few days before I can get back to this as I have some other deadlines I’m juggling.

Great news, though: the functionality we initially lost when hitting the TCP limit in ZET has been addressed by your patched version… Thanks for the help on this, much appreciated!

I’ll follow up further soon.
John


Hey John, thanks for the update. If I’m understanding correctly, your hosting tunnelers were rejecting some of the connections because the subnet addresses weren’t whitelisted in the host.v1 configuration. When that happens, I’d expect to see “ziti dial failed” messages in your client tunneler logs, but all of the dial messages indicate the dials “succeeded”. If you have any logs from one of your hosting tunnelers from when the broken configuration was in effect, I’d like to see them. I’ll set up some tests here in the meantime.

Thanks,
-Shawn

Thanks! Sure, tomorrow I should have some time to grab some more logs.

Short update:

  • Over time, we do still run into the TCP limit, but now it takes about 16 hours to hit it instead of ~5-10 minutes. So for now, I’m restarting the service via systemctl every 12 hours to avoid hitting the limit.
  • We have hit some segfaults on the new version of ZET. It seems this mostly happens when we hit the TCP limit. I can provide a segfault backtrace.
  • The other issue I mentioned I still had wanted to fix on our end appears unrelated to the fabric circuit logging. It’s a cloud related networking issue affecting a few machines, and I likely won’t be able to fix it quickly as it is somewhat dependent on the cloud provider. Another cluster without that issue still shows the same fabric circuit logging behavior.

I’ll reach out tomorrow with some additional logs… Thanks!
John

After diving into logs and a few trial builds, I’ll bring our discussion back into the public eye on the chance that someone else out there may benefit from our adventure.

What we’ve learned so far

Some of the tcp connections that are intercepted by ziti-edge-tunnel are not established (still in the SYN_RCVD state) 10 seconds after ziti-edge-tunnel responded to the client’s SYN with SYN/ACK. BTW ziti-edge-tunnel sends SYN/ACK only when the corresponding ziti connection is established for the service that is being intercepted. Here’s what this looks like from the intercepting ziti-edge-tunnel:

  1. the connection is intercepted (client sends SYN):

    [        0.153]   DEBUG tunnel-sdk:tunnel_tcp.c:405 recv_tcp() intercepted address[tcp:##.##.##.##:8301] client[tcp:##.##.##.##:55026] service[######]
    
  2. ziti-edge-tunnel initiates a connection for the associated ziti service:

    [        0.153]   DEBUG tunnel-cbs:ziti_tunnel_cbs.c:358 ziti_sdk_c_dial() service[######] app_data_json[145]='{"connType":null,"dst_protocol":"tcp","dst_ip":"##.##.##.##","dst_port":"8301","src_protocol":"tcp","src_ip":"##.##.##.##","src_port":"55026"}'
    
  3. ziti-edge-tunnel receives notification that the ziti connection is established:

    [        0.228]   DEBUG tunnel-sdk:ziti_tunnel.c:221 ziti_tunneler_dial_completed() ziti dial succeeded: client[tcp:##.##.##.##:55026] service[######]
    
  4. ziti-edge-tunnel receives EOF from the ziti connection:

    [       10.220]   DEBUG tunnel-cbs:ziti_tunnel_cbs.c:126 on_ziti_data() ziti connection sent EOF (ziti_eof=0, tnlr_eof=0)
    
  5. ziti-edge-tunnel sends a tcp FIN to half-close with the client:

    [       10.220]   DEBUG tunnel-sdk:ziti_tunnel.c:466 ziti_tunneler_close_write() closing write connection: client[tcp:##.##.##.##:55026] service[######]
    [       10.220]   DEBUG tunnel-sdk:tunnel_tcp.c:223 tunneler_tcp_close_write() closing write 0x55d443760668, state=3(SYN_RCVD) flags=0x900
    [       10.220]   DEBUG tunnel-sdk:tunnel_tcp.c:229 tunneler_tcp_close_write() closed write 0x55d443760668, state=5(FIN_WAIT_1) flags=0x920
    

This is incorrect behavior. ziti-edge-tunnel should not send FIN on a connection that isn’t yet established. The client will never send a FIN to fully close the connection, so the connection will remain half-closed indefinitely. This is one source of TCP connection leakage.

We addressed this specific issue by changing ziti-edge-tunnel to forcibly release tcp connections that aren’t established instead of trying to close them gracefully. Now the logs show this when ziti-edge-tunnel tries to close or write to a tcp connection that isn’t established:

DEBUG tunnel-sdk:tunnel_tcp.c:236 tunneler_tcp_close_write() closing connection before handshake complete. sending RST to client

In addition to sending RST (which may or may not be received by the client), ziti-edge-tunnel also abandons its tcp connection with this client.
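
(If it helps to picture the change: with LWIP, the usual way to do this is tcp_abort(), which sends a RST and frees the pcb immediately, rather than the graceful tcp_shutdown()/tcp_close() path that a never-established client can never finish. The sketch below is only my rough illustration of that idea, not the actual commit:)

    /* Rough illustration only, not the actual ziti-edge-tunnel patch. */
    #include "lwip/tcp.h"

    static void release_client_conn(struct tcp_pcb *pcb) {
        if (pcb == NULL) {
            return;
        }
        if (pcb->state != ESTABLISHED) {
            /* handshake never completed: send RST and free the pcb slot now */
            tcp_abort(pcb);
        } else {
            /* established connection: half-close our write side gracefully */
            tcp_shutdown(pcb, 0, 1);
        }
    }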

With this fix in place, I believe we saw a reduction in the rate of tcp connection leaks. But you still see a steadily rising tcp connection count, so something else is going on. We tried enabling TCP keepalives from ziti-edge-tunnel, but that didn’t seem to make a difference.
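
(For reference, “enabling TCP keepalives” on the LWIP side generally looks like the sketch below, assuming the stack is built with LWIP_TCP_KEEPALIVE; the intervals here are placeholders and the real code in ziti-edge-tunnel may differ:)

    /* Sketch only; interval values are placeholders, requires LWIP_TCP_KEEPALIVE=1. */
    #include "lwip/ip.h"
    #include "lwip/tcp.h"

    static void enable_keepalive(struct tcp_pcb *pcb) {
        ip_set_option(pcb, SOF_KEEPALIVE);   /* turn on keepalive probing for this pcb */
        pcb->keep_idle  = 60 * 1000;         /* ms idle before the first probe */
        pcb->keep_intvl = 10 * 1000;         /* ms between probes */
        pcb->keep_cnt   = 4;                 /* unanswered probes before the pcb is aborted */
    }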

Where do we go next?

We’ve seen some of your clients getting stuck in SYN_RCVD. We only happen to see this when something happens on the corresponding ziti connection, such as data being received or EOF. I think this implies either some packet loss between the clients and the intercepting ziti-edge-tunnel, or a bug in ziti-edge-tunnel that sometimes prevents it from sending SYN/ACK to clients and/or receiving the subsequent ACK that would move the connection to the established state.

We’ve addressed the issue for clients that are not yet established, but I wonder if there could be clients that are in established or closing states that are getting stuck as well. I was hoping that TCP keepalives would catch stuck connections that are in more advanced states, but maybe they aren’t working as expected? Or maybe the tcp application (client or server) doesn’t respond with FIN/ACK when FIN is received?

I think we need the following to answer these questions:

  • VERBOSE logs from the intercepting tunneler. I made a new build that periodically logs keepalive activity and also tcp stats and connection details when a dump command is issued via ziti-edge-tunnel dump. It looks something like this:

    $ sudo ziti-edge-tunnel dump
    received response <{"Success":true,"Code":0}>
    

    And then you should see something like this in the logs:

    [     3439.582]   DEBUG ziti-edge-tunnel:ziti-edge-tunnel.c:662 on_cmd_client() Received IPC client connection request, count: 1
    [     3439.582]    INFO ziti-edge-tunnel:ziti-edge-tunnel.c:635 on_cmd() received cmd <{"Command":"ZitiDump","Data":{}}>
    [     3439.582]    INFO tunnel-cbs:ziti_tunnel_ctrl.c:305 process_cmd() ziti dump started 
    [     3439.582]    INFO tunnel-cbs:ziti_tunnel_ctrl.c:160 ziti_dump_to_log() ziti dump to log
    ...
    =================
    TCP Stats:
    TCP handles used:4 max:4 limit:1024
    =================
    TCP Connections:
    TCP local=100.64.10.3/32:443 remote=100.64.10.1/32:57224 state=ESTABLISHED ziti_service=MY_ZITI_SERVICE
    ...
    
  • tcp packet captures from the intercepting ziti-edge-tunnel. We want to see all packets that ziti-edge-tunnel is receiving and sending. This command will grab all packets that move through ziti-edge-tunnel’s tun interface, assuming it was the first and/or only ziti-edge-tunnel instance running on the box:

    sudo tcpdump -i tun0 -w tun0.pcap
    

    To be sure, you can check the tun device names that are currently in use with:

    ip tuntap list
    

    Ideally the capture would be started within a few seconds of starting the intercepting ziti-edge-tunnel.

  • VERBOSE logs from one of the hosting ziti-edge-tunnels. I’m not sure if there’s a good way to choose which hosting ziti-edge-tunnel we inspect. It will definitely need to be one that hosts one of the ziti services that we’ve seen tcp connection leaks with (e.g. “axt--eq****”) to be interesting.

This post was longer than I’d prefer. To recap succinctly, this is what I’m asking you to do:

On the intercepting ziti-edge-tunnel host:

  1. sudo systemctl start ziti-edge-tunnel
  2. sudo ziti-edge-tunnel dump, and check the logs for the dump output mentioned above. Let me know if you don’t see it.
  3. ip tuntap list to get/verify tun device name
  4. sudo tcpdump -i tun0 -w tun0.pcap
  5. sleep 3600; sudo ziti-edge-tunnel dump
  6. grab the logs, and kill the tcpdump

On one of the hosting ziti-edge-tunnel hosts - probably one that hosts the "axt--eq****" service, since we’ve seen clients connecting to that service getting stuck in SYN_RCVD:

  1. make sure the log level is set to VERBOSE. check /var/lib/ziti/config.json if it exists, otherwise add -v=5 to the ziti-edge-tunnel command line in your systemd unit or script.
  2. sudo systemctl start ziti-edge-tunnel
  3. sleep 3600
  4. grab the logs

I’m also happy to do this over a zoom, etc. if it helps.

Thanks,
-Shawn


Thanks Shawn!
I should be able to grab these logs/dumps sometime over the next few days, maybe early next week.

Hey John, I found a mistake in my instructions for the dump commands. Boiled down: the dump command writes to the log by default, but when writing to the log the dump output is truncated to 1024 chars.

So instead of running ziti-edge-tunnel dump as initially requested, please run ziti-edge-tunnel dump -p /tmp (or a directory of your choosing). The running ziti-edge-tunnel will drop a file with the dump contents into the specified directory. Please include this file with the logs and captures.

Thanks,
-Shawn

Sure, will do. Collecting logs now, thanks.