TCP conn limit is 512 -- ZET not allowing sufficient conns

Hello, we increased usage of a cluster using a ZET gateway running ziti-edge-tunnel 0.20.22 and have run into a TCP limit.

Logs are:

May 17 16:43:54 zt ziti-edge-tunnel[1818730]: (1818730)[  5874572.692]   ERROR tunnel-sdk:tunnel_tcp.c:76 new_tcp_pcb() tcp_new failed
May 17 16:43:54 zt ziti-edge-tunnel[1818730]: (1818730)[  5874572.692]   ERROR tunnel-sdk:tunnel_tcp.c:384 recv_tcp() failed to allocate tcp pcb - TCP connection limit is 512

I’ll keep looking through docs/code, but at least at a quick glance, it seems like this limit is hard coded and not user configurable?

Thanks,
John

Hi @johnalotoski.

Yeah, it’s definitely hard coded at this time. My understanding is that those connections should be getting reclaimed/reused, but if you have “lots” of connections, as you noticed, you’ll get to 512 eventually. Or perhaps you just have a bunch of long-lasting connections? Then, of course, you’ll hit that 512 limit faster… If you feel like sharing which scenario you’re encountering, that’s useful and helpful feedback for us, thanks.

There are probably two short-term options that aren’t ‘great’, but they are possible. You could pull/build a release of your own that has 512 set to “more”, but you might just end up hitting the next limit. You could also consider moving to a ziti-router instead of ziti-edge-tunnel; it works the same, but you’d have to learn how to deploy that instead. If you’re using ziti-edge-tunnel in a ‘host only’ type mode, it’s basically a drop-in replacement, but if you’re also using it for intercept functionality, there are some differences.

Longer term, we will look to make this allocation dynamic… @scareything, do you have any additional thoughts to share/add?

Hi @TheLumberjack, thanks for the feedback.

Hmm… The scenario is: network A has a ziti-edge-tunnel gateway to machines on network B, where there is no single ziti gateway; rather, those machines each have ziti-edge-tunnel installed for “host model” usage.

Machines on Network A <---> Network A ZT Gateway <---> Host A machine with Ziti
                                                 <---> Host B machine with Ziti
                                                 <---> Host C machine with Ziti

Considerations:

  1. Machines on networks A and B run a mesh protocol in which machines probe each other for liveness/availability, mostly over UDP but also using TCP as a fallback protocol.

  2. Host machines in network B are servers routinely under heavy load.

Trigger:

We scaled network A up (a few dozen more machines), which seems to have triggered hitting this TCP limit today.

Impact:

We noticed this issue immediately after scaling, when a monitoring server in network A started reporting failed scrapes of target machines in network B because the monitoring server was being denied connections through the ziti network overlay.

Thoughts:

  • At face value, it would appear that more mesh probes between networks A and B, due to the increased network A cluster size, were responsible for hitting the limit. However, looking at ziti-edge-tunnel logs on machines in network B, there are only one or two mesh probes coming in per second, so that doesn’t seem to be the sole cause even if it was the trigger. TCP activity on the network A machines or on the gateway itself doesn’t reach 512 connections either.

  • Network B machines are busy servers. While the TCP connection load on these network B machines from ziti traffic originating in network A is small, these servers do carry constant, significant TCP load from external sources, for example:

# netstat -an | awk '/tcp/ {print $6}' | sort | uniq -c
    226 CLOSE_WAIT
     41 ESTABLISHED
     17 FIN_WAIT2
     47 LISTEN
   3503 TIME_WAIT
  • However, with the TIME_WAIT count alone far above 512, these network B servers in ziti host mode should have been causing these errors on the ziti gateway for a long time. Looking further at the networking metrics on the network B machines, though, I can see a perfect time correlation between the used IPv4 socket count climbing above 512 and these ziti gateway errors starting:
# cat /proc/net/sockstat
sockets: used 684
TCP: inuse 294 orphan 0 tw 2414 alloc 308 mem 229
UDP: inuse 96 mem 6
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0
  • Looking at the metrics history for the network B machines, we have been just barely below 512 IPv4 sockets in use for weeks, and increasing the network A size today pushed the network B machines above that threshold.

  • So if I’m interpreting this correctly, it seems that really busy servers won’t be able to utilize ziti traffic, as the remote end of the ziti tunnel will get TCP limit errors right away due to traffic unrelated to ziti. It seems odd to me that the machine logging the TCP limit errors appears not to be the machine actually hitting the TCP limit.

  • Initially, I was thinking maybe I could reduce the OS TIME_WAIT timeout to reduce that count, but since it looks like the threshold is tied to the socket use count, I’m not sure that will help.

  • Regarding use of a ziti-router at the gateway: does the ziti router have a higher (or no) default TCP limit? We are using the gateway for intercept functionality as well, and I think the last time I looked at using tproxy router mode, it presented some problems for us in the way it wanted to enforce how we would use DNS on the machine. Let’s see if I can find that thread… Yup, here – not sure if these prior concerns are still valid in the latest version of the router:

The tproxy option for a ziti-router tests for primary resolver access to DNS and fails if it doesn’t have it. Why does it need this? The client tunneler documentation here describes ziti-tunnel as having a similar requirement to be the primary resolver, but also mentions that:

“… The nameserver will only answer queries for which it is authoritative i.e. OpenZiti Services’ domain names, and so you will also need a secondary, recursive resolver.”

In the case of ziti-tunnel, which I haven’t tried, I assume the secondary recursive resolver can be set with the --resolver flag. However, I don’t see any similar option in the ziti-router command line or config options when running in tproxy mode. Did I miss how to set that config somewhere for ziti-router?

  • So I’m not sure we can readily switch to the router as a tproxy. A custom build with a higher limit will probably be the easiest option, but between Nix/NixOS peculiarities and not having built the ziti product before (we’ve just been leveraging the release binaries so far), I’m not sure whether we’ll run into some rabbit holes there. I suppose worst case we can revert to wireguard for a short time if we need to.

You may be aware of this but I’ll just point out that there are really two TCP connections for each connection that is intercepted and proxied over ziti. The first connection is between the intercepted client application and the intercepting ziti-edge-tunnel. This connection is handled by LWIP and can’t be seen with operating system metrics. The other connection is between the hosting ziti-edge-tunnel and the application server. This connection uses the operating system’s TCP/IP stack and is therefore visible with operating system tools:

app client  <-- TCP -->  `ziti-edge-tunnel`  <-- TLS --> Ziti Router(s) <-- TLS --> `ziti-edge-tunnel`  <-- TCP --> app server

If I’m understanding you correctly, the TCP connection limit that you are concerned with is on the ziti gateway. Furthermore, it sounds like this (single) gateway is intercepting connections from multiple clients, which would make it more likely to experience connection exhaustion.

I might have been concerned about a connection leak in the intercepting ziti-edge-tunnel, but based on what you’re saying about the timing of these occurrences I think you are now simply asking your ziti gateway to carry more than 512 connections at a time.

The connection structures in LWIP are allocated at compile/link time for efficiency, but we have talked about allocating a chunk of memory (that could be a configurable size) to use when initializing the LWIP structures, as @TheLumberjack mentioned. We don’t have that today though. I created a branch from our v0.20.22 tag that increases the tcp connection limit to 1024.
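
For reference, here is a minimal sketch of the kind of compile-time knob involved, assuming the limit maps to lwIP’s standard MEMP_NUM_TCP_PCB option (the branch may adjust a different define in practice):

/* lwipopts.h (illustrative sketch only, not the actual branch diff):
 * lwIP sizes its TCP protocol control block (pcb) pool at build time,
 * so raising the ceiling requires a rebuild. MEMP_NUM_TCP_PCB is
 * lwIP's standard option for the number of simultaneously active TCP
 * connections; whether ziti-edge-tunnel overrides this exact define
 * is an assumption here. */
#define MEMP_NUM_TCP_PCB  1024  /* previously 512, per the error log above */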

If you have the ability to build this branch, could you give it a try? If you aren’t set up for building ziti-edge-tunnel, we can figure out some way to get you a build, perhaps from our GitHub build action.

Hi @scareything, thank you!

No, I hadn’t really thought through the double TCP connection for interception, although that makes sense.

Yes, the machine which is throwing TCP conn limit errors is the ziti-edge-tunnel service on the network A gateway.

So based on your comments, it makes sense then that:

  • The gateway machine that is logging the TCP limit in the ZET service is really the machine with the problem.
  • Not readily finding OS metrics that indicate usage over 512 TCP conns on the gateway machine doesn’t mean it hasn’t hit the limit; some of those conns aren’t visible in OS metrics.
  • The perfect timing of the network B host machines’ metrics climbing above 512 sockets in use, coinciding with the TCP-limit alerts from the gateway machine’s ZET, was coincidental; the TCP limit is not tied to the network B machines.

Feature suggestion: add the current number of consumed TCP conns and any other similar parameters with hard limits to the tunnel_status CLI output since a user can’t obtain this from OS metrics.

Thank you very much for the branch… I was able to pull a CI artifact and try it. After a few minutes, it is also reporting hitting a 1024 TCP limit… Current socket usage and TCP conn info (which may not be helpful based on comments above) is:

# cat /proc/net/sockstat
sockets: used 382
TCP: inuse 39 orphan 0 tw 26 alloc 59 mem 9
UDP: inuse 146 mem 8
UDPLITE: inuse 0
RAW: inuse 1
FRAG: inuse 0 memory 0

# netstat -tn | awk '/tcp/ {print $6}' | sort | uniq -c
     34 ESTABLISHED
     18 SYN_RECV
     25 TIME_WAIT

So maybe we have a misconfig or a leak for some reason, although we haven’t touched the setup or config in a few months.

Looking further at the ziti-router logs (the router is co-located on the same server as the ZET service that is reporting the TCP limit), I’m seeing a lot of the following errors, each with a different circuitId:

{"_context":"{c/ZyajblpHn|@/Qwbk}\u003cTerminator\u003e","circuitId":"ZyajblpHn","error":"cannot forward payload, no destination for circuit=ZyajblpHn src=Qwbk dst=Nq05","file":"github.com/openziti/fabric@v0.22.24/router/handler_xgress/receive.go:35","func":"github.com/openziti/fabric/router/handler_xgress.(*receiveHandler).HandleXgressReceive","level":"error","msg":"unable to forward payload","origin":1,"seq":1,"time":"2023-05-17T22:46:51.144Z","uuid":"invalid-uuid-size-of-0-bytes"}

Then, every few to several minutes, there is a large block of log output with the following message, each line with a different circuitId:

{"circuitId":"SW95VBnFi","ctrlId":"$FQDN","file":"github.com/openziti/fabric@v0.22.24/router/forwarder/scanner.go:85","func":"github.com/openziti/fabric/router/forwarder.(*Scanner).scan","idleThreshold":60000000000,"idleTime":140640000000,"level":"warning","msg":"circuit exceeds idle threshold","time":"2023-05-17T22:18:45.459Z"}

The ziti router and controller in this setup are running 0.27.5.

I’m not sure how to debug the above router errors, so any suggestions are welcome. If I restart the ZET service so that it is not hitting the connection limits, then for at least the first few minutes all the connections that should be transiting between networks A and B appear to work as far as I can tell, yet the router log still emits the above messages. Restarting the ZET, router, and controller services doesn’t change the behavior.

Putting the connection counts and limits in the metrics events is a great idea! I’ll write up an issue for that.

Wow, I was expecting that doubling the connection capacity would buy you at least a little more time. Can you please share the following?

  • DEBUG (-v3) logs from the gateway ZET spanning the process starting until at least the first “failed to allocate tcp pcb” error
    • please issue a “dump” command to the ZET before capturing the logs (via ziti-edge-tunnel dump)
  • the number of clients behind your ZET gateway, how many connections you estimate they make (e.g. per minute), and roughly how long the connections last
  • output from ziti cli commands:
    • ziti edge list identities
    • ziti edge list terminators
    • ziti edge list ers

There may be enough complexity with diagnosing this to make a web meeting worthwhile. Let me know if you’d like to go that route and we can set one up for next week or when you’re available.

Thanks,
-Shawn

Thank you Shawn! Logs and info sent. Yes, if the logs don’t point out something obvious a meeting would be much appreciated, thank you for the offer!

Ok, thanks for the logs, @johnalotoski! I’m looking at the debug logs from your 0.20.22 tunneler. The log spans 415 seconds and includes the process starting up as well as the first TCP connection limit errors. Thanks (again) for that! Being able to see the full span really helps.

The first thing I wanted to learn was how many TCP connections are being intercepted and how long they survive. We can see when connections are being intercepted by looking for messages like this one:

DEBUG tunnel-sdk:tunnel_tcp.c:405 recv_tcp() intercepted address[tcp:DST_IP:DST_PORT] client[tcp:SRC_IP:SRC_PORT] service[SERVICE_NAME]

And we can see the connections being closed with these messages:

DEBUG tunnel-sdk:ziti_tunnel.c:435 ziti_tunneler_close() closing connection: client[tcp:SRC_IP:SRC_PORT] service[SERVICE_NAME]

So looking at the number of each of these messages in your log, the math works out perfectly for exhausting the 512-slot connection pool (530 intercepted minus 18 closed leaves 512 connections still held):

% grep "intercepted address\[tcp:" ziti-edge-tunnel-v0.20.22-debug.log | wc -l                       
 530
% grep "closing connection: client\[t" ziti-edge-tunnel-v0.20.22-debug.log | wc -l
  18

Actually there’s a little more to the story than just the “intercepted” and “closing” log messages. Here’s the full happy path sequence as seen at the intercepting tunneler when the server/hosting side of the connection closes first:

  • tunneler receives SYN for intercepted address:
    DEBUG tunnel-sdk:tunnel_tcp.c:405 recv_tcp() intercepted address[tcp:##.##.##.##:9999] client[tcp:##.##.##.##:42566] service[######]
  • tunneler initiates connection over ziti network:
    DEBUG tunnel-cbs:ziti_tunnel_cbs.c:358 ziti_sdk_c_dial() service[######] app_data_json[144]='{"connType":null,"dst_protocol":"tcp","dst_ip":"##.##.##.##","dst_port":"9999","src_protocol":"tcp","src_ip":"##.##.##.##","src_port":"42566"}'
  • ziti connection established:
    DEBUG tunnel-sdk:ziti_tunnel.c:221 ziti_tunneler_dial_completed() ziti dial succeeded: client[tcp:##.##.##.##:42566] service[######]
  • tunneler sends SYN/ACK to client:
    DEBUG tunnel-sdk:tunnel_tcp.c:59 on_accept() on_accept: 0
  • end-to-end connection established:
    (data transfer logs are not seen at DEBUG log level)
  • hosting tunneler closed ziti connection:
    DEBUG tunnel-cbs:ziti_tunnel_cbs.c:126 on_ziti_data() ziti connection sent EOF (ziti_eof=0, tnlr_eof=0)
  • tunneler sends FIN to client:
    DEBUG tunnel-sdk:ziti_tunnel.c:466 ziti_tunneler_close_write() closing write connection: client[tcp:##.##.##.##:42566] service[######]
    DEBUG tunnel-sdk:tunnel_tcp.c:223 tunneler_tcp_close_write() closing write 0x55d44375fda8, state=4(ESTABLISHED) flags=0x100
    DEBUG tunnel-sdk:tunnel_tcp.c:229 tunneler_tcp_close_write() closed write 0x55d44375fda8, state=5(FIN_WAIT_1) flags=0x120
  • client sends FIN/ACK:
    DEBUG tunnel-sdk:ziti_tunnel.c:435 ziti_tunneler_close() closing connection: client[tcp:##.##.##.##:42566] service[######]

Notice that the client’s ip:port is the common thread among all of these messages.

So after some scratching around I spotted a connection that was not closed (the lack of a “closing connection” message with a client address that was previously intercepted was what tipped me off). It’s actually the first TCP connection that the tunneler intercepted:

[        0.153]   DEBUG tunnel-sdk:tunnel_tcp.c:405 recv_tcp() intercepted address[tcp:##.##.##.##:8301] client[tcp:##.##.##.##:55026] service[######]
[        0.153]   DEBUG tunnel-cbs:ziti_tunnel_cbs.c:358 ziti_sdk_c_dial() service[######] app_data_json[145]='{"connType":null,"dst_protocol":"tcp","dst_ip":"##.##.##.##","dst_port":"8301","src_protocol":"tcp","src_ip":"##.##.##.##","src_port":"55026"}'
...
[        0.228]   DEBUG tunnel-sdk:ziti_tunnel.c:221 ziti_tunneler_dial_completed() ziti dial succeeded: client[tcp:##.##.##.##:55026] service[######]
...
[       10.220]   DEBUG tunnel-cbs:ziti_tunnel_cbs.c:126 on_ziti_data() ziti connection sent EOF (ziti_eof=0, tnlr_eof=0)
[       10.220]   DEBUG tunnel-sdk:ziti_tunnel.c:466 ziti_tunneler_close_write() closing write connection: client[tcp:##.##.##.##:55026] service[######]
[       10.220]   DEBUG tunnel-sdk:tunnel_tcp.c:223 tunneler_tcp_close_write() closing write 0x55d443760668, state=3(SYN_RCVD) flags=0x900
[       10.220]   DEBUG tunnel-sdk:tunnel_tcp.c:229 tunneler_tcp_close_write() closed write 0x55d443760668, state=5(FIN_WAIT_1) flags=0x920

We never see the “closing connection” message because the client never sends a FIN, but it shouldn’t be sending a FIN at this point because it’s still waiting for the tunneler to send SYN/ACK. The tunneler only sends SYN/ACK when the ziti connection is established, but that never happens for this connection. Something must have happened at the hosting tunneler that prevented the connection to tcp port 8301 from succeeding. TCP connections to the other ports in your configs seem to be closing successfully, btw.

The logs from that hosting tunneler will be informative for you in getting your service working optimally, but this information is enough to reveal a bug in the intercepting tunneler. It should be sending RST to the client when the ziti connection fails before the TCP connection is established.
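
For illustration only, here is a minimal sketch of the kind of cleanup involved, expressed in lwIP terms and assuming the intercepting tunneler still holds the client’s tcp_pcb when the ziti dial fails (the actual fix may be structured differently):

/* Hypothetical helper: abort a client TCP connection whose ziti dial
 * failed before the TCP handshake completed. tcp_abort() sends a RST
 * to the client and frees the pcb, returning its slot to lwIP's
 * fixed-size connection pool. */
#include "lwip/tcp.h"

static void abandon_unestablished_client(struct tcp_pcb *client_pcb)
{
    if (client_pcb == NULL) {
        return;
    }
    /* Clear the callbacks first so lwIP doesn't call back into state
     * that is about to be released. */
    tcp_arg(client_pcb, NULL);
    tcp_recv(client_pcb, NULL);
    tcp_err(client_pcb, NULL);
    tcp_abort(client_pcb);
}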

I’ll work on a fix for that, hopefully ready by tomorrow.


Awesome! Great info/analysis, thank you!


Actually, I missed a detail: the ziti connection that I drilled into did in fact have an established ziti link:

[ 0.228] DEBUG tunnel-sdk:ziti_tunnel.c:221 ziti_tunneler_dial_completed() ziti dial succeeded: client[tcp:##.##.##.##:55026] service[######]

So the tunneler should have sent SYN/ACK to the client, and the client should have sent ACK in response. Somehow one or the other of those things didn’t happen, but it’s all the same - the intercepting tunneler needs to cut the TCP connection loose when it gets into this situation.


Ok :+1:.

Do these logs and this issue you’ve identified in ZET look likely to explain the large number (hundreds to thousands) of fabric circuit error messages (see above for one line examples) in the router logs? Or is that a separate issue?

I don’t think the fabric errors are related to the ZET issues. They may be another thing to explore once we sort out the ZET issue, which I think has two factors:

  1. Something is preventing the TCP 3-way handshake from completing between ZET and the clients connecting to the service on tcp port 8301. Clients connecting to other ports don’t have this issue. The best I can say right now is that one of the following is happening:

    • ZET isn’t sending the SYN/ACK
    • The client isn’t receiving the SYN/ACK for some reason (routing/firewall rules maybe?)
    • The client isn’t sending the final ACK that completes the 3-way handshake
    • ZET isn’t receiving the final ACK from the client

    A packet capture at ZET and also at one of the hosts with a client that connects to tcp :8301 would shed some light. This tcpdump command stops after writing 200 tcp packets that have a src or dst port of 8301 to a file. The packets are also displayed on the terminal as they are captured, so you can see it working (or not) in real time instead of exercising your patience:

     sudo tcpdump -U -c 200 -w - 'tcp port 8301' | tee "tcp8301-from-$(hostname).pcap" | tcpdump -nr -
    
  2. ZET is leaking TCP connections that are not yet established when the associated ziti connection closes. A commit that I pushed to the branch that you previously built should fix this. Can you pull and build this branch again, and drop the binary onto your gateway? While this should allow ZET to survive, I expect your clients will still have trouble connecting to tcp :8301 as described above.

I’ve asked a colleague who’s more familiar with the fabric about the errors in the router logs and at least at first glance he’s not alarmed by them. I’ll bring it up again and mention the frequency. Can you beam the router logs to me using your preferred transport method?


I just realized that you had previously grabbed the ziti-edge-tunnel binary from our GitHub action, instead of building it. Sorry for the mixup. You can get the build with the leak fixes here.


All good, thank you very much Shawn!

Quick update on testing and investigation on your suggestions:

  1. The updated build does prevent hitting the 1024 TCP limit: I could see many connection terminations being issued in the ZET log, and the TCP limit was never hit! Example:
     ERROR tunnel-sdk:tunnel_tcp.c:176 on_tcp_client_err() client=tcp:10.16.72.203:39932 err=-13, terminating connection
  2. Debugging further with tcpdump, I tracked part of the problem down: during the cluster scaling which precipitated hitting the ZET TCP limit, new subnets were added, but they were not added to the ZET service’s allowedAddresses, so ZET was properly denying clients on those new subnets access to the machines on the other side of the gateway. As soon as I updated the service config to also allow the new subnets, 99% of those terminating-connection messages stopped, with just a few remaining intermittently here and there.
  3. The ziti router logs are still logging cannot forward payload, no destination for circuit continually (a few per second), and ziti fabric list circuits still returns ~800 lines of output… But I can still see at least one issue on our end which may be related, so I’ll try to get that remaining issue resolved first and then follow up on whether the router cannot-forward-payload messages persist. It may be a few days before I can get back to this, as I have some other deadlines I’m juggling.

Great news, though: the functionality we had lost when ZET initially hit the TCP limit has been addressed with your patched version… Thanks for the help on this, much appreciated!

I’ll follow up further soon.
John
