TCP conn limit is 512 -- ZET not allowing sufficient conns

Hey John,

Thanks again for the logs and captures. I had some time to look at the latest yesterday. I see evidence of connections being stuck in FIN_WAIT_1 (on the gateway side). I was hoping that keepalives would help with this, but for some reason I see no sign of keepalive activity in your logs. For example, I see lots of this when I run the same build that you ran:

tcp_keepalive: sending KEEPALIVE probe to 100.64.10.1
tcp_keepalive: tcp_ticks 34282   pcb->tmr 34221 pcb->keep_cnt_sent 0
tcp_keepalive: seqno 4033391 ackno 2627159563 err 0.

But I don’t see it at all in your logs. Today I’ll dive into the lwip logic that leads to keepalives being sent.
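
For reference, lwip only sends keepalive probes for PCBs that have the SOF_KEEPALIVE option set, and the probes themselves fire from lwip's slow timer once the idle threshold is exceeded. Here's a minimal sketch of how that is typically wired up, assuming LWIP_TCP_KEEPALIVE is enabled in lwipopts.h (the exact hook point and values in ziti-edge-tunnel may differ):

/* Sketch only: enabling lwip TCP keepalives on a PCB. Assumes
 * LWIP_TCP_KEEPALIVE is defined in lwipopts.h so the per-PCB interval and
 * count fields exist; where ziti-edge-tunnel actually does this may differ. */
#include "lwip/tcp.h"

static void enable_keepalive(struct tcp_pcb *pcb)
{
    /* tcp_slowtmr() only sends probes while SOF_KEEPALIVE is set */
    ip_set_option(pcb, SOF_KEEPALIVE);

#if LWIP_TCP_KEEPALIVE
    pcb->keep_idle  = 30000;   /* ms idle before the first probe */
    pcb->keep_intvl = 10000;   /* ms between probes */
    pcb->keep_cnt   = 4;       /* unanswered probes before the PCB is aborted */
#endif
}

If that option never gets set on the gateway-side PCBs, or the slow timer isn't running, you'd see exactly what your logs show: no keepalive lines at all.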

I also saw something that could be packet loss in ziti-edge-tunnel’s packet reader. Here’s an example where the ziti connection closes first:

[      249.523]   DEBUG tunnel-cbs:ziti_tunnel_cbs.c:126 on_ziti_data() ziti connection sent EOF (ziti_eof=0, tnlr_eof=0)
[      249.523]   DEBUG tunnel-sdk:ziti_tunnel.c:466 ziti_tunneler_close_write() closing write connection: client[tcp:#.#.179.#:58804] service[ZITI_SERVICE]
[      249.523]   DEBUG tunnel-sdk:tunnel_tcp.c:243 tunneler_tcp_close_write() closing write tcp local=#.#.171.#:8301 remote=#.#.179.#:58804 state=4(ESTABLISHED) flags=0x100 service=ZITI_SERVICE
[      249.523]   DEBUG tunnel-sdk:tunnel_tcp.c:255 tunneler_tcp_close_write() closed write tcp local=#.#.171.#:8301 remote=#.#.179.#:58804 state=5(FIN_WAIT_1) flags=0x120 service=ZITI_SERVICE

This sequence of log messages basically means that ziti-edge-tunnel sent a FIN to the client and is waiting for an ACK. There is no sign of that ACK in the logs, but the capture shows it (frame 2377):

No.   Time.       Source     Destination  Proto Len Info
2285  236.448286  #.#.179.#  #.#.171.#    TCP   60  58804 → 8301 [SYN] Seq=845976624 Win=62727 Len=0 MSS=8460 SACK_PERM TSval=1766677971 TSecr=0 WS=128
2286  236.479739  #.#.171.#  #.#.179.#    TCP   48  8301 → 58804 [SYN, ACK] Seq=13233 Ack=845976625 Win=65535 Len=0 MSS=32768 WS=16384
2287  236.480851  #.#.179.#  #.#.171.#    TCP   40  58804 → 8301 [ACK] Seq=845976625 Ack=13234 Win=62848 Len=0
2288  236.481697  #.#.179.#  #.#.171.#    TCP   743 [TCP Previous segment not captured] 58804 → 8301 [PSH, ACK] Seq=845985586 Ack=13234 Win=62848 Len=703
2289  236.481712  #.#.171.#  #.#.179.#    TCP   40  [TCP Window Update] 8301 → 58804 [ACK] Seq=13234 Ack=845976625 Win=49152 Len=0
2376  246.479420  #.#.171.#  #.#.179.#    TCP   40  8301 → 58804 [FIN, ACK] Seq=13234 Ack=845976625 Win=49152 Len=0
2377  246.480506  #.#.179.#  #.#.171.#    TCP   40  58804 → 8301 [FIN, ACK] Seq=845986289 Ack=13235 Win=62848 Len=0
2378  246.480550  #.#.171.#  #.#.179.#    TCP   40  [TCP Dup ACK 2286#1] 8301 → 58804 [ACK] Seq=13235 Ack=845976625 Win=49152 Len=0

So tcpdump apparently sees the client's FIN/ACK that should advance the lwip connection out of FIN_WAIT_1.
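
For context, when a segment does reach lwip's tcp_input(), the FIN_WAIT_1 handling in tcp_process() decides the next state based on whether the segment ACKs our FIN and/or carries the peer's FIN. Here's a rough paraphrase of that decision; the helper below is hypothetical, and the real code in lwip's tcp_input.c also requires the send queue to be empty before leaving FIN_WAIT_1:

/* Hypothetical helper paraphrasing lwip's FIN_WAIT_1 handling in
 * tcp_process() (tcp_input.c); the real code also checks that pcb->unsent
 * is empty before leaving FIN_WAIT_1. */
#include "lwip/tcp.h"

static void fin_wait_1_step(struct tcp_pcb *pcb, int peer_fin, int acks_our_fin)
{
    if (peer_fin && acks_our_fin) {
        pcb->state = TIME_WAIT;   /* peer's FIN plus ACK of our FIN (frame 2377's case) */
    } else if (peer_fin) {
        pcb->state = CLOSING;     /* simultaneous close: peer FIN, our FIN not yet ACKed */
    } else if (acks_our_fin) {
        pcb->state = FIN_WAIT_2;  /* our FIN ACKed, still waiting for the peer's FIN */
    }
    /* otherwise the PCB stays in FIN_WAIT_1, which matches what we see if
     * the client's segment never makes it into tcp_input() */
}

So either the segment never reaches the lwip input path, or one of those conditions isn't satisfied when it does.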

Unfortunately I didn’t enable the logging that would show the packets ziti-edge-tunnel receives in this build, nor the TCP state machine logs. Both are incredibly voluminous, but they may end up being necessary to understand whether the packet simply isn’t being read or is being ignored by lwip for some reason. I’m still exploring options, but wanted to give you a quick update.

Thank you very much, Shawn! Wow, that’s some deep and challenging digging. I think the slow leak rate is about one TCP connection every two minutes, IIRC, so if we need more logs, maybe we can reduce the time window further. Odd about the keepalives; I don’t think there is any OS-level tuning that would affect or interfere with that, but I’ll think about it a bit more as well.

Hey John, yeah I agree that we can get what we need from a shorter timespan. I’ll get a more verbose build together this morning for you to run when you can. Stand by.

Ok, the latest build includes these changes:

  • ziti-edge-tunnel will log IP packets as they are received and TCP segments that are processed by LWIP:

    DEBUG tunnel-sdk:netif_shim.c:58 netif_shim_input() received packet TCP[100.64.10.1:49876 -> 100.64.10.3:443] len=40 count=1
    DEBUG tunnel-sdk:tunnel_tcp.c:381 recv_tcp() received segment 100.64.10.1:49876->100.64.10.3:443 flags[ACK]
    

    Between these two messages we should be able to determine if ziti-edge-tunnel’s packet reader is simply not seeing the ACK segments, or if some logic in LWIP is causing the ACKs to be ignored.

  • The content of the ziti-edge-tunnel dump command will now include TCP (and UDP) stats and connections that lwip is currently managing:

    =================
    TCP Stats:
    TCP handles used:3 max:17 limit:4096
    =================
    TCP Connections:
    TCP local=100.64.10.3/32:443 remote=100.64.10.1/32:63976 state=ESTABLISHED ziti_service=ZITI_SERVICE
    TCP local=100.64.10.3/32:443 remote=100.64.10.1/32:50875 state=ESTABLISHED ziti_service=ZITI_SERVICE
    TCP local=100.64.10.3/32:443 remote=100.64.10.1/32:51142 state=ESTABLISHED ziti_service=ZITI_SERVICE
    
    =================
    UDP Stats:
    UDP handles used:0 max:1 limit:512
    =================
    UDP Connections:
    
  • More of the LWIP logging options for TCP have been enabled (a rough sketch of the corresponding lwipopts.h settings follows this list):

    • TCP_INPUT_DEBUG will log something like this for every segment that LWIP handles:

      TCP header:
      +-------------------------------+
      |    49876      |      443      | (src port, dest port)
      +-------------------------------+
      |           0685113833          | (seq no)
      +-------------------------------+
      |           0000761018          | (ack no)
      +-------------------------------+
      |  5 |   |010000|      8934     | (hdrlen, flags (ACK), win)
      +-------------------------------+
      |    0x2456     |         0     | (chksum, urgp)
      +-------------------------------+
      +-+-+-+-+-+-+-+-+-+-+-+-+-+- tcp_input: flags ACK 
      -+-+-+-+-+-+-+-+-+-+-+-+-+-+
      State: ESTABLISHED
      tcp_receive: pcb->rttest 0 rtseq 759423 ackno 761018
      

      This is very chatty and theoretically conveys the same information as the TCP segment logging above, but reinforcement shouldn’t hurt, and maybe we will be surprised.

    • TCP_OUTPUT_DEBUG will log all segments that ziti-edge-tunnel sends to clients:

      tcp_write(pcb=0x102c0b0b0, data=0x15bc0d670, len=247, apiflags=1)
      tcp_write: queueing 1363205:1363452
      tcp_output_segment: rtseq 1363205
      tcp_output_segment: 1363205:1363452
      tcp_output: sending ACK for 685174762
      
    • TCP_RTO_DEBUG (and TCP_FR_DEBUG) enable logging for retransmission timeouts and fast retransmits:

      tcp_receive: experienced rtt 0 ticks (0 msec).
      tcp_receive: RTO 3 (1500 milliseconds)
      

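As an aside, all of these lwip debug switches are compile-time options. A sketch of the lwipopts.h fragment that enables them would look roughly like this (the exact set compiled into this ZET build may differ slightly):

    /* lwipopts.h fragment (sketch): compile-time switches for the TCP debug
     * output shown above. LWIP_DEBUG must be defined for any per-module
     * *_DEBUG option to produce output. */
    #define LWIP_DEBUG              1

    #define TCP_DEBUG               LWIP_DBG_ON   /* general TCP state logging    */
    #define TCP_INPUT_DEBUG         LWIP_DBG_ON   /* every segment tcp_input sees */
    #define TCP_OUTPUT_DEBUG        LWIP_DBG_ON   /* every segment lwip sends     */
    #define TCP_RTO_DEBUG           LWIP_DBG_ON   /* retransmission timeouts      */
    #define TCP_FR_DEBUG            LWIP_DBG_ON   /* fast retransmit / recovery   */
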
Could you please run this build when you have the time, and collect tcpdump captures and ziti-edge-tunnel dump output alongside it?

Thanks,
-Shawn

Yes, thank you, will get to it shortly!

Ok, so not so shortly after all :)

This issue was observed with a fairly large cluster of 83 machines on Network A and just 3 machines on the Network B side. Since last pulling logs from the ZET gateway on Network A, we've shrunk the Network A cluster to ~30 machines. Looking at the logs on the ZET gateway over the past several hours, I'm no longer seeing any TCP leakage; previously we were seeing about one TCP connection leaked every one or two minutes, continuously, until we hit the ZET TCP allocation limit. So it looks like the issue was related to either scale, or something about the machines we removed, or both; I'm not sure which.

In any case, the issue doesn't seem to be occurring now. I'll collect some more data over the next day or two, but if things remain stable there's probably no point in further log analysis. We're unlikely to scale the cluster back up to its previous size, so perhaps we won't hit this issue again.

Thanks very much for all the help and deep diving on this challenging-to-nail-down issue. Apologies for not getting back to pulling logs before the issue disappeared; it would have been nice to find the root cause.

Thanks for the update! We did get some useful fixes for TCP tail cases out of the early days of this thread, so it's all good! FYI, I had been keeping those fixes in a branch until we completely plugged the leaks you were seeing; I'll pull them together and get them into an upcoming release soon.

Awesome, thanks much!