Connection closed errors between Ziti-Edge-Tunnels

Hi!

We've been getting quite a couple of connections with one server that aren't working properly somehow.
I can't seem to reproduce this except for waiting a couple of minutes.
Do you have any ideas what this could be caused by or what I can do to troubleshoot?

Jan 30 10:28:49 el01 ziti-edge-tunnel[2306297]: (2306297)[    51663.277]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:45473 err=-14, terminating connection
Jan 30 10:30:23 el01 ziti-edge-tunnel[2306297]: (2306297)[    51757.113]    WARN ziti-sdk:connect.c:341 connect_timeout() conn[0.19656/Connecting] failed to establish connection to service[zabbix_agent.svc] in 10000ms on ch[1]
Jan 30 10:30:23 el01 ziti-edge-tunnel[2306297]: (2306297)[    51757.113]   ERROR tunnel-cbs:ziti_tunnel_cbs.c:103 on_ziti_connect() ziti dial failed: operation did not complete in time
Jan 30 10:30:43 el01 ziti-edge-tunnel[2306297]: (2306297)[    51777.122]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:37915 err=-14, terminating connection
Jan 30 10:30:43 el01 ziti-edge-tunnel[2306297]: (2306297)[    51777.122]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:45025 err=-14, terminating connection
Jan 30 10:30:48 el01 ziti-edge-tunnel[2306297]: (2306297)[    51782.126]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:45241 err=-14, terminating connection
Jan 30 10:30:56 el01 ziti-edge-tunnel[2306297]: (2306297)[    51790.129]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:52375 err=-14, terminating connection
Jan 30 10:30:56 el01 ziti-edge-tunnel[2306297]: (2306297)[    51790.129]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:38047 err=-14, terminating connection
Jan 30 10:31:09 el01 ziti-edge-tunnel[2306297]: (2306297)[    51803.169]   ERROR ziti-sdk:channel.c:585 latency_timeout() ch[1] no read/write traffic on channel since before latency probe was sent, closing channel
Jan 30 10:31:09 el01 ziti-edge-tunnel[2306297]: (2306297)[    51803.169]    WARN ziti-sdk:conn_bridge.c:305 on_ziti_data() br[0.19630] closing bridge due to error: -20(operation did not complete in time)
Jan 30 10:31:09 el01 ziti-edge-tunnel[2306297]: (2306297)[    51803.169]    WARN ziti-sdk:conn_bridge.c:305 on_ziti_data() br[0.19629] closing bridge due to error: -20(operation did not complete in time)
Jan 30 10:31:09 el01 ziti-edge-tunnel[2306297]: (2306297)[    51803.169]    WARN ziti-sdk:conn_bridge.c:305 on_ziti_data() br[0.19628] closing bridge due to error: -20(operation did not complete in time)
Jan 30 10:31:09 el01 ziti-edge-tunnel[2306297]: (2306297)[    51803.169]    WARN ziti-sdk:conn_bridge.c:305 on_ziti_data() br[0.19627] closing bridge due to error: -20(operation did not complete in time)
Jan 30 10:31:09 el01 ziti-edge-tunnel[2306297]: (2306297)[    51803.169]    WARN ziti-sdk:bind.c:346 on_message() binding failed: -20/operation did not complete in time
Jan 30 10:31:09 el01 ziti-edge-tunnel[2306297]: (2306297)[    51803.169]    WARN ziti-sdk:bind.c:346 on_message() binding failed: -20/operation did not complete in time
Jan 30 10:31:10 el01 ziti-edge-tunnel[2306297]: (2306297)[    51804.005]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:33513 err=-14, terminating connection
Jan 30 10:31:10 el01 ziti-edge-tunnel[2306297]: (2306297)[    51804.005]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:49521 err=-14, terminating connection
Jan 30 10:31:10 el01 ziti-edge-tunnel[2306297]: (2306297)[    51804.005]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:55725 err=-14, terminating connection
Jan 30 10:31:14 el01 ziti-edge-tunnel[2306297]: (2306297)[    51808.166]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:56275 err=-14, terminating connection
Jan 30 10:32:26 el01 ziti-edge-tunnel[2306297]: (2306297)[    51880.113]    WARN ziti-sdk:connect.c:341 connect_timeout() conn[0.19705/Connecting] failed to establish connection to service[zabbix_agent.svc] in 10000ms on ch[1]
Jan 30 10:32:26 el01 ziti-edge-tunnel[2306297]: (2306297)[    51880.113]   ERROR tunnel-cbs:ziti_tunnel_cbs.c:103 on_ziti_connect() ziti dial failed: operation did not complete in time
Jan 30 10:32:30 el01 ziti-edge-tunnel[2306297]: (2306297)[    51884.114]    WARN ziti-sdk:connect.c:341 connect_timeout() conn[0.19706/Connecting] failed to establish connection to service[zabbix_agent.svc] in 10000ms on ch[1]
Jan 30 10:32:30 el01 ziti-edge-tunnel[2306297]: (2306297)[    51884.114]   ERROR tunnel-cbs:ziti_tunnel_cbs.c:103 on_ziti_connect() ziti dial failed: operation did not complete in time
Jan 30 10:32:30 el01 ziti-edge-tunnel[2306297]: (2306297)[    51884.116]    WARN ziti-sdk:connect.c:341 connect_timeout() conn[0.19707/Connecting] failed to establish connection to service[zabbix_agent.svc] in 10000ms on ch[1]
Jan 30 10:32:30 el01 ziti-edge-tunnel[2306297]: (2306297)[    51884.116]   ERROR tunnel-cbs:ziti_tunnel_cbs.c:103 on_ziti_connect() ziti dial failed: operation did not complete in time
Jan 30 10:32:33 el01 ziti-edge-tunnel[2306297]: (2306297)[    51887.115]    WARN ziti-sdk:connect.c:341 connect_timeout() conn[0.19708/Connecting] failed to establish connection to service[zabbix_agent.svc] in 10000ms on ch[1]
Jan 30 10:32:33 el01 ziti-edge-tunnel[2306297]: (2306297)[    51887.115]   ERROR tunnel-cbs:ziti_tunnel_cbs.c:103 on_ziti_connect() ziti dial failed: operation did not complete in time
Jan 30 10:32:34 el01 ziti-edge-tunnel[2306297]: (2306297)[    51888.125]   ERROR ziti-sdk:channel.c:465 dispatch_message() ch[1] could not find waiter for reply_to = 29 ct[ED70]
Jan 30 10:32:34 el01 ziti-edge-tunnel[2306297]: (2306297)[    51888.125]    WARN ziti-sdk:channel.c:489 dispatch_message() ch[1] received message without conn_id or for unknown connection ct[ED70] conn_id[19705]
Jan 30 10:32:34 el01 ziti-edge-tunnel[2306297]: (2306297)[    51888.125]    WARN ziti-sdk:channel.c:489 dispatch_message() ch[1] received message without conn_id or for unknown connection ct[ED72] conn_id[19705]
Jan 30 10:32:34 el01 ziti-edge-tunnel[2306297]: (2306297)[    51888.125]    WARN ziti-sdk:channel.c:489 dispatch_message() ch[1] received message without conn_id or for unknown connection ct[ED72] conn_id[19705]
Jan 30 10:32:34 el01 ziti-edge-tunnel[2306297]: (2306297)[    51888.125]   ERROR ziti-sdk:channel.c:465 dispatch_message() ch[1] could not find waiter for reply_to = 30 ct[ED70]
Jan 30 10:32:34 el01 ziti-edge-tunnel[2306297]: (2306297)[    51888.125]    WARN ziti-sdk:channel.c:489 dispatch_message() ch[1] received message without conn_id or for unknown connection ct[ED70] conn_id[19706]
Jan 30 10:32:34 el01 ziti-edge-tunnel[2306297]: (2306297)[    51888.125]   ERROR ziti-sdk:channel.c:465 dispatch_message() ch[1] could not find waiter for reply_to = 31 ct[ED70]
Jan 30 10:32:34 el01 ziti-edge-tunnel[2306297]: (2306297)[    51888.125]    WARN ziti-sdk:channel.c:489 dispatch_message() ch[1] received message without conn_id or for unknown connection ct[ED70] conn_id[19707]
Jan 30 10:32:34 el01 ziti-edge-tunnel[2306297]: (2306297)[    51888.125]    WARN ziti-sdk:channel.c:489 dispatch_message() ch[1] received message without conn_id or for unknown connection ct[ED72] conn_id[19706]
Jan 30 10:32:34 el01 ziti-edge-tunnel[2306297]: (2306297)[    51888.125]    WARN ziti-sdk:channel.c:489 dispatch_message() ch[1] received message without conn_id or for unknown connection ct[ED72] conn_id[19707]
Jan 30 10:32:34 el01 ziti-edge-tunnel[2306297]: (2306297)[    51888.125]   ERROR ziti-sdk:channel.c:465 dispatch_message() ch[1] could not find waiter for reply_to = 32 ct[ED70]
Jan 30 10:32:34 el01 ziti-edge-tunnel[2306297]: (2306297)[    51888.125]    WARN ziti-sdk:channel.c:489 dispatch_message() ch[1] received message without conn_id or for unknown connection ct[ED70] conn_id[19708]
Jan 30 10:32:34 el01 ziti-edge-tunnel[2306297]: (2306297)[    51888.125]    WARN ziti-sdk:channel.c:489 dispatch_message() ch[1] received message without conn_id or for unknown connection ct[ED72] conn_id[19708]
Jan 30 10:32:34 el01 ziti-edge-tunnel[2306297]: (2306297)[    51888.125]    WARN ziti-sdk:channel.c:489 dispatch_message() ch[1] received message without conn_id or for unknown connection ct[ED72] conn_id[19706]
Jan 30 10:32:34 el01 ziti-edge-tunnel[2306297]: (2306297)[    51888.125]    WARN ziti-sdk:channel.c:489 dispatch_message() ch[1] received message without conn_id or for unknown connection ct[ED72] conn_id[19707]
Jan 30 10:32:34 el01 ziti-edge-tunnel[2306297]: (2306297)[    51888.125]    WARN ziti-sdk:channel.c:489 dispatch_message() ch[1] received message without conn_id or for unknown connection ct[ED72] conn_id[19708]
Jan 30 10:32:34 el01 ziti-edge-tunnel[2306297]: (2306297)[    51888.127]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:42693 err=-14, terminating connection
Jan 30 10:32:34 el01 ziti-edge-tunnel[2306297]: (2306297)[    51888.127]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:50333 err=-14, terminating connection
Jan 30 10:32:34 el01 ziti-edge-tunnel[2306297]: (2306297)[    51888.127]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:48391 err=-14, terminating connection
Jan 30 10:33:00 el01 ziti-edge-tunnel[2306297]: (2306297)[    51914.114]    WARN ziti-sdk:connect.c:341 connect_timeout() conn[0.19717/Connecting] failed to establish connection to service[zabbix_agent.svc] in 10000ms on ch[1]
Jan 30 10:33:00 el01 ziti-edge-tunnel[2306297]: (2306297)[    51914.114]   ERROR tunnel-cbs:ziti_tunnel_cbs.c:103 on_ziti_connect() ziti dial failed: operation did not complete in time
Jan 30 10:33:03 el01 ziti-edge-tunnel[2306297]: (2306297)[    51917.114]    WARN ziti-sdk:connect.c:341 connect_timeout() conn[0.19718/Connecting] failed to establish connection to service[zabbix_agent.svc] in 10000ms on ch[1]
Jan 30 10:33:03 el01 ziti-edge-tunnel[2306297]: (2306297)[    51917.114]   ERROR tunnel-cbs:ziti_tunnel_cbs.c:103 on_ziti_connect() ziti dial failed: operation did not complete in time
Jan 30 10:33:04 el01 ziti-edge-tunnel[2306297]: (2306297)[    51918.124]   ERROR ziti-sdk:channel.c:465 dispatch_message() ch[1] could not find waiter for reply_to = 41 ct[ED70]
Jan 30 10:33:04 el01 ziti-edge-tunnel[2306297]: (2306297)[    51918.124]    WARN ziti-sdk:channel.c:489 dispatch_message() ch[1] received message without conn_id or for unknown connection ct[ED70] conn_id[19717]
Jan 30 10:33:04 el01 ziti-edge-tunnel[2306297]: (2306297)[    51918.124]    WARN ziti-sdk:channel.c:489 dispatch_message() ch[1] received message without conn_id or for unknown connection ct[ED72] conn_id[19717]
Jan 30 10:33:04 el01 ziti-edge-tunnel[2306297]: (2306297)[    51918.124]   ERROR ziti-sdk:channel.c:465 dispatch_message() ch[1] could not find waiter for reply_to = 42 ct[ED70]
Jan 30 10:33:04 el01 ziti-edge-tunnel[2306297]: (2306297)[    51918.124]    WARN ziti-sdk:channel.c:489 dispatch_message() ch[1] received message without conn_id or for unknown connection ct[ED70] conn_id[19718]
Jan 30 10:33:04 el01 ziti-edge-tunnel[2306297]: (2306297)[    51918.124]    WARN ziti-sdk:channel.c:489 dispatch_message() ch[1] received message without conn_id or for unknown connection ct[ED72] conn_id[19718]
Jan 30 10:33:04 el01 ziti-edge-tunnel[2306297]: (2306297)[    51918.124]    WARN ziti-sdk:channel.c:489 dispatch_message() ch[1] received message without conn_id or for unknown connection ct[ED72] conn_id[19717]
Jan 30 10:33:04 el01 ziti-edge-tunnel[2306297]: (2306297)[    51918.124]    WARN ziti-sdk:channel.c:489 dispatch_message() ch[1] received message without conn_id or for unknown connection ct[ED72] conn_id[19718]
Jan 30 10:33:04 el01 ziti-edge-tunnel[2306297]: (2306297)[    51918.127]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:38853 err=-14, terminating connection
Jan 30 10:33:04 el01 ziti-edge-tunnel[2306297]: (2306297)[    51918.127]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:54263 err=-14, terminating connection
Jan 30 10:33:04 el01 ziti-edge-tunnel[2306297]: (2306297)[    51918.127]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:59385 err=-14, terminating connection
Jan 30 10:33:23 el01 ziti-edge-tunnel[2306297]: (2306297)[    51937.127]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:42527 err=-14, terminating connection
Jan 30 10:33:51 el01 ziti-edge-tunnel[2306297]: (2306297)[    51965.123]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:50377 err=-14, terminating connection
Jan 30 10:33:51 el01 ziti-edge-tunnel[2306297]: (2306297)[    51965.123]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:38135 err=-14, terminating connection
Jan 30 10:33:56 el01 ziti-edge-tunnel[2306297]: (2306297)[    51970.125]   ERROR tunnel-sdk:tunnel_tcp.c:188 on_tcp_client_err() client=tcp:100.64.0.1:40877 err=-14, terminating connection
Jan 30 10:34:41 el01 ziti-edge-tunnel[2306297]: (2306297)[    52015.167]   ERROR ziti-sdk:channel.c:585 latency_timeout() ch[1] no read/write traffic on channel since before latency probe was sent, closing channel
Jan 30 10:34:41 el01 ziti-edge-tunnel[2306297]: (2306297)[    52015.167]   ERROR ziti-sdk:connect.c:901 connect_reply_cb() conn[0.19728/Accepting] failed to connect [-20/not a directory]
Jan 30 10:34:41 el01 ziti-edge-tunnel[2306297]: (2306297)[    52015.167]   ERROR tunnel-cbs:ziti_hosting.c:196 on_hosted_client_connect_complete() hosted_service[MyCompany_Allow_to_EL01_9200] client[server.mycompany.com] client_src_addr[tcp:100.64.0.1:56808] failed to connect: connection is closed
Jan 30 10:34:41 el01 ziti-edge-tunnel[2306297]: (2306297)[    52015.167]   ERROR ziti-sdk:connect.c:901 connect_reply_cb() conn[0.19727/Accepting] failed to connect [-20/not a directory]
Jan 30 10:34:41 el01 ziti-edge-tunnel[2306297]: (2306297)[    52015.167]   ERROR tunnel-cbs:ziti_hosting.c:196 on_hosted_client_connect_complete() hosted_service[MyCompany_Allow_to_EL01_9200] client[server.mycompany.com] client_src_addr[tcp:100.64.0.1:56814] failed to connect: connection is closed
Jan 30 10:34:41 el01 ziti-edge-tunnel[2306297]: (2306297)[    52015.167]   ERROR ziti-sdk:connect.c:901 connect_reply_cb() conn[0.19726/Accepting] failed to connect [-20/not a directory]
Jan 30 10:34:41 el01 ziti-edge-tunnel[2306297]: (2306297)[    52015.167]   ERROR tunnel-cbs:ziti_hosting.c:196 on_hosted_client_connect_complete() hosted_service[MyCompany_Allow_to_EL01_9200] client[server.mycompany.com] client_src_addr[tcp:100.64.0.1:56822] failed to connect: connection is closed
Jan 30 10:34:41 el01 ziti-edge-tunnel[2306297]: (2306297)[    52015.167]    WARN ziti-sdk:conn_bridge.c:305 on_ziti_data() br[0.19729] closing bridge due to error: -20(operation did not complete in time)

Hello, this is certainly odd. Is this situation intermittent or is it permanent once it starts?

My first hunch was that this host might be having some kind of intermittent connectivity problem. We've also recently encountered an issue that could lead to some of the things you're seeing when an edge router is removed from the network. But before jumping to any conclusions I'll try to unpack the log messages:

  • on_tcp_client_err() client=tcp:100.64.0.1:45473 err=-14, terminating connection

    This means an intercepted tcp client sent RST. It might be because the client couldn't recover its end of the TCP connection. It could also be that the client process terminated without closing cleanly and the kernel sent the RST to let us know it was time to clean up on our end. A packet capture would tell us more. Do you see these errors when things are working normally or is in only when the other errors are happening?

  • connect_timeout() conn[0.19656/Connecting] failed to establish connection to service, and on_ziti_connect() ziti dial failed: operation did not complete in time

    For some reason the connection on the ziti network couldn't be completed in 10 seconds. This usually happens when the address specified in the host.v1 service configuration wasn't responding at the hosting tunneler, but it can also mean the ziti routers that connect this tunneler to the hosting tunneler were unable to transfer the dial request/response. Logs from the involved router(s) and the tunneler that was hosting the "zabbix_agent.svc" service would give us a better view of what's happening.

  • received message without conn_id or for unknown connection

    ziti-edge-tunnel received a ziti message for a connection that has already been closed. This is most likely fallout from the previous error.

  • on_hosted_client_connect_complete() hosted_service[MyCompany_Allow_to_EL01_9200] client[server.mycompany.com] client_src_addr[tcp:100.64.0.1:56808] failed to connect: connection is closed

    In this case the tunneler is hosting the service, and its connection to the ziti router is failing. The "-20/not a directory" error is misleading. The error number is correct but the string is not (this logging issue has been fixed but not yet released). In this context -20 actually indicates a timeout on the ziti connection, which is consistent with the timeouts that we see when dialing for the intercepted connections.

So to summarize some things that will help get to the bottom of this:

  1. please increase the log verbosity on this ziti-edge-tunnel by passing -v6 on the command line.
  2. logs from router(s) between this ziti-edge-tunnel and the one that's hosting the ""zabbix_agent.svc" service. Also logs from that hosting tunneler.
  3. If the "err=-14, closing connection" errors only happen when the other errors are being logged then a packet capture on the ziti0 tun device will show us more.

Thanks,
-Shawn

1 Like