The tunnel device failed to forward the high-bitrate video stream

Dear team,

Our company is testing a setup to connect high-definition cameras (4K resolution, 16,382 kbps bitrate) to the Ziti network via an OpenWRT device (aarch64 architecture, 8-core CPU, 16GB RAM, 512GB SSD). The goal is to achieve stable high-bitrate video streaming.
**Current Issue:**

1. **UDP Mode (Severe Packet Loss):**
   • Using ziti-edge-tunnel-v1.4.5 as the gateway with UDP transmission.
   • **Observed Error Log:**
     WARN tunnel-sdk:tunnel_udp.c:66 to_ziti() ziti_write stalled: dropping UDP packet service=video-gbs-svc, client=udp:172.16.7.151:15060, ret=-7
   • Result: Frequent packet drops, making UDP unusable.
2. **TCP Mode (Freezing Issue):**
   • Switched to TCP (active/passive mode), which reduced packet loss.
   • **New Problem:** The stream remains stable for ~50 seconds, then freezes for ~40 seconds before recovering.

Hello and thanks for using OpenZiti for this interesting use case!

You've previously sent me the log from a tunneler that was intercepting UDP connections, so I'll start there. I see two issues: packet loss, and a crash on the intercepting tunneler.

The packet loss is indicated by this log message:

to_ziti() ziti_write stalled: dropping UDP packet

What's happening here is that the tunneler has received more data from the UDP client than it is able to send to the ziti network. In your case (based on the logs you sent) the tunneler isn't sending packets for this connection at all yet, because the other end of the ziti connection hasn't been established. The ziti SDK has a feature that lets it buffer data while waiting for the connection to complete. Without this feature, we would have no choice but to drop every UDP packet that arrives before the end-to-end connection is established. Let's look at the stages of your connection that lead to the crash.
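The queue-then-drop behavior can be sketched like this. This is a simplified model for illustration only, not the actual tunneler/SDK code, and the budget value is made up:

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of the intercepting tunneler's write path -- NOT the
 * actual SDK code. While the ziti connection is still Connecting, writes
 * are queued up to a fixed budget; once the budget is exhausted, new UDP
 * packets are dropped (the "ziti_write stalled" warning in the log). */

#define PENDING_BUDGET 131072  /* illustrative cap, not the SDK's real value */

struct model_conn {
    bool   connected;      /* false while the ziti dial is in flight      */
    size_t pending_bytes;  /* bytes queued waiting for the dial to finish */
};

/* Returns 0 if the datagram was sent or queued, -1 if it had to be dropped. */
int model_write(struct model_conn *c, size_t len) {
    if (c->connected)
        return 0;                     /* flows straight to the overlay     */
    if (c->pending_bytes + len > PENDING_BUDGET)
        return -1;                    /* budget exhausted: drop the packet */
    c->pending_bytes += len;          /* queued until Connecting => Connected */
    return 0;
}
```

Once the dial completes, the queued bytes are flushed and later writes flow straight through, which matches the sequence in the logs below.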

The first packet is intercepted. Notice that this causes the ziti service to be "dialed", and the connection state for conn "0.11/7PZyd4nD" is Connecting:

(1058)[       16.498]   DEBUG tunnel-sdk:tunnel_udp.c:231 recv_udp() intercepted address[udp:172.16.117.3:30134] client[udp:172.16.7.151:15060] service[video-gbs-svc]
(1058)[       16.498] VERBOSE tunnel-cbs:ziti_tunnel_cbs.c:287 ziti_sdk_c_dial() ziti_dial(name=video-gbs-svc)
(1058)[       16.498]   DEBUG tunnel-cbs:ziti_tunnel_cbs.c:354 ziti_sdk_c_dial() service[video-gbs-svc] app_data_json[145]='{"connType":null,"dst_protocol":"udp","dst_ip":"172.16.117.3","dst_port":"30134","src_protocol":"udp","src_ip":"172.16.7.151","src_port":"15060"}'
(1058)[       16.498] VERBOSE ziti-sdk:connect.c:127 conn_set_state() conn[0.11/7PZyd4nD/Initial](video-gbs-svc) transitioning Initial => Connecting
(1058)[       16.498]   DEBUG ziti-sdk:connect.c:430 connect_get_service_cb() conn[0.11/7PZyd4nD/Connecting](video-gbs-svc) got service[video-gbs-svc] id[7BXVZa3bGCOZhmpOdjEwrm]
(1058)[       16.498]   DEBUG ziti-sdk:connect.c:551 process_connect() conn[0.11/7PZyd4nD/Connecting](video-gbs-svc) starting Dial connection for service[video-gbs-svc] with session[cm7ttdcab6wlgjin1uicdvjp1]

The initial packet and any subsequent packets from this client are queued. Notice that the ziti connection state is still Connecting:

(1058)[       16.498] VERBOSE tunnel-sdk:tunnel_udp.c:84 on_udp_client_data() 1272 bytes from 172.16.7.151:15060
(1058)[       16.498]   TRACE tunnel-sdk:tunnel_udp.c:54 to_ziti() writing 1272 bytes to ziti src[udp:172.16.7.151:15060] dst[udp:172.16.117.3:30134] service[video-gbs-svc]
(1058)[       16.498]   TRACE ziti-sdk:connect.c:1282 ziti_write() conn[0.11/7PZyd4nD/Connecting](video-gbs-svc) write 1272 bytes
(1058)[       16.498]   TRACE ziti-sdk:connect.c:811 flush_connection() conn[0.11/7PZyd4nD/Connecting](video-gbs-svc) starting flusher
(1058)[       16.498]   TRACE tunnel-sdk:tunnel_udp.c:54 to_ziti() writing 140 bytes to ziti src[udp:172.16.7.151:15060] dst[udp:172.16.117.3:30134] service[video-gbs-svc]
(1058)[       16.498]   TRACE ziti-sdk:connect.c:1282 ziti_write() conn[0.11/7PZyd4nD/Connecting](video-gbs-svc) write 140 bytes

The tunneler continues intercepting packets for the "Connecting" connection (for about 0.25 seconds), and then it runs out of buffer space for the pending packets:

(1058)[       16.522]   TRACE tunnel-sdk:tunnel_udp.c:156 recv_udp() received datagram src[172.16.7.151:15060] dst[172.16.117.3:30134]
(1058)[       16.522] VERBOSE tunnel-sdk:tunnel_udp.c:84 on_udp_client_data() 1272 bytes from 172.16.7.151:15060
(1058)[       16.522]   TRACE tunnel-sdk:tunnel_udp.c:54 to_ziti() writing 1272 bytes to ziti src[udp:172.16.7.151:15060] dst[udp:172.16.117.3:30134] service[video-gbs-svc]
(1058)[       16.522] VERBOSE tunnel-cbs:ziti_tunnel_cbs.c:395 ziti_sdk_c_write() applying backpressure 129904 pending bytes
(1058)[       16.522]    WARN tunnel-sdk:tunnel_udp.c:66 to_ziti() ziti_write stalled: dropping UDP packet service=video-gbs-svc, client=udp:172.16.7.151:15060, ret=-7

Finally the ziti connection (with the hosting tunneler) is established:

(1058)[       16.601] VERBOSE ziti-sdk:connect.c:127 conn_set_state() conn[0.11/7PZyd4nD/Connecting](video-gbs-svc) transitioning Connecting => Connected
(1058)[       16.601] VERBOSE tunnel-cbs:ziti_tunnel_cbs.c:93 on_ziti_connect() on_ziti_connect status: 0
(1058)[       16.601]   DEBUG tunnel-sdk:ziti_tunnel.c:221 ziti_tunneler_dial_completed() ziti dial succeeded: client[udp:172.16.7.151:15060] service[video-gbs-svc]

At this point the tunneler starts sending the queued data to the hosting tunneler:

(1058)[       16.601]   TRACE ziti-sdk:connect.c:312 send_message() conn[0.11/7PZyd4nD/Connected](video-gbs-svc) => ct[ED72] uuid[53cd8b78:00000000:20e27] edge_seq[0] len[24] hash[53cd8b78:4646f9a9:b8afd0af:597ac30d:1641f46b:bd40a1a6:6b94fd82:503e939d]
(1058)[       16.601]   TRACE ziti-sdk:channel.c:420 ziti_channel_send_message() ch[0] => ct[ED72] seq[482] len[24]
(1058)[       16.601]   TRACE ziti-sdk:channel.c:391 on_channel_send() ch[0] write delay = 0.000d q=1 qs=104
(1058)[       16.601]   TRACE ziti-sdk:connect.c:240 on_write_completed() conn[0.11/7PZyd4nD/Connected](video-gbs-svc) status 0
(1058)[       16.601]   TRACE ziti-sdk:connect.c:312 send_message() conn[0.11/7PZyd4nD/Connected](video-gbs-svc) => ct[ED72] uuid[12305f2c:00000001:20e27] edge_seq[1] len[1289] hash[12305f2c:8a4baa58:1e8cc29c:92304a78:100a4b0e:917b026e:af3e22b9:c597549e]
(1058)[       16.601]   TRACE ziti-sdk:channel.c:420 ziti_channel_send_message() ch[0] => ct[ED72] seq[483] len[1289]
(1058)[       16.601]   TRACE ziti-sdk:channel.c:391 on_channel_send() ch[0] write delay = 0.000d q=1 qs=1357
(1058)[       16.601]   TRACE ziti-sdk:connect.c:240 on_write_completed() conn[0.11/7PZyd4nD/Connected](video-gbs-svc) status 0
(1058)[       16.601]   TRACE ziti-sdk:connect.c:312 send_message() conn[0.11/7PZyd4nD/Connected](video-gbs-svc) => ct[ED72] uuid[e6166c10:00000002:20e27] edge_seq[2] len[157] hash[e6166c10:a34ce0b2:377b229c:da788642:b75bfb40:cccc183a:e317e3d6:777c2e31]
(1058)[       16.601]   TRACE ziti-sdk:channel.c:420 ziti_channel_send_message() ch[0] => ct[ED72] seq[484] len[157]
(1058)[       16.601]   TRACE ziti-sdk:channel.c:391 on_channel_send() ch[0] write delay = 0.000d q=1 qs=225
(1058)[       16.601]   TRACE ziti-sdk:connect.c:240 on_write_completed() conn[0.11/7PZyd4nD/Connected](video-gbs-svc) status 0
Assertion "pbuf_free: p->ref > 0" failed at line 755 in /github/workspace/build/_deps/lwip-src/src/core/pbuf.c

As the data is sent over the OpenZiti overlay, the underlying storage (packet buffer, or "pbufs") for each of the pending packets is released. The assertion that is failing here suggests that one of the packet buffers is somehow being freed twice. This is the first time I've seen that happen, but clearly something isn't working as it should. Let me reacquaint myself with the relevant portions of code and get back to you on this.

In the meantime, it would be informative to see if you encounter this assertion when you give the ziti connection enough time to complete before hitting it with this high rate of data (if that's possible).

Edit: I'm not able to reproduce the problem you're seeing here (with udp), and the code seems correct to me so I must be missing something. If you're building ziti-edge-tunnel from source (I suspect you were at one point at least), could you add some options to the build to enable additional debug messages? The following preprocessor symbols will enable the debugging that I'm hoping to see:

#define LWIP_DEBUG 1
#define PBUF_DEBUG LWIP_DBG_ON

I achieved this by adding the following to my cmake configure preset in CMakeUserPresets.json:

      "cacheVariables": {
        "CMAKE_C_FLAGS": "-DLWIP_DEBUG=1 -DPBUF_DEBUG=LWIP_DBG_ON"
      }

You should see lines like this when running ziti-edge-tunnel after rebuilding:

pbuf_alloc(length=40)
pbuf_alloc(length=40) == 0x10668a258
pbuf_free(0x105575d84)
pbuf_free: deallocating 0x105575d84

Regarding the issue that you're seeing when using TCP, I'll need to see logs from the tunnelers (both intercepting and hosting) that were handling the TCP connection to help with that.

Thanks.


I have sent the logs to your email.

Can you please confirm if the following issues have been resolved?

1. **Buffer Scaling for Connection Setup Phase**
   • Current Status: Data packets are buffered before tunnel establishment (~0.25 s window).
   • Proposal: Increase PBUF_POOL_SIZE and MEM_SIZE to prevent overflow during this phase.
2. **Heartbeat Packets Pre-Transmission**
   • Objective: Confirm link stability via periodic heartbeat packets before initiating high-bitrate video streams.
   • Questions:
     • Is a custom heartbeat mechanism (e.g., 100 ms interval) feasible, or should we rely on TCP keepalive?
     • How should we balance heartbeat frequency against false-negative detection in lossy networks?
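To illustrate item 1: PBUF_POOL_SIZE and MEM_SIZE are standard lwIP build-time options, usually set in lwipopts.h or via -D compiler flags. The values below are illustrative guesses only; we have not verified that larger pools actually prevent the drops:

```c
/* lwipopts.h fragment -- illustrative values only, not recommendations.
 * PBUF_POOL_SIZE: number of buffers in the pool used for incoming packets.
 * MEM_SIZE: bytes in the lwIP heap backing outgoing data and pbuf payloads. */
#define PBUF_POOL_SIZE 1024          /* hypothetical bump over the default */
#define MEM_SIZE       (512 * 1024)  /* hypothetical 512 KiB heap */
```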

Thanks for sending the logs. Packet buffer 0x57d55c67a000 is getting freed twice, which triggers the assertion:

pbuf_free(0x57d55c67e060)
pbuf_free: deallocating 0x57d55c67e060
pbuf_free: deallocating 0x57d55c67a000
pbuf_free(0x57d55c67a000)
Assertion "pbuf_free: p->ref > 0" failed at line 755 in /home/zyb/ziti-tunnel-sdk-c-1.4.5/build/_deps/lwip-src/src/core/pbuf.c

The log messages that look like "pbuf_free(ADDRESS)" show when ziti-edge-tunnel releases a packet buffer by calling pbuf_free (which it does after the packet payload has been written to the overlay network). The log messages that have "deallocating" in them show pbuf_free iterating through links in the packet buffer as each link is released. You'll notice that all but one of the pbuf_free() log messages have exactly one corresponding "deallocating" message.

But somehow 0x57d55c67a000 is getting chained with 0x57d55c67e060, so freeing 0x57d55c67e060 results in both 0x57d55c67e060 and 0x57d55c67a000 being released. Then the assertion is triggered when ziti-edge-tunnel tries to release its reference to 0x57d55c67a000.
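To make the failure mode concrete, here is a tiny ref-counting model of chained buffers. It is a sketch for illustration only, not lwIP's actual pbuf implementation: freeing the head of a chain cascades down the links, so a later, independent free of a chained buffer finds its reference count already at zero, which is exactly what the "p->ref > 0" assertion guards against.

```c
#include <stddef.h>

/* Tiny model of ref-counted, chainable packet buffers -- a sketch for
 * illustration only, not lwIP's actual pbuf implementation. */
struct mbuf {
    struct mbuf *next;  /* link to the next buffer in the chain */
    int          ref;   /* reference count held by the owner(s) */
};

/* Models pbuf_free(): returns -1 if the buffer was already released
 * (where the real code fails the "p->ref > 0" assertion); otherwise
 * drops one reference per link down the chain and returns how many
 * links were deallocated. */
int mbuf_free(struct mbuf *p) {
    if (p != NULL && p->ref == 0)
        return -1;                  /* double free: assertion territory */
    int deallocated = 0;
    while (p != NULL) {
        if (--p->ref > 0)
            break;                  /* still referenced elsewhere: stop */
        deallocated++;
        p = p->next;                /* cascade into the next link */
    }
    return deallocated;
}
```

In this model, freeing the head of a two-link chain deallocates both links in one call, so the owner's later free of the second buffer hits a zero refcount.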

I can't explain how two separately allocated packet buffers are getting chained together. I've never seen this before and I'm not able to reproduce it myself. I can think of a few things for next steps:

  1. If your video app is something that I could run myself, maybe I could trigger the issue locally and get to the bottom of the cause.
  2. Are you changing any of the build-time constants related to packet and/or memory sizes? If so, this might explain why I'm unable to reproduce the problem.
  3. It's possible that a memory bug is responsible for the packet buffers becoming chained together. If you could run ziti-edge-tunnel under valgrind it would help rule this out (or in).

Thanks.

I think I can now explain the crash/assertion that you encountered. In some (usually rare) situations, ziti-edge-tunnel's TCP/IP stack can call our data callback with a list of buffers rather than a single one. When this happens, ziti-edge-tunnel needs to take care when freeing each buffer, because freeing the head of a list also frees the entire list.

The fix is implemented in ziti-edge-tunnel-v1.5.4 if you'd like to try it out. Note that this won't help with the warmup/throughput issues you're asking about, but the tunneler should at least stay running under heavy UDP load.
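In toy form, the idea is that buffers handed over in a chain but owned individually should be detached before any of them is freed. This is a sketch of the concept only, not the actual v1.5.4 code:

```c
#include <stddef.h>

/* Toy ref-counted buffer chain standing in for lwIP pbufs -- a sketch of
 * the concept behind the fix, not the actual v1.5.4 code. */
struct mbuf {
    struct mbuf *next;
    int          ref;
};

/* Chain-aware free: a link whose refcount hits zero cascades the free
 * into the next link, just as pbuf_free() does. Returns the number of
 * links deallocated. */
int mbuf_free(struct mbuf *p) {
    int deallocated = 0;
    while (p != NULL && --p->ref == 0) {
        deallocated++;
        p = p->next;
    }
    return deallocated;
}

/* The fix, in spirit: when each buffer in a delivered chain is tracked
 * (and later freed) on its own, detach the links up front so freeing
 * one buffer can no longer cascade into its neighbors. */
void detach_chain(struct mbuf *head) {
    while (head != NULL) {
        struct mbuf *next = head->next;
        head->next = NULL;          /* each buffer now stands alone */
        head = next;
    }
}
```

After detaching, each free releases exactly one buffer and the double-free disappears.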
