Plans to use NAT traversal functionality similar to Tailscale's

Hi there!

I have been using OpenZiti in my home lab for about two months now, and since I started implementing it, I have been wondering why all traffic between my tunnelers always has to run through a router and why the router does not just serve as a kind of relay for establishing the connection.

Wouldn't it make more sense (in terms of performance and network overhead) to use the routers only to establish a direct (peer-to-peer) connection between the tunnelers, which then communicate directly with each other, similar to what Tailscale does with its NAT traversal and UDP hole punching? And only route the connection through the router if no direct connection is possible?
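For readers unfamiliar with the technique, here is a minimal, simplified Python sketch of the UDP hole-punching idea: both peers deliberately send first, so each NAT records an outbound mapping and then accepts the other side's inbound packet. This is my own illustration, not anything from Tailscale or OpenZiti; both "peers" run on localhost purely to show the exchange, and a real traversal would also need a rendezvous server to swap public endpoints.

```python
import socket
import threading

def peer(sock, peer_addr, inbox):
    # 1) Punch: send first, so a NAT in the path would create an
    #    outbound mapping for this (local, remote) address pair.
    sock.sendto(b"punch", peer_addr)
    # 2) Receive the other side's punch through the (now open) mapping.
    data, _ = sock.recvfrom(1024)
    inbox.append(data)

# Bind both sockets up front so no datagram is lost in this local demo.
a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
a.bind(("127.0.0.1", 0))
b.bind(("127.0.0.1", 0))
a.settimeout(2)
b.settimeout(2)

ia, ib = [], []
ta = threading.Thread(target=peer, args=(a, b.getsockname(), ia))
tb = threading.Thread(target=peer, args=(b, a.getsockname(), ib))
ta.start(); tb.start()
ta.join(); tb.join()
print(ia[0], ib[0])  # both sides received the other's punch
a.close()
b.close()
```

The key property is the symmetry: neither side waits for the other to connect first, which is exactly what makes the technique work through two NATs at once (when the NAT types allow it).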

I think this would lead to massive performance improvements and relief for the routers because then most of the data transfer would be direct between the tunnelers.

For example: if I have a network with a client network and several networks for different systems, such as servers or a separate printer network, and these networks cannot communicate with each other because I want to enforce zero trust between them, then I have to place a router (or multiple ones) in the client network, and all network traffic to the servers or printers runs entirely through the Ziti router(s). If a client now copies large amounts of data to the server via SMB, the router is heavily utilized, and all network traffic must first go to the router, which then forwards it to the server. In this case, it would make more sense to use the router to establish a direct connection and then transfer the data between client and server without the traffic passing through the router. With Tailscale, I wouldn't have this overhead, because a direct connection is established.

Or am I fundamentally misunderstanding something? Is there a reason why the traffic flow was not implemented like it is in Tailscale?

Thanks in advance and have a great day!

Hi @michi, nice diagrams! :slight_smile:

I'll just start off by saying that this sort of functionality has been discussed for many, many years now. We've dabbled in solving it here and there, but in practice the demand just hasn't been substantial enough yet for us to implement it. There are some real benefits that TCP has over UDP, but it's on our roadmap. In general, we want to be able to do both eventually.

This will always be use-case driven, but if you're going from one remote location to another, you'd be surprised at how little difference this ends up making in practice. That aside, though, Tailscale still offers a TURN server, so it's quite possible your WireGuard traffic will route through a Tailscale TURN server if UDP hole punching doesn't work (it's not always allowed). Blocking UDP traffic like this is a common source of frustration for some WireGuard users (from my own reading).

Again, it'll be use-case dominated, but in my own testing this has not been true (anecdotal, maybe). I'm sure there are cases where it WILL be true, but in practice I just haven't had a problem so far. This goes back to the "demand hasn't been there yet" comment I made before.

I hate saying it, but this is another one of those "you'd be shocked at just how non-intensive this actually is" situations, imo. If you're doing hundreds or thousands of these, then sure, you might end up saturating the CPU, the network itself, or some other resource. At small-to-medium scale, I don't think it'd be all that noticeable.

So those are a few of the reasons why it's not implemented yet. TCP (TLS) also allows us to implement mTLS, and we can dial outbound ONLY and never open inbound holes, so it works when UDP hole punching won't (hence the TURN server idea). There are probably other benefits I'm not thinking of...

Hope that helps?

Hi @TheLumberjack !

Thanks for your fast and detailed reply.

I’m delighted that this feature is planned and already looking forward to it. :slight_smile:

I didn't intend to completely replace the current behavior. I was more interested in increasing speed in cases where hole punching is possible; in all other cases, the connection should fall back to the current behavior. In those cases, you're just happy that a connection is possible at all.

I think I need to do some comparisons/speed tests between a peer-to-peer connection and OpenZiti so I can say how big the differences are in my use case.
I just assumed that a peer-to-peer connection should be faster by design, but maybe it isn't such a big deal.

Thanks again and have a nice evening.

Networking can be strange. Testing things effectively is far more difficult than you'd ever imagine, due to all the crazy things that can happen. A colleague of mine was doing scale testing many years ago and found that he'd routinely see better performance when he had two hops in his network. He was testing something like India to Australia, and we were all surprised that hopping through a second intermediary router in ${i.forget.the.location} could be faster than a "direct" connection. He used the term "internet weather" to describe this phenomenon. I like that term... :slight_smile:

Anyway, my main point is that the differences are generally lost in that sort of 'noise'. Sometimes it's faster one way, sometimes it's slower. My main concern is my own experience: do I "feel" that hop or don't I, as a human? I never do. If I were a computer doing low-latency trading, maybe I would, but that's just never been a problem in all my "real world" usage testing (workloads, APIs, CLIs, file sharing, etc.).

If you get around to doing anything - publish it and promote it! We always appreciate third party usage/analysis. It's more valuable to us in a lot of ways than our own testing.

Cheers

Hi @michi,

We've made some progress on this route by allowing the SDKs to handle flow-control. That's the first step in the path, since with a UDP P2P underlay, you'd need flow-control with retransmission on both sides, unless you're only running UDP-based software over it.
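As a toy illustration of why retransmission is part of that flow-control requirement, here is a minimal stop-and-wait sketch in Python. This is my own simplification for the sake of the argument, not OpenZiti's actual flow-control: the sender keeps resending a datagram until the (simulated) lossy link delivers it, using a 1-bit alternating sequence number.

```python
def transfer(messages, drop_pattern):
    """Stop-and-wait delivery over a lossy link.

    drop_pattern yields True each time the link drops the next datagram;
    once exhausted, the link is treated as loss-free.
    """
    drops = iter(drop_pattern)
    delivered, seq = [], 0
    for msg in messages:
        while True:
            if not next(drops, False):  # datagram survived the link
                delivered.append(msg)   # receiver ACKs; sender advances
                seq ^= 1                # alternate the 1-bit seq number
                break
            # else: ACK timeout -> retransmit the same (seq, msg)
    return delivered

# Every second datagram is lost; retransmission still delivers all three.
print(transfer(["a", "b", "c"], [True, False, True, False, True, False]))
# ['a', 'b', 'c']
```

Real flow-control additionally has to bound how much unacknowledged data is in flight (windowing), which is where the tuning difficulty mentioned later in this thread comes from.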

The next step will be to allow the SDK to participate in re-routing. Current re-routing only happens in the router-router portions of the circuit. Multi-path is a related area that would be interesting.

The final step would be to allow P2P connections, where the P2P link is another option for circuits. That way we can use the mesh as a backup for when the P2P link can't be formed or isn't performing as well as the mesh. Or, if we have multi-path, we could use them together.

Not having done the testing, I don't want to make any performance claims for or against UDP P2P links. I think cost could be a bigger motivator than performance, though. I have done some testing with DTLS links (TLS over UDP) and found it difficult to even get their performance up to par with regular TLS links. Tuning flow-control is tricky and time-consuming, and it's easy to optimize for certain conditions while making other scenarios worse.

Cheers,
Paul

@TheLumberjack Thanks for your thoughts on this.
If I test something and gain new insights, I will post them here.

@plorenz I am curious to see how this will develop over time.

Hello all!

I did some testing on this.

I deployed the following infrastructure in the cloud.

I installed the latest available version of Rocky Linux on the virtual servers (v10 on Hetzner, v9 on DigitalOcean) and updated all packages before running my tests.

Afterwards, I installed OpenZiti and NetBird (a P2P zero-trust alternative to Tailscale) using their official installation scripts.

On my testing server at DigitalOcean, I installed iperf3 with dnf install iperf3. I also installed the native Linux NetBird client and the ziti-edge-tunnel client.

Used software versions:

  • Rocky Linux (Hetzner): 10.1
  • Rocky Linux (DigitalOcean): 9.7
  • iperf 3.9
  • OpenZiti Controller: 1.6.12
  • OpenZiti ZAC: 4.0.2
  • OpenZiti Router: v1.6.12
  • ziti-edge-tunnel: v1.9.5
  • NetBird Management: v0.64.3
  • NetBird Dashboard: v2.30.0
  • NetBird Linux Client: v0.64.3

Afterwards, I began my tests. I ran three different test commands and ran each of them twice.

  • iperf3 -c <IP> -p 8888 (TCP, single stream)
  • iperf3 -c <IP> -p 8888 -u (UDP)
  • iperf3 -c <IP> -p 8888 -P 4 (TCP, 4 parallel streams)

Server: iperf3 -s -p 8888 / iperf3 -s -p 8899
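For reference, the client-side matrix above (three variants, two runs each) can be enumerated programmatically. This Python sketch only prints the commands so they can be reviewed before running; the "<IP>" placeholder from the list above is kept as-is.

```python
# Enumerate the iperf3 client commands used in the tests above.
# "<IP>" is left as the placeholder from the post, not a real address.
variants = {
    "TCP, single stream": [],
    "UDP": ["-u"],
    "TCP, 4 parallel streams": ["-P", "4"],
}

cmds = []
for name, extra in variants.items():
    for run in (1, 2):  # each test was run twice
        cmd = " ".join(["iperf3", "-c", "<IP>", "-p", "8888", *extra])
        cmds.append(cmd)
        print(f"{name}, run {run}: {cmd}")

# 3 variants x 2 runs = 6 client invocations in total
```

To actually execute the matrix, each printed command would be run against one iperf3 server per endpoint (the iperf3 -s -p 8888 line above).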

Test 1: Direct connection from Frankfurt (client) to Amsterdam (without zero-trust):

test1.txt (16.1 KB)

Test 2: Direct connection from Amsterdam (client) to Frankfurt (without zero-trust):

test1.txt (16.1 KB)

Test 3: OpenZiti connection from Frankfurt (client) to Amsterdam:

test3.txt (15.8 KB)

Test 4: OpenZiti connection from Amsterdam (client) to Frankfurt:

test4.txt (15.8 KB)

Test 5: NetBird P2P connection from Frankfurt (client) to Amsterdam:

test5.txt (16.1 KB)

Test 6: NetBird P2P connection from Amsterdam (client) to Frankfurt:

test6.txt (16.1 KB)

Conclusion:

The differences between a direct connection and a P2P connection via NetBird were not significant. By comparison, the connection via OpenZiti achieved only half to a third of the throughput. The exception was the UDP tests, which were not particularly fast in any configuration.

However, it should also be kept in mind that a full utilization of the available bandwidth (as occurs during speed tests) is rarely the case in everyday use.

I haven't yet figured out how to conduct a realistic test without it being dependent on many variables. In everyday use and with a limited number of users, you probably won't notice the limitations. In an environment with hundreds or thousands of users, you might notice the effects more, but then there are many other factors that could influence performance in other ways.

Cheers
Michi


Thanks for sharing your test results! It's a very interesting area, and I'm hoping we can make some more progress on it this year.

Cheers,
Paul