Performance observations

Hi all,

While playing with OpenZiti in my lab, I observed some issues with speed and load, and I am not sure whether I did anything wrong.
Just so no one gets this wrong: I don't want to blame OpenZiti for being slow; I would like to understand whether my setup is somehow broken.
I know that throughput testing always depends on the setup and is always somewhat biased, but I ran a very simple test in my environment that should be fair from my point of view.
Setup:
Proxmox host with 3 virtual machines:

  • OpenZiti Controller set up using the Quickstart
  • Ubuntu 22.04 serving as the iperf3 server
  • Ubuntu 22.04 serving as the iperf3 client

I know that this might be problematic due to testing TCP throughput over a TCP OpenZiti connection.
I tested plain throughput, Tailscale, NetBird, and OpenZiti, all between these same two hosts, with the other services disabled at the time of testing:

No VPN / Overlay, direct connection:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  54.8 GBytes  47.1 Gbits/sec  266             sender
[  5]   0.00-10.04  sec  54.8 GBytes  46.9 Gbits/sec                  receiver

NetBird:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.95 GBytes  2.53 Gbits/sec  288             sender
[  5]   0.00-10.04  sec  2.94 GBytes  2.52 Gbits/sec                  receiver

Tailscale:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  4.83 GBytes  4.15 Gbits/sec  2692             sender
[  5]   0.00-10.04  sec  4.83 GBytes  4.13 Gbits/sec                  receiver

OpenZiti:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   470 MBytes   394 Mbits/sec    0             sender
[  5]   0.00-10.10  sec   466 MBytes   387 Mbits/sec                  receiver

It would be great if someone could comment on my observations.
Thanks
Kai

Hi @strand, do you happen to have a set of steps someone could use to reproduce the results? Like you mention, performance testing is something of a black art, and getting reproducible results can be a challenge. That said, personally I am glad to see you giving it a go! I'd be very interested in your methodology. Unfortunately, it means you'll have to document the steps you took in some detail so that someone else (myself or others) could replicate and review exactly what you did.

There are really so many factors at play that, without understanding exactly what you're doing and how, I think it's hard to comment.

My first request would be that you run the tests using different machines, to make the test less "theoretical" and more "real world". I think that would make a difference, and it is probably more aligned with what your end goal is anyway.

Hi @TheLumberjack ,

in general, there shouldn't be too many steps needed to replicate my environment.
I have a testing Proxmox server where I have several systems up and running:
a NetBird server, a Headscale server for Tailscale, and an OpenZiti controller with the edge router.
For testing purposes it should not matter whether you use the self-hosted Headscale server or the hosted Tailscale service, as the clients connect directly.

Installation of the OpenZiti server is done as outlined in the Quickstart, nothing fancy about it.
Ubuntu 22.04 as the operating system in an LXC container with 2 cores and 2 GB of RAM.
Two more LXC hosts, each running Ubuntu 22.04, act as the iperf3 server and client.

These two hosts (iperf3 server and client) get the three different VPN / overlay services installed,
and the performance is tested using iperf3 with the default values for server and client:
Server Side:

iperf3 -s

Client Side:

iperf3 -c <IP / Hostname of the iperf3 server>

This is repeated several times for each service to get an average.
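In case someone wants to script that part, a loop like the following should work; the run count, the --json output mode, and the jq/awk averaging are just a sketch for illustration, not part of my original test:

# run five iperf3 clients and average the receiver bitrate (requires jq)
for i in $(seq 1 5); do
    iperf3 -c <IP / Hostname of the iperf3 server> --json | jq '.end.sum_received.bits_per_second'
done | awk '{ sum += $1 } END { printf "avg: %.0f bits/sec\n", sum / NR }'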

The outcome is as stated above.
On top of that I did another test with ZeroTier (just for the full picture):

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   963 MBytes   808 Mbits/sec  518             sender
[  5]   0.00-10.05  sec   962 MBytes   804 Mbits/sec                  receiver

I know that doing the tests on real hardware would be better, but since I will use the overlay heavily on LXCs and VMs anyway, the performance on LXCs should scale, so for my use case these benchmarks are good.

One thing I noticed is that DNS does not work when using OpenZiti Edge Tunnel in an LXC. It seems that resolvectl is invoked correctly, as I do see the logs, but name resolution does not work.
Installing OpenZiti Edge Tunnel in a real VM running Ubuntu 22.04, it works like a charm (by the way, there is no performance difference for me).
I am not sure whether this is due to Proxmox blocking something in LXC or due to the Ubuntu 22.04 LXC template.
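If anyone wants to dig into the DNS issue, these are the systemd-resolved commands I would check first; note that tun0 is an assumed interface name, ziti-edge-tunnel may name its interface differently:

# does the tunnel interface show up with a DNS server assigned?
resolvectl status

# show the resolver configured on the tunnel interface (tun0 is an assumption)
resolvectl dns tun0

# try resolving a Ziti service name through systemd-resolved
resolvectl query <ziti-service-name>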

cheers
Kai

A few observations.
First, the other (non-OpenZiti) technologies are at a distinct advantage when you are performing all-local testing. Without any real propagation delay involved, OpenZiti has to hairpin at the router, so there are multiple connections involved instead of a direct point-to-point path. Depending on network topology, a "real" OpenZiti network may have similar latencies if the edge router is geographically between the endpoints; similarly, it could add more if it is not.

Second, I am not familiar enough with the other solutions to know whether they change the underlying host's TCP flow control settings, permanently or via some sort of override, for values such as window size. I'm not sure we've posted this here before, but we use a set of sysctl settings by default in our internal images (included at the end of this post). It would be interesting to see these results with some level of actual delay involved and to compare the flow control settings. That would also remove any hypervisor tweaks, such as shared memory, that may be improving performance in certain situations. I have zero information on whether Proxmox does anything like that, but to be 100% sure, I prefer the systems to be separated when I run testing.

Lastly, and related to the above, there are a number of OpenZiti flow control settings that are configurable; these are internal to the OpenZiti process, not the host. I am actually in the process of setting up a matrix of tests that manipulate these, to run similar throughput tests and discover optimized values for the defaults, as well as potentially for specific cases. You can find these options here. I believe that the current defaults are much less performant than OpenZiti is capable of, hence the intent to work out a better set.
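To give a flavor, these options are set per listener in the router configuration. A minimal sketch, with option names as I recall them from the router docs and with purely illustrative values, not tuning advice:

listeners:
  - binding: edge
    address: tls:0.0.0.0:3022
    options:
      txQueueSize: 16            # illustrative value only
      txPortalStartSize: 16384   # illustrative value only
      txPortalMaxSize: 4194304   # illustrative value only
      retxStartMs: 200           # illustrative value only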

Last lastly, and I ask this as someone who has long been in these conversations across various technologies: why is single-flow throughput so important? In a few cases, such as large-scale backups, I can completely understand it, but 99.9% of internet connections do not require and cannot use that level of throughput, nor are such speeds (like your unencumbered baseline) available from almost anything but test systems or configurations. OpenZiti (and the other solutions) provide a much larger value set than throughput alone. I think we often get sidetracked by isolating a single characteristic, especially this one, rather than looking at the real use cases and assessing the fit to those use cases as a whole. I ask this not to deflect, but to check my understanding, and to find out whether I am missing some use cases or issues in general that I should be thinking more about and therefore putting more effort into.
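As a quick way to see how much of the gap is specific to a single flow, iperf3 can also drive parallel streams; for example (the stream count here is arbitrary):

iperf3 -c <IP / Hostname of the iperf3 server> -P 4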


And, of course, I forgot to post the settings I referenced...

net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_mem = 8388608 8388608 16777216
net.ipv4.udp_mem = 8388608 8388608 16777216
net.ipv4.tcp_retries2 = 8
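If you want to try these, one approach (the file name is just a convention) is to save the block above as /etc/sysctl.d/99-net-tuning.conf and reload:

sudo sysctl --system

Or apply a single value on the fly:

sudo sysctl -w net.core.rmem_max=16777216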