Terminator creation performance degradation

Hi again,

I have an HA system consisting of 3 controllers and 10 edge routers on v1.6.8. I’ve been running some load tests to understand how the system handles high levels of terminator creation, which could happen in a scenario where many ZET clients are brought online at roughly the same time. I have 40k identities enrolled in the system, with policies that spread them evenly over the available routers.

In my test I bring all the ZET clients up over a period of 1 hour. The ZET clients have a polling interval of 1800s (30m). What I notice is that as more terminators exist on the system, the time to create a terminator increases exponentially. For example, the first 10k terminators get created almost as soon as the ZET clients come online. The next 10k might take 1 hour, but the final 20k will take 5 hours. If terminator creation were charted over time, it would look like a logarithmic curve.
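As a rough back-of-the-envelope check on the figures above, the average creation rate per phase can be worked out directly. The sketch below uses only the approximate numbers quoted in this post (10k in about 1 hour, then 20k in about 5 hours); the first 10k is left out because no duration was measured for it.

```python
# Approximate figures from the post: (label, terminators created, minutes taken).
phases = [
    ("second 10k", 10_000, 60),
    ("final 20k", 20_000, 300),
]


def rate_per_minute(count, minutes):
    """Average terminators created per minute over a phase."""
    return count / minutes


rates = {label: rate_per_minute(count, minutes) for label, count, minutes in phases}

for label, rate in rates.items():
    print(f"{label}: ~{rate:.0f} terminators/minute")
```

That is roughly 167 per minute falling to roughly 67 per minute, so the average rate drops by about 2.5x between the two phases.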

I understand that the bbolt database allows only one read-write transaction at a time, and therefore only the Raft leader can apply changes to the data model, sequentially. But I’m not sure this entirely explains why performance degrades in this way.
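To illustrate why a single writer alone would not be expected to produce this shape, here is a toy model. It is not a measurement and not how the controller is implemented. It assumes writes are applied strictly one at a time, and compares a constant per-write cost with a hypothetical cost that grows with the number of existing terminators. The names `batch_times`, `constant_cost`, and `growing_cost`, and all the cost values, are made up for illustration.

```python
def batch_times(total, batch_size, cost_fn):
    """Time to apply each batch when writes are applied strictly one at a time.

    cost_fn(existing) returns the (hypothetical) seconds needed for one write
    when `existing` entities are already stored.
    """
    times = []
    for start in range(0, total, batch_size):
        times.append(sum(cost_fn(n) for n in range(start, start + batch_size)))
    return times


def constant_cost(existing):
    # Assumption: 1 ms per write, regardless of how much data exists.
    return 0.001


def growing_cost(existing):
    # Assumption: per-write cost grows linearly with the existing count.
    return 0.001 * (1 + existing / 10_000)


flat = batch_times(40_000, 10_000, constant_cost)
growing = batch_times(40_000, 10_000, growing_cost)

print("constant cost, seconds per 10k batch:", [round(t, 1) for t in flat])
print("growing cost, seconds per 10k batch: ", [round(t, 1) for t in growing])
```

In this model a constant per-write cost gives every batch the same duration, so serialization on its own produces a straight line. The slowdown only appears when something makes each write more expensive as the data set grows.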

I have tried various combinations of more/fewer controllers and routers. Increasing VM resources obviously does improve the results, but CPU/memory utilisation is always nominal, as is IO. I’ve adjusted the rate limiters and queues, but there is always this slowdown in creating terminators and other entities.

During periods when high levels of terminator creation are required, I notice many occurrences of UnknownTerminator in the controller logs. Each one triggers deletion of the terminator, which I am sure slows things down, but I’m not sure this is the core reason, as the numbers don’t correlate.

{"file":"github.com/openziti/ziti/controller/network/router_messaging.go:312","func":"github.com/openziti/ziti/controller/network.(*RouterMessaging).sendTerminatorValidationRequest","level":"info","msg":"queuing validate of terminator","terminatorId":"6OSD6UWKsv3zhMVO8Vcxt2","time":"2025-10-09T10:33:20.015Z"}
{"file":"github.com/openziti/ziti/controller/network/router_messaging.go:594","func":"github.com/openziti/ziti/controller/network.(*terminatorValidationRespReceived).handle","level":"info","msg":"terminator not valid, queuing terminator delete","reason":"UnknownTerminator","terminatorId":"6OSD6UWKsv3zhMVO8Vcxt2","time":"2025-10-09T10:33:20.016Z"}
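One way to check whether these deletes line up with the slowdown is to bucket the log entries by minute and compare against the creation rate. A minimal sketch, assuming the controller log is one JSON object per line with the `reason` and `time` fields seen above; the function name is hypothetical and the sample lines are trimmed copies of the two entries above.

```python
import json
from collections import Counter


def unknown_terminator_deletes_per_minute(lines):
    """Count log entries with reason 'UnknownTerminator', grouped by minute."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(entry, dict):
            continue
        if entry.get("reason") == "UnknownTerminator" and "time" in entry:
            # Keep "YYYY-MM-DDTHH:MM" so entries are grouped per minute.
            counts[entry["time"][:16]] += 1
    return counts


sample = [
    '{"level":"info","msg":"queuing validate of terminator",'
    '"terminatorId":"6OSD6UWKsv3zhMVO8Vcxt2","time":"2025-10-09T10:33:20.015Z"}',
    '{"level":"info","msg":"terminator not valid, queuing terminator delete",'
    '"reason":"UnknownTerminator","terminatorId":"6OSD6UWKsv3zhMVO8Vcxt2",'
    '"time":"2025-10-09T10:33:20.016Z"}',
]

per_minute = unknown_terminator_deletes_per_minute(sample)
print(per_minute)
```

On a real log the same function could be fed an open file object, since it only iterates over lines.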

Hi @farmhouse ,

Thank you for reporting this. My scale testing for terminator creation has usually been in the range of 5k-10k terminators, so I haven't hit this myself.

I agree that the single writer shouldn't cause degradation of that severity. There's also some amount of exponential back-off happening, but that should cap at a reasonable level.
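For reference, capped exponential back-off generally looks like the sketch below. This is a generic illustration, not the ziti implementation; the function name, base delay, multiplier, cap, and attempt count are all made-up values.

```python
def backoff_delays(base=0.1, factor=2.0, cap=30.0, attempts=12):
    """Return retry delays in seconds that grow geometrically up to a cap."""
    delays = []
    delay = base
    for _ in range(attempts):
        delays.append(min(delay, cap))  # never wait longer than the cap
        delay *= factor
    return delays


delays = backoff_delays()
print(delays)
```

Once the cap is reached the delay between attempts stops growing, which is why back-off on its own should only add a bounded amount of delay per retry.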

I'll have to run some tests and see what I can find. I created an issue to try and make sure we don't lose track of this, as I likely won't be able to get to the testing immediately: "Terminator creation seems to slow exponentially as the number of terminators rises from 10k to 20k to 40k" (Issue #3318, openziti/ziti on GitHub).

Thanks @plorenz. Do let me know if I can provide any help investigating or testing further.