Can the zrok controller/frontend scale using something like Docker Swarm or Amazon ECS if deployed properly? Where do you see configuration changes needed to support this, based on the current self-hosting Docker instructions?
You can autoscale a single public frontend configuration horizontally based on the metric of your choice. Those are your public reverse proxies that handle ingress load for all public shares, and each instance of the public frontend is a clone. It's also possible to have more than one frontend with a different configuration. Each configuration would need to be scaled separately because each one represents a unique identity.
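To make "autoscale based on the metric of your choice" concrete, here's a minimal sketch of the proportional rule that target-tracking autoscalers commonly use (it's the same formula Kubernetes' Horizontal Pod Autoscaler documents). Nothing here is a zrok or Ziti API: the function name, the metric (average CPU percent per frontend replica), and the min/max bounds are all assumptions for illustration.

```python
import math


def desired_replicas(current_replicas: int, observed: float, target: float,
                     min_replicas: int = 2, max_replicas: int = 10) -> int:
    """Return the replica count that would bring the metric back to target.

    observed and target are the same (hypothetical) per-replica metric,
    e.g. average CPU percent across the frontend clones.
    """
    if current_replicas < 1 or target <= 0:
        raise ValueError("need at least one replica and a positive target")
    # Scale the replica count in proportion to how far the metric is from target.
    raw = math.ceil(current_replicas * observed / target)
    # Clamp so a metric spike or lull can't scale past the configured bounds.
    return max(min_replicas, min(max_replicas, raw))


# 3 frontends averaging 85% CPU against a 60% target -> 5 replicas
print(desired_replicas(3, observed=85.0, target=60.0))
```

Because every replica of one frontend configuration is a clone, the orchestrator only has to change the replica count. A second frontend configuration would need its own metric and its own count.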
You can scale your Ziti routers too. They'll auto-balance all the zrok backends and frontends. It's a good idea to place a couple of routers in each geography you wish to serve for reliability and performance. Auto-scaling routers is less simple because they are each uniquely identified to each other and the Ziti controller. Auto-scaling routers would involve provisioning additional Ziti routers triggered by some load metric.
That leaves the Ziti and zrok controllers. Ziti controller clusters are in beta, and won't be available for prod deployments for a little while. The zrok controller is always solo, at least for v0.4, I believe. Are you seeing significant load on the zrok controller? I understand it's involved in provisioning environments and services, but I don't believe it's a bottleneck for already-provisioned shares.
Thank you for the detailed response. Honestly load is very minimal for us at the moment but I am trying to think ahead if we end up growing substantially. In the interim do you have an idea of what maximum number of connections / throughput would be for a single instance?
It's wise to look ahead. No, I don't have an intuition for the theoretical max, but I'm sure we could tip over a zrok frontend if we tried hard enough.
If you can predict which dimension you're most likely to grow along then it's simpler to predict the point of failure, e.g., circuit saturation vs. CPU.
Have you already figured out how you're going to chart resource usage? zrok and Ziti have some great metrics and, combined with system-level monitoring, those will allow you to isolate and analyze resource usage.
The solution for scaling the zrok frontend is definitely horizontal replicas, not vertical scaling. Even a DNS round robin across a number of frontends will get you very far indeed.
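As a rough illustration of why round robin goes a long way, here's a small simulation of requests rotating across several frontend addresses. The addresses are hypothetical (taken from the documentation-only 192.0.2.0/24 range), and this models the idealized case only: real DNS round robin is skewed by resolver caching and client behavior, so the spread in practice will be less even.

```python
from collections import Counter
from itertools import cycle

# Hypothetical frontend addresses; in a real setup these would be the
# A records published for the frontend's hostname.
FRONTENDS = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]


def round_robin(addresses, n_requests):
    """Assign each request to the next address in rotating order."""
    rotation = cycle(addresses)
    return [next(rotation) for _ in range(n_requests)]


# Count how many of 9000 simulated requests land on each frontend.
counts = Counter(round_robin(FRONTENDS, 9000))
for address, hits in sorted(counts.items()):
    print(address, hits)
```

In this idealized model each of the three frontends takes a third of the requests, so adding a replica lowers the per-instance load without any coordination between the frontends.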
Here's an informed perspective on performance: High availability architecture for routers and controller - #8 by mike.gorman
You may be able to mine more insights from the same poster or this alias.