Edge router create times out when HA database is large

Hi There,

I’ve been testing HA Controller functionality under various high load conditions.

I’m currently using version 1.6.7 in HA mode with 3 Controllers and 2 Routers.

I’ve deliberately created ~135k identities to understand if there are any side effects of having a large database. The identities are not being used to connect SDK tunnellers so there are no Terminators.

Here’s the entityCount from my system for visibility.

{"namespace":"entityCount","event_src_id":"ctrl-0","timestamp":"2025-08-28T14:49:47.224510788Z","counts":{"apiSessionCertificates":0,"apiSessions":1,"authPolicies":1,"authenticators":2,"cas":0,"configTypes":5,"configs":2,"controllers":3,"edgeRouterPolicies":2,"enrollments":134492,"eventualEvents":0,"externalJwtSigners":0,"identities":134492,"identityTypes":2,"mfas":0,"postureCheckTypes":5,"postureChecks":0,"revocations":0,"routers":4,"routers.edge":4,"serviceEdgeRouterPolicies":1,"servicePolicies":2,"services":1,"services.edge":1,"sessions":0,"terminators":1}}

And my database is currently at about 1.7G.

[ziggy@ziti-controller-0 ~]$ ls -lah /etc/ziti/config/ctrl-ha.db
-rw-rw---- 1 ziggy ziggy 1.7G Aug 28 14:50 /etc/ziti/config/ctrl-ha.db

One issue I’ve found is that once the DB gets relatively large, the ziti edge create edge-router … command often fails with a 503 returned to the CLI client.

The command I run is:

ziti edge create edge-router test --timeout 1000

After 10 seconds I get the following response:

error: error creating edge-routers instance in Ziti Edge Controller at https://localhost:9443/edge/management/v1. Status code: 503 Service Unavailable, Server returned: {
    "error": {
        "code": "TIMEOUT",
        "message": "The requested operation took too much time to reply",
        "requestId": "CvTvWIPAH"
    },
    "meta": {
        "apiEnrollmentVersion": "0.0.1",
        "apiVersion": "0.0.1"
    }
}

In the Controller log I see:

{"file":"github.com/openziti/ziti/controller/raft/fsm.go:256","func":"github.com/openziti/ziti/controller/raft.(*BoltDbFsm).Apply","index":134827,"level":"info","msg":"apply log with type *model.CreateEdgeRouterCmd","time":"2025-08-28T14:42:01.694Z"}
{"_context":"tls:0.0.0.0:7443","error":"EOF","file":"github.com/openziti/transport/v2@v2.0.183/tls/listener.go:260","func":"github.com/openziti/transport/v2/tls.(*sharedListener).processConn","level":"error","msg":"handshake failed","remote":"10.224.0.65:55878","time":"2025-08-28T14:42:07.608Z"}
{"file":"github.com/openziti/ziti/controller/api/timeouts.go:127","func":"github.com/openziti/ziti/controller/api.(*timeoutHandler).ServeHTTP","level":"error","method":"POST","msg":"timeout for request hit, returning Service Unavailable 503","time":"2025-08-28T14:42:11.685Z","url":{"Scheme":"","Opaque":"","User":null,"Host":"","Path":"/edge/management/v1/edge-routers","RawPath":"","OmitHost":false,"ForceQuery":false,"RawQuery":"","Fragment":"","RawFragment":""}}
{"namespace":"entityCount","event_src_id":"ctrl-0","timestamp":"2025-08-28T14:42:10.224087541Z","counts":{"apiSessionCertificates":0,"apiSessions":1,"authPolicies":1,"authenticators":2,"cas":0,"configTypes":5,"configs":2,"controllers":3,"edgeRouterPolicies":2,"enrollments":134489,"eventualEvents":0,"externalJwtSigners":0,"identities":134492,"identityTypes":2,"mfas":0,"postureCheckTypes":5,"postureChecks":0,"revocations":0,"routers":1,"routers.edge":1,"serviceEdgeRouterPolicies":1,"servicePolicies":2,"services":1,"services.edge":1,"sessions":0,"terminators":1}}
{"_context":"tls:0.0.0.0:7443","error":"EOF","file":"github.com/openziti/transport/v2@v2.0.183/tls/listener.go:260","func":"github.com/openziti/transport/v2/tls.(*sharedListener).processConn","level":"error","msg":"handshake failed","remote":"10.224.0.65:45380","time":"2025-08-28T14:42:17.607Z"}
{"error":"http: Handler timeout","file":"github.com/openziti/ziti/controller/api/responder.go:126","func":"github.com/openziti/ziti/controller/api.(*ResponderImpl).RespondWithProducer","level":"error","msg":"could not respond, writing to response failed","path":"/edge/management/v1/edge-routers","requestId":"CvTvWIPAH","time":"2025-08-28T14:42:17.977Z"}

After about 30 seconds, the router does exist when I run ziti edge list edge-routers
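
Since the router shows up despite the 503, a script could poll the list command instead of treating the 503 as a failure. This is a rough sketch under assumptions: `wait_until` and `router_listed` are hypothetical helpers, the timeout and interval values are arbitrary, and the check is a crude substring match on the plain output of ziti edge list edge-routers, so a short name like "test" could match unrelated text.

```python
import subprocess
import time

def wait_until(check, timeout_s=60.0, interval_s=2.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns True or timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)

def router_listed(name):
    # Crude check (assumption): look for the name anywhere in the CLI output.
    out = subprocess.run(
        ["ziti", "edge", "list", "edge-routers"],
        capture_output=True, text=True,
    )
    return out.returncode == 0 and name in out.stdout

# Example (not executed here): wait_until(lambda: router_listed("test"))
```

Passing `sleep` and `clock` as parameters is only there so the loop can be exercised without real waiting.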

Other create operations in the CLI complete promptly. For example, ziti edge create identity... works fine.

Thanks in advance.

Cool. It’s good to know where the limits of the system are first felt.

Is this right? You created 134,492 identities, and the router creation operation became slow and unreliable.

Are any system resources on the controller strained in the seconds leading up to the server error?

What about creating yet another identity? I’m asking because creating a router with CLI flag -t (long option --tunneler-enabled) triggers creating an identity of type Router.

EDIT: you already answered my second question. :slightly_smiling_face:

Thanks @qrkourier.

It seems my issue was disk I/O.

I’ve now moved from a Standard to Premium SSD in Azure.

I no longer hit the timeout when creating the Edge Router. However, deleting the router does still hit the timeout. Not a massive problem for me.

Can you think of why deletion hits the timeout but creation doesn’t?