How to track down throughput problems on the Ziti router

Hi @plorenz ,

I am currently trying to set up a private router as shown in this diagram Create private router on linux machine - #8 by aepurvis to have an additional router on my internal instance for terminating services.

I am having trouble to getting this to work. I've set up the internal router and it connects to the controller, and I'm also able to bind a service to it and I also see the service bound to the tunneler as a 'terminating router'.
But it doesn't work. When I try to connect the service, my 'public router' logs:

ziti-router [317653.895] WARNING edge/router/xgress_edge.(*edgeClientConn).processConnect [ch{edge}->u{classic}->i{YXBe}]: {connId=[157] type=[EdgeConnectType] error=[can't route from PrN9ro.gay ->
nSFesiP-6] chSeq=[390] edgeSeq=[0] token=[b2c90b0f-ce48-4b48-8b1a-8d573717e7f1]} failed to dial fabric

It appears to me that the 'fabric link' between the private and public routes is not established. In the private router, I only see the outgoing connection to the controller/API, but no connection to the public router.

This is the config snipplet from the public router:

    link:
      dialers:
        - binding: transport
      listeners:
        - binding:          transport
          bind:             tls:0.0.0.0:10080
          advertise:        tls:<transport-name>:443
          options:
            outQueueSize:   4
    listeners:
         [...]

And I'm able to connect to 'transport-name' from my private router.

On the private router, the config just has transport configured as dial:

    link:
      dialers:
        - binding: transport
    listeners:
      - binding: edge
        address: tls:0.0.0.0:3022
        options:
            advertise: edge.internal.domain:443
            connectTimeoutMs: 1000
            getSessionTimeout: 60
      - binding: tunnel
        options:
            mode: host

But I don't see the private router connecting to the public router. I think this is causing the error message.

This is the log from the int router:

[   0.003]    INFO ziti/ziti/router.run: {revision=[5237e2b4794a] configFile=[/etc/ziti/config/ziti-router.yaml] version=[v0.31.0] go-version=[go1.20.10] routerId=[nSFesiP-6] os=[linux] arch=[amd64]
 build-date=[2023-11-01T17:56:17Z]} starting ziti-router
[   0.003]    INFO ziti/common/metrics.GoroutinesPoolMetricsConfigF.func1.1: {idleTime=[30s] poolType=[pool.link.dialer] maxQueueSize=[1000] minWorkers=[0] maxWorkers=[32]} starting goroutine pool
[   0.003]    INFO ziti/common/metrics.GoroutinesPoolMetricsConfigF.func1.1: {maxWorkers=[128] idleTime=[30s] maxQueueSize=[1000] poolType=[pool.route.handler] minWorkers=[0]} starting goroutine poo
l
[   0.004]    INFO ziti/router/forwarder.(*Faulter).run: started
[   0.004]    INFO ziti/router/forwarder.(*Scanner).run: started
[   0.004] WARNING ziti/router/internal/edgerouter.(*Config).LoadConfigFromMap: Invalid heartbeat interval [0] (min: 60, max: 10), setting to default [60]
[   0.004] WARNING ziti/router/internal/edgerouter.parseEdgeListenerOptions: port in [listeners[0].options.advertise] must equal port in [listeners[0].address] for edge binding but did not. Got [443
] [3022]
[   0.004]   ERROR ziti/ziti/router.run: {error=[remove /tmp/gops-agent.1.sock: read-only file system]} unable to start CLI agent
[   0.004]    INFO ziti/router.(*Router).showOptions: ctrl = {"OutQueueSize":4,"MaxQueuedConnects":1,"MaxOutstandingConnects":16,"ConnectTimeout":5000000000,"DelayRxStart":false,"WriteTimeout":0}
[   0.004]    INFO ziti/router.(*Router).showOptions: metrics = {"ReportInterval":60000000000,"IntervalAgeThreshold":0,"MessageQueueSize":10}
[   0.004]    INFO ziti/common/metrics.GoroutinesPoolMetricsConfigF.func1.1: {maxWorkers=[32] idleTime=[30s] maxQueueSize=[5000] minWorkers=[0] poolType=[pool.link.dialer]} starting goroutine pool
[   0.004]    INFO ziti/router.(*Router).initializeHealthChecks: starting health check with ctrl ping initially after 15s, then every 30s, timing out after 15s
[   0.004]    INFO ziti/router.(*Router).startXlinkDialers: started Xlink dialer with binding [transport]
[   0.004]    INFO ziti/router/xgress_edge.(*listener).Listen: {address=[tls:0.0.0.0:3022]} starting channel listener
[   0.004]    INFO ziti/common/metrics.GoroutinesPoolMetricsConfigF.func1.1: {maxWorkers=[16] poolType=[pool.listener.xgress_edge] idleTime=[10s] maxQueueSize=[1] minWorkers=[1]} starting goroutine
pool
[   0.005]    INFO ziti/router.(*Router).startXgressListeners: created xgress listener [edge] at [tls:0.0.0.0:3022]
[   0.005]    INFO ziti/router.(*Router).startXgressListeners: created xgress listener [tunnel] at []
[   0.005]    INFO ziti/router.(*Router).getInitialCtrlEndpoints: controller endpoints file [/etc/ziti/config/endpoints] doesn't exist. Using initial endpoints from config
[   0.005]    INFO ziti/router.(*Router).startControlPlane: router configured with 1 controller endpoints
[   0.005]    INFO ziti/router/xgress_edge.(*Acceptor).Run: starting
[   0.005]    INFO ziti/router/env.(*networkControllers).UpdateControllerEndpoints: {endpoint=[map[tls:control.my-domain-mask:443:{}]]} adding new ctrl endpoint
[   0.005]    INFO ziti/router/env.(*networkControllers).connectToControllerWithBackoff: {endpoint=[tls:control.my-domain-mask:443]} starting connection attempts
[   0.063]    INFO ziti/router/env.(*networkControllers).connectToControllerWithBackoff.func3: {endpoint=[tls:control.my-domain-mask:443]} successfully connected to controller
[   0.064]    INFO ziti/router/handler_edge_ctrl.(*helloHandler).HandleReceive.func1: received server hello, replying
[   0.064]    INFO ziti/router/fabric.(*StateManagerImpl).AddSignerPublicCert: {added=[1] ignored=[0] received=[1]} received signer public certificates
[   0.065]    INFO ziti/router/handler_ctrl.(*validateTerminatorsHandler).validateTerminators: removing invalid terminator 1MR3S3zxyoN2gMd0n9M4Y4 with binding: edge. had dialer? true
[   0.065]    INFO ziti/router/handler_ctrl.(*validateTerminatorsHandler).validateTerminators: removing invalid terminator 2VgQ8yK4Wle2HZ7szmjeZr with binding: edge. had dialer? true
[   0.065]    INFO ziti/router/handler_ctrl.(*validateTerminatorsHandler).validateTerminators: removing invalid terminator 2rlx3ziLFtBslvPI1UBPSk with binding: edge. had dialer? true
[   0.065]    INFO ziti/router/handler_ctrl.(*validateTerminatorsHandler).validateTerminators: removing invalid terminator 2zvJVtvhVcJ4gYe8Qv3Vwq with binding: edge. had dialer? true
[   0.065]    INFO ziti/router/handler_ctrl.(*validateTerminatorsHandler).validateTerminators: removing invalid terminator 31SdzbM3eMof6Jv94wTavP with binding: edge. had dialer? true
[   0.066]    INFO ziti/router/handler_ctrl.(*validateTerminatorsHandler).validateTerminators: removing invalid terminator 3n0qi10AnF4riQfHRD0BBi with binding: edge. had dialer? true
[   0.066]    INFO ziti/router/handler_ctrl.(*validateTerminatorsHandler).validateTerminators: removing invalid terminator 460ed460-0a0e-4a92-b396-f897edc22f54 with binding: tunnel. had dialer? true
[   0.066]    INFO ziti/router/handler_ctrl.(*validateTerminatorsHandler).validateTerminators: removing invalid terminator 5tgYjiguTRslCzy019lcpj with binding: edge. had dialer? true
[   0.066]    INFO ziti/router/handler_ctrl.(*validateTerminatorsHandler).validateTerminators: removing invalid terminator 6w4RvIQIwTcIrqe78ZBLDX with binding: edge. had dialer? true
[   0.066]    INFO ziti/router/handler_ctrl.(*validateTerminatorsHandler).validateTerminators: removing invalid terminator 6z9wrdaziMDMmR1o5YWpXf with binding: edge. had dialer? true
[   0.066]    INFO ziti/router/handler_ctrl.(*validateTerminatorsHandler).validateTerminators: removing invalid terminator 7auKwrsVN3hShkxqH8fFDW with binding: edge. had dialer? true
[   0.066]    INFO ziti/router/handler_ctrl.(*validateTerminatorsHandler).validateTerminators: send batch of 11 terminator deletes
[   0.067]    INFO ziti/router/handler_edge_ctrl.(*apiSessionAddedHandler).instantSync: {strategy=[instant]} first api session syncId [clpqskv7hbda50dg47z2h4ntl], starting
[   0.067]    INFO ziti/router/handler_edge_ctrl.(*apiSessionSyncTracker).Add: received api session sync chunk 0, isLast=true
[   0.077]    INFO ziti/router/fabric.(*StateManagerImpl).StartHeartbeat: heartbeat starting
[   0.078]    INFO ziti/router/xgress_edge_tunnel.(*tunneler).Start: {mode=[host]} creating interceptor
[   0.078]    INFO ziti/router/xgress_edge.(*CertExpirationChecker).Run: waiting 8602h42m0.181410693s to renew certificates
[   0.078] WARNING ziti/tunnel/dns.flushDnsCaches: {error=[exec: "resolvectl": executable file not found in $PATH]} unable to find systemd-resolve or resolvectl in path, consider adding a dns flush
to your restart process
[   0.787]    INFO ziti/tunnel/intercept.SetDnsInterceptIpRange: dns intercept IP range: 100.64.0.1 - 100.127.255.254
[   0.789]    INFO ziti/tunnel/intercept.(*ServiceListener).HandleServicesChange: {service=[iac-int-kube-1.sys.hzr-core-service_80,443_t-2]} adding service
[   0.789]    INFO ziti/tunnel/intercept.(*ServiceListener).addService: {serviceId=[ytM1692EdaOJZurZ2G19P] serviceName=[iac-int-kube-1.sys.hzr-core-service_80,443_t-2]} Hosting newly available servi
ce
[   0.789]    INFO ziti/router/xgress_edge_tunnel.(*fabricProvider).establishTerminatorWithRetry.func1: {service=[iac-int-kube-1.sys.hzr-core-service_80,443_t-2]} attempting to establish terminator
[   0.791]    INFO ziti/router/xgress_edge_tunnel.(*fabricProvider).HandleTunnelResponse: {routerId=[nSFesiP-6] terminatorId=[ebdb6d7d-3ca5-4bd0-aed3-8dd15c907f7c] sessionId=[clpqlve2jb4kn0dg4oakgmh
gy]} received new session
[   0.791]    INFO ziti/router/xgress_edge_tunnel.(*fabricProvider).HandleTunnelResponse: {routerId=[nSFesiP-6] terminatorId=[ebdb6d7d-3ca5-4bd0-aed3-8dd15c907f7c] createDuration=[2.976156ms]} recei
ved terminator created notification
[   1.068]    INFO ziti/router/handler_edge_ctrl.(*apiSessionAddedHandler).applySync: finished sychronizing api sessions [count: 8, syncId: clpqskv7hbda50dg47z2h4ntl, duration: 170.177µs]

What am I missing? Can you point me on something I might look into?

Thanks & Bye,
Chris