How to access the routing cost per circuit?

I have 2 services: socks5 and element. Both are deployed on a machine where the ovhXX routers run. However, dc67 is a remote host, miles away from both the services and the clients.

  • ping client -> ovhXX time=16ms;
  • ping client -> dc67 time=28ms.
╭────────────────────────┬─────────┬────────────┬─────────────────────┬────────────╮
│ ID                     │ NAME    │ ENCRYPTION │ TERMINATOR STRATEGY │ ATTRIBUTES │
│                        │         │ REQUIRED   │                     │            │
├────────────────────────┼─────────┼────────────┼─────────────────────┼────────────┤
│ 297xbk7xZecDRoHI3vQ4GC │ socks5  │ true       │ smartrouting        │            │
│ 422e3i40ThOGC0YVX5HaMY │ element │ true       │ smartrouting        │            │
╰────────────────────────┴─────────┴────────────┴─────────────────────┴────────────╯

There are two questions:

  1. Sometimes the ziti network allocates a service terminator on a very remote router, dc67, which is miles away from the host where these services are hosted.
  2. There is a similar situation with client connections. A client can pick a very remote router: r/dc67 -> l/7YrrmAJOqw2wKkENZWhEAX -> r/ovh76. This lasts for hours; there is no rerouting.

ziti fabric list circuits:

╭───────────┬───────────────────────────┬─────────┬────────────────────────┬─────────────────────┬─────────────────────────────────────────────────╮
│ ID        │ CLIENT                    │ SERVICE │ TERMINATOR             │ CREATEDAT           │ PATH                                            │
├───────────┼───────────────────────────┼─────────┼────────────────────────┼─────────────────────┼─────────────────────────────────────────────────┤
│ LNs.Yds5F │ cma3li8za0maf4tj4ou1f2rh1 │ socks5  │ 6TJfsrSiFqEp6HNdfovgEw │ 2025-04-30 07:59:23 │ r/ovh76                                         │
│ Qfji1o.5e │ cma3li8za0maf4tj4ou1f2rh1 │ socks5  │ 6TJfsrSiFqEp6HNdfovgEw │ 2025-04-30 07:51:55 │ r/ovh221 -> l/1ZVOfxNRcZjWKlJGiWUvuV -> r/ovh76 │
│ Qgg4Yd.Ee │ cma3li8za0maf4tj4ou1f2rh1 │ socks5  │ 6TJfsrSiFqEp6HNdfovgEw │ 2025-04-30 07:57:25 │ r/ovh223 -> l/35Mlsz88K9xtbVFdZrjprO -> r/ovh76 │
│ T88nJds5e │ cma3li8za0maf4tj4ou1f2rh1 │ socks5  │ 6TJfsrSiFqEp6HNdfovgEw │ 2025-04-30 07:09:23 │ r/dc67 -> l/7YrrmAJOqw2wKkENZWhEAX -> r/ovh76   │
│ Vcb4YosEF │ cma3li8za0maf4tj4ou1f2rh1 │ socks5  │ 6TJfsrSiFqEp6HNdfovgEw │ 2025-04-30 07:57:25 │ r/ovh223 -> l/35Mlsz88K9xtbVFdZrjprO -> r/ovh76 │
│ XRGs1ds5F │ cma3luby80mq64tj45qxjpxh2 │ element │ 2fnPDU1qqTPhryBlrczwg1 │ 2025-04-30 07:59:06 │ r/ovh89 -> l/7Dqz00IgdTMbs2xwqLFE3j -> r/ovh223 │
│ anI9Yd.5e │ cma3li8za0maf4tj4ou1f2rh1 │ socks5  │ 6TJfsrSiFqEp6HNdfovgEw │ 2025-04-30 07:58:27 │ r/ovh89 -> l/1lWmp5wAPyslLgQgq3Msda -> r/ovh76  │
│ iNcA9os5e │ cma3li8za0maf4tj4ou1f2rh1 │ socks5  │ 6TJfsrSiFqEp6HNdfovgEw │ 2025-04-30 07:09:25 │ r/ovh223 -> l/35Mlsz88K9xtbVFdZrjprO -> r/ovh76 │
│ ls8sYosEe │ cma3luby80mq64tj45qxjpxh2 │ element │ 2fnPDU1qqTPhryBlrczwg1 │ 2025-04-30 07:59:06 │ r/ovh89 -> l/7Dqz00IgdTMbs2xwqLFE3j -> r/ovh223 │
│ n.rY1d.5e │ cma3li8za0maf4tj4ou1f2rh1 │ socks5  │ 6TJfsrSiFqEp6HNdfovgEw │ 2025-04-30 07:59:40 │ r/ovh76                                         │
╰───────────┴───────────────────────────┴─────────┴────────────────────────┴─────────────────────┴─────────────────────────────────────────────────╯

ziti fabric list terminators:

╭────────────────────────┬─────────┬────────┬─────────┬────────────────────────┬──────────┬──────┬────────────┬──────────────┬────────────╮
│ ID                     │ SERVICE │ ROUTER │ BINDING │ ADDRESS                │ INSTANCE │ COST │ PRECEDENCE │ DYNAMIC COST │ HOST ID    │
├────────────────────────┼─────────┼────────┼─────────┼────────────────────────┼──────────┼──────┼────────────┼──────────────┼────────────┤
│ 2fnPDU1qqTPhryBlrczwg1 │ element │ ovh223 │ edge    │ 2fnPDU1qqTPhryBlrczwg1 │          │    0 │ default    │            2 │ oOesPw1m9F │
│ 6TJfsrSiFqEp6HNdfovgEw │ socks5  │ ovh76  │ edge    │ 6TJfsrSiFqEp6HNdfovgEw │          │    0 │ default    │           32 │ oOesPw1m9F │
╰────────────────────────┴─────────┴────────┴─────────┴────────────────────────┴──────────┴──────┴────────────┴──────────────┴────────────╯

ziti edge list sessions:

╭───────────────────────────┬───────────────────────────┬──────────────┬──────╮
│ ID                        │ API SESSION ID            │ SERVICE NAME │ TYPE │
├───────────────────────────┼───────────────────────────┼──────────────┼──────┤
│ cma34zua000r84tj4qe9szovf │ cma34zu7c00r64tj422v2v0g7 │ socks5       │ Bind │
│ cma34zyla00rw4tj4gjnvugp7 │ cma34zyib00rt4tj4u7eqzlf2 │ element      │ Bind │
│ cma3li8za0maf4tj4ou1f2rh1 │ cma3li8lv0mad4tj4i2iq4rtu │ socks5       │ Dial │
│ cma3luby80mq64tj45qxjpxh2 │ cma3lubte0mq44tj42pjci591 │ element      │ Dial │
╰───────────────────────────┴───────────────────────────┴──────────────┴──────╯

It is unclear why cma3li8za0maf4tj4ou1f2rh1 sends the flow to the remote router dc67 and then the flow goes back to ovh76:
r/dc67 -> l/7YrrmAJOqw2wKkENZWhEAX -> r/ovh76 (circuit T88nJds5e). There are no service terminators on dc67.

It seems that the cost is calculated for each session. But different circuits have different costs, don't they?
Is it possible to access the per-circuit cost somehow?

As I mentioned above, a similar strange situation happens with service terminators: the ziti network allocates a terminator on the remote router dc67.

╭────────────────────────┬─────────┬────────┬─────────┬────────────────────────┬──────────┬──────┬────────────┬──────────────┬────────────╮
│ ID                     │ SERVICE │ ROUTER │ BINDING │ ADDRESS                │ INSTANCE │ COST │ PRECEDENCE │ DYNAMIC COST │ HOST ID    │
├────────────────────────┼─────────┼────────┼─────────┼────────────────────────┼──────────┼──────┼────────────┼──────────────┼────────────┤
│ 5jWewbMLAy3IyC1T2HPOUB │ element │ dc67   │ edge    │ 5jWewbMLAy3IyC1T2HPOUB │          │    0 │ default    │            2 │ qNTXfYbyhW │
│ 6Uz4d4tkjy6PAlbkPIOeTY │ socks5  │ ovh29  │ edge    │ 6Uz4d4tkjy6PAlbkPIOeTY │          │    0 │ default    │           26 │ qNTXfYbyhW │
╰────────────────────────┴─────────┴────────┴─────────┴────────────────────────┴──────────┴──────┴────────────┴──────────────┴────────────╯

Here is another example.
1. Client (OH7ESFp5e) -> socks5 (Y-UUoWCyIW) via the remote router dc67 (WIZxdWbqhW)
   ping OH7ESFp5e -> router WIZxdWbqhW time=28ms (cost 262191)

{"namespace":"circuit","event_src_id":"dc","timestamp":"2025-04-30T16:04:53.839962811Z","version":2,"event_type":"created","circuit_id":"PzL9UKbwT","client_id":"cma44lp6j002qtsj4zbgw7pfe","service_id":"297xbk7xZecDRoHI3vQ4GC","terminator_id":"HGvzP3xeqHAO2vccfgWvx","instance_id":"","creation_timespan":17532908,"path":{"nodes":["WIZxdWbqhW","Y-UUoWCyIW"],"links":["3kzPOHn1SeJnEZlbCVtKJN"],"ingress_id":"WykX","egress_id":"mYvX"},"link_count":1,"path_cost":262191,"tags":{"clientId":"OH7ESFp5e","hostId":"oOesPw1m9F","serviceId":"297xbk7xZecDRoHI3vQ4GC"}}
2. Client (OH7ESFp5e) -> socks5 (Y-UUoWCyIW) via the local router ovh89 (T8.KBWbyhW)
   ping OH7ESFp5e -> router T8.KBWbyhW time=16ms (cost 262171)
{"namespace":"circuit","event_src_id":"dc","timestamp":"2025-04-30T16:07:47.099181796Z","version":2,"event_type":"created","circuit_id":"wfYhOdW5T","client_id":"cma44noke005otsj4hyhcqdpz","service_id":"297xbk7xZecDRoHI3vQ4GC","terminator_id":"HGvzP3xeqHAO2vccfgWvx","instance_id":"","creation_timespan":18808714,"path":{"nodes":["T8.KBWbyhW","Y-UUoWCyIW"],"links":["1vxV70bTB3De9VoNa3U0WS"],"ingress_id":"WQqo","egress_id":"BkvJ"},"link_count":1,"path_cost":262171,"tags":{"clientId":"OH7ESFp5e","hostId":"oOesPw1m9F","serviceId":"297xbk7xZecDRoHI3vQ4GC"}}

It seems that a circuit is not per session. Is that true? For the client (OH7ESFp5e) these two circuits are very different: the former has a ping time of 28ms, the latter 16ms. But for ziti these circuits are very similar: path_cost 262171 vs 262191.

Hi @Rantanplan

Some hopefully helpful clarifications:

  1. Circuits are per-connection, so there can be multiple circuits per session. API sessions allow an identity access to controllers and edge routers. Sessions allow access to a specific service. Each connection to that service will have a separate circuit.
  2. You can see the routing cost in the circuit events. See path_cost in Events | OpenZiti.
  3. Paths go from initiator (client side) to terminator (hosting side). So where you see dc67 in the circuit list, that's on the initiating side.
  4. The SDK picks the lowest latency ER to initiate a circuit on. The circuit is then established from that router to a terminator for the requested service.
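
As a rough illustration of points 1 and 2, the sketch below reads circuit events (one JSON object per line) and lists the path and path_cost of each circuit, grouped by client_id, which in the tables in this thread matches the session ID. The function name summarize_circuit_events is hypothetical; the field names are the ones visible in the events quoted in this thread, and the sample lines are trimmed copies of those events.

```python
import json
from collections import defaultdict

# Trimmed copies of two circuit events quoted in this thread
# (only the fields used below are kept).
SAMPLE_EVENTS = """\
{"namespace":"circuit","event_type":"created","circuit_id":"PzL9UKbwT","client_id":"cma44lp6j002qtsj4zbgw7pfe","path":{"nodes":["WIZxdWbqhW","Y-UUoWCyIW"]},"link_count":1,"path_cost":262191}
{"namespace":"circuit","event_type":"created","circuit_id":"wfYhOdW5T","client_id":"cma44noke005otsj4hyhcqdpz","path":{"nodes":["T8.KBWbyhW","Y-UUoWCyIW"]},"link_count":1,"path_cost":262171}
"""


def summarize_circuit_events(lines):
    """Group 'created' circuit events by client_id, keeping circuit ID, path and cost."""
    by_session = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        # Ignore anything that is not a circuit creation event.
        if event.get("namespace") != "circuit" or event.get("event_type") != "created":
            continue
        path = " -> ".join(event["path"]["nodes"])
        by_session[event["client_id"]].append(
            (event["circuit_id"], path, event["path_cost"])
        )
    return dict(by_session)


summary = summarize_circuit_events(SAMPLE_EVENTS.splitlines())
for session_id, circuits in summary.items():
    for circuit_id, path, cost in circuits:
        print(f"session={session_id} circuit={circuit_id} path={path} path_cost={cost}")
```

To run this against a real log, pass an open file object instead of the sample lines, for example the fabric-circuit.json file mentioned elsewhere in this thread.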

So the SDK is picking dc67 as the best ER. Unfortunately because the costing happens in two steps, first picking the closest ER and then picking the best path from there to the service, we can end up with paths that are sub-optimal overall.
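
A toy sketch of why the two-step approach can produce a sub-optimal path overall. Everything here is an assumption for illustration: the router names and latencies are made up, and plain latency stands in for cost, while the real cost calculation may use other inputs. It only contrasts "pick the closest ER first" with "pick the best end-to-end total".

```python
# Hypothetical latencies in milliseconds; not measurements from this network.
client_to_router = {"routerA": 10, "routerB": 12}
router_to_terminator = {"routerA": 30, "routerB": 1}


def two_step_choice(first_hop, second_hop):
    """Pick the ingress router by first-hop latency only, then add the rest of the path."""
    ingress = min(first_hop, key=first_hop.get)
    return ingress, first_hop[ingress] + second_hop[ingress]


def holistic_choice(first_hop, second_hop):
    """Pick the ingress router that minimizes the end-to-end total."""
    ingress = min(first_hop, key=lambda r: first_hop[r] + second_hop[r])
    return ingress, first_hop[ingress] + second_hop[ingress]


print("two-step:", two_step_choice(client_to_router, router_to_terminator))
print("holistic:", holistic_choice(client_to_router, router_to_terminator))
```

With these made-up numbers the two-step choice starts on routerA because it is slightly closer to the client, even though the total through routerB is much lower.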

We are working towards holistic path selection, but that is the current state.

Note also that smart routing can't re-route the initiating or terminating nodes, since those have stable connections to the endpoints. We're also working on that: we are currently stabilizing the work that allows flow control in the SDK, which is a precursor to allowing the SDK to move circuits from router to router.

Hope that's helpful,
Paul

You have replied to my previous post:

╭────────────────────────┬─────────┬────────┬─────────┬────────────────────────┬──────────┬──────┬────────────┬──────────────┬────────────╮
│ ID                     │ SERVICE │ ROUTER │ BINDING │ ADDRESS                │ INSTANCE │ COST │ PRECEDENCE │ DYNAMIC COST │ HOST ID    │
├────────────────────────┼─────────┼────────┼─────────┼────────────────────────┼──────────┼──────┼────────────┼──────────────┼────────────┤
│ 2mdoEbtVprBVKYJQKlOtYR │ element │ dc67   │ edge    │ 2mdoEbtVprBVKYJQKlOtYR │          │    0 │ default    │            4 │ oOesPw1m9F │
│ 6TJfsrSiFqEp6HNdfovgEw │ socks5  │ ovh76  │ edge    │ 6TJfsrSiFqEp6HNdfovgEw │          │    0 │ default    │           38 │ oOesPw1m9F │
╰────────────────────────┴─────────┴────────┴─────────┴────────────────────────┴──────────┴──────┴────────────┴──────────────┴────────────╯

Initially I understood it the way you explained, but in practice I observe strange behavior.
Both services run on the same machine, ovh29.
ping ovh29 -> ovhXX time=0.08ms
ping ovh29 -> dc67 time=17ms

If the SDK picks the wrong router, dc67, then all routing to the element service becomes wrong as well, because ziti routes everything to dc67 while the service runs on ovh29.

Below is another example of a wrong allocation of the service terminator: element is bound to dc67. As a reminder, the services socks5 and element are on the same host, ovh29.

ziti fabric list terminators
╭────────────────────────┬─────────┬────────┬─────────┬────────────────────────┬──────────┬──────┬────────────┬──────────────┬────────────╮
│ ID                     │ SERVICE │ ROUTER │ BINDING │ ADDRESS                │ INSTANCE │ COST │ PRECEDENCE │ DYNAMIC COST │ HOST ID    │
├────────────────────────┼─────────┼────────┼─────────┼────────────────────────┼──────────┼──────┼────────────┼──────────────┼────────────┤
│ 49r7aG1NIvJZxAC1TksdcB │ socks5  │ ovh29  │ edge    │ 49r7aG1NIvJZxAC1TksdcB │          │    0 │ default    │           34 │ oOesPw1m9F │
│ 6IFJyFAI1Bo8RoCwV5QrL1 │ element │ dc67   │ edge    │ 6IFJyFAI1Bo8RoCwV5QrL1 │          │    0 │ default    │            2 │ oOesPw1m9F │
╰────────────────────────┴─────────┴────────┴─────────┴────────────────────────┴──────────┴──────┴────────────┴──────────────┴────────────╯

So this routing table is wrong: the flow goes from ovh76 to dc67 (16ms) + tsl:dc67 -> ovh29 (16ms), instead of taking the route ovh76->ovhXX (0.08ms). This is a big mistake.

ziti fabric list circuits
╭───────────┬───────────────────────────┬─────────┬────────────────────────┬─────────────────────┬────────────────────────────────────────────────╮
│ ID        │ CLIENT                    │ SERVICE │ TERMINATOR             │ CREATEDAT           │ PATH                                           │
├───────────┼───────────────────────────┼─────────┼────────────────────────┼─────────────────────┼────────────────────────────────────────────────┤
│ 8qWj.MUKn │ cma4et4nz00tge5j4r3hv2yzx │ socks5  │ 49r7aG1NIvJZxAC1TksdcB │ 2025-04-30 21:27:54 │ r/ovh3 -> l/3oE6TrCSI1mkYAUmbXxb8m -> r/ovh29  │
│ EFoHozQKW │ cma4et4nz00tge5j4r3hv2yzx │ socks5  │ 49r7aG1NIvJZxAC1TksdcB │ 2025-04-30 20:51:50 │ r/ovh76 -> l/5qU3vy8Jt6pot3OMo54zw9 -> r/ovh29 │
│ EH7dWSUKW │ cma4esray00sve5j472ippq8u │ element │ 6IFJyFAI1Bo8RoCwV5QrL1 │ 2025-04-30 21:00:38 │ r/ovh76 -> l/3VbM5wCg8iOECiy3uiwky5 -> r/dc67  │
│ ESWksSUmn │ cma4et4nz00tge5j4r3hv2yzx │ socks5  │ 49r7aG1NIvJZxAC1TksdcB │ 2025-04-30 21:28:59 │ r/ovh3 -> l/3oE6TrCSI1mkYAUmbXxb8m -> r/ovh29  │
│ EWtLsMUmn │ cma4et4nz00tge5j4r3hv2yzx │ socks5  │ 49r7aG1NIvJZxAC1TksdcB │ 2025-04-30 21:28:49 │ r/ovh3 -> l/3oE6TrCSI1mkYAUmbXxb8m -> r/ovh29  │
│ I41L.SUKn │ cma4et4nz00tge5j4r3hv2yzx │ socks5  │ 49r7aG1NIvJZxAC1TksdcB │ 2025-04-30 21:28:49 │ r/ovh3 -> l/3oE6TrCSI1mkYAUmbXxb8m -> r/ovh29  │
│ PHxA.SUKW │ cma4et4nz00tge5j4r3hv2yzx │ socks5  │ 49r7aG1NIvJZxAC1TksdcB │ 2025-04-30 21:28:45 │ r/ovh3 -> l/3oE6TrCSI1mkYAUmbXxb8m -> r/ovh29  │
│ QYop.SQmn │ cma4et4nz00tge5j4r3hv2yzx │ socks5  │ 49r7aG1NIvJZxAC1TksdcB │ 2025-04-30 21:27:53 │ r/ovh3 -> l/3oE6TrCSI1mkYAUmbXxb8m -> r/ovh29  │
│ Rni4.SQmW │ cma4et4nz00tge5j4r3hv2yzx │ socks5  │ 49r7aG1NIvJZxAC1TksdcB │ 2025-04-30 21:29:36 │ r/ovh3 -> l/3oE6TrCSI1mkYAUmbXxb8m -> r/ovh29  │
│ VGWj.MUmn │ cma4et4nz00tge5j4r3hv2yzx │ socks5  │ 49r7aG1NIvJZxAC1TksdcB │ 2025-04-30 21:27:54 │ r/ovh3 -> l/3oE6TrCSI1mkYAUmbXxb8m -> r/ovh29  │
╰───────────┴───────────────────────────┴─────────┴────────────────────────┴─────────────────────┴────────────────────────────────────────────────╯

I don't understand why the route paths ovh76->dc67 (ping time=17ms) and ovh76->ovh29 (ping time=0.08ms) have such similar path_cost values: 262186 and 262175.

grep EH7dWSUKW /var/log/ziti-controller/fabric-circuit.json
{"namespace":"circuit","event_src_id":"dc","timestamp":"2025-04-30T21:00:38.561344893Z","version":2,"event_type":"created","circuit_id":"EH7dWSUKW","client_id":"cma4esray00sve5j472ippq8u","service_id":"422e3i40ThOGC0YVX5HaMY","terminator_id":"6IFJyFAI1Bo8RoCwV5QrL1","instance_id":"","creation_timespan":19431645,"path":{"nodes":["JI-BBYbyIW","WIZxdWbqhW"],"links":["3VbM5wCg8iOECiy3uiwky5"],"ingress_id":"yXGQ","egress_id":"d5gJ"},"link_count":1,"path_cost":262186,"tags":{"clientId":"OH7ESFp5e","hostId":"oOesPw1m9F","serviceId":"422e3i40ThOGC0YVX5HaMY"}}

grep EFoHozQKW /var/log/ziti-controller/fabric-circuit.json
{"namespace":"circuit","event_src_id":"dc","timestamp":"2025-04-30T20:51:50.787927352Z","version":2,"event_type":"created","circuit_id":"EFoHozQKW","client_id":"cma4et4nz00tge5j4r3hv2yzx","service_id":"297xbk7xZecDRoHI3vQ4GC","terminator_id":"49r7aG1NIvJZxAC1TksdcB","instance_id":"","creation_timespan":22623781,"path":{"nodes":["JI-BBYbyIW","7H8BaaKm9F"],"links":["5qU3vy8Jt6pot3OMo54zw9"],"ingress_id":"y2or","egress_id":"dKMy"},"link_count":1,"path_cost":262175,"tags":{"clientId":"OH7ESFp5e","hostId":"oOesPw1m9F","serviceId":"297xbk7xZecDRoHI3vQ4GC"}}
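
To put a number on how close these two values are, here is a small sketch that computes the absolute and relative difference between the two path_cost values from the events above. The variable names are only illustrative; the numbers are copied from the two log lines.

```python
# path_cost values copied from the two circuit events above.
cost_via_dc67 = 262186   # circuit EH7dWSUKW, ovh76 -> dc67
cost_via_ovh29 = 262175  # circuit EFoHozQKW, ovh76 -> ovh29

absolute_difference = cost_via_dc67 - cost_via_ovh29
relative_difference = absolute_difference / cost_via_ovh29

print(f"absolute difference: {absolute_difference}")
print(f"relative difference: {relative_difference:.5%}")
```

The difference is a tiny fraction of the total, while the ping times for the two paths differ by a factor of about 200 (17ms vs 0.08ms).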

In these two posts I describe two scenarios:

  1. allocation of service terminators
  2. client routing path

It turns out that both have the same issue.

The current post is mostly about the connection from the access side to a service (socks5). We have ping time=28ms to dc and ping time=16ms to ovhXX. You say that circuits are per connection, which is very good. But why do the two circuits above have very similar path_cost values? Is this an error? The latter connection is nearly 2 times faster than the former (16ms vs 28ms). So the client connects to the remote router dc67, and dc67 sends the flow to ovh89. This is the worst scenario, which we would like to avoid.
"event_src_id":"dc" is simply the controller's name which is hosted on the same machine as the router dc67.

As I see it, the access side connects to all available routers, so the only problem there is how to create fast circuits for the initiating side (latency plus a measure of throughput). But there is a problem with the service terminators: if the service picks the wrong router, dc67, then all circuits become wrong. I need to restart the bad router to break this binding.

Exactly. This is how I found path_cost, whose values are misleading to me. So I concluded that circuits are connection independent.