Ziti Network Client - Online Status

janst · April 7, 2025, 9:10am

Hi there

I've been testing the online status (hasEdgeRouterConnection) of my ziti-edge-tunnel network clients. During testing I noticed that when my devices are abruptly disconnected from the network or lose power, it takes about 30 minutes for their online status to update in the ZAC. How does Ziti determine whether devices (Ziti-edge-tunnelers) are online or offline?

Is there a timeout setting that can be configured on the controller to adjust this?

-> I have found the sessionTimeout setting, but this seems to be related to the API session of the respective client (ziti-edge-tunnel) to the ziti controller. Does this timeout also affect the hasEdgeRouterConnection status ?

BR
Jan

TheLumberjack · April 10, 2025, 11:33am

I thought for sure I had replied here but I don't see my response... 30 minutes is the default session timeout. So the "has api session" bubble showing up for 30 minutes after an abrupt outage makes sense to me. You should lose "edge router connected" at that point, though.

The "online-ness" of identities has been coming up a lot lately and I expect we'll be making changes to it in the near future. Until then, I would expect you to see "has api session" for as long as the configured session timeout after a power outage. In your controller config, find edge.api.sessionTimeout and change it from 30m to something else if you want. I don't really think you should, just because clustered controllers might change it too.

If I were to guess, it sounds to me like we are hearing from the community that more precise information regarding the overall health of the device is what people are actually desiring. It would probably be helpful overall if you could let us know what the exact use case is you're looking to solve as it might inform our responses here and future direction regarding these indicators.

Cheers

janst · April 10, 2025, 11:46am

Thanks for responding

Ok, so if I get that right, I can only influence the hasApiSession flag by setting the edge.api.sessionTimeout, but not the hasEdgeRouterConnection one ?

So in case of a power outage of the network client (e.g. ziti-edge-tunnel or sdk-embedded app) the hasApiSession value won't change until the timeout of (by default) 30min has passed.

How should the hasEdgeRouterConnection flag be interpreted then? As I understand, or at least I assume, the routers maintain a connection to the clients using some sort of heartbeat, so they should recognize rather quickly that a client has gone offline. But it seems that this is not what the hasEdgeRouterConnection property reflects.

TheLumberjack · April 10, 2025, 11:59am

I would flip this, but admittedly, I don't know for sure the mechanism here as I didn't work on it. I would think the clients maintain connections to routers and are responsible for reconnecting. If the client isn't noticed by the routers it is connected to, I would expect that after some small amount of time it would be reported to the controller as not having a connection to that router any more. Once all routers the client was connected to report to the controller that it's no longer connected, I'd expect the controller to mark it as having no router connections.

I would expect a process suddenly terminating to reflect in the ZAC as having an api session for 30 minutes, and not having a router connection after "a few moments" (maybe like a minute?)

You say that doesn't seem to be what you're seeing, but that's what happened for me in my test. Maybe you're doing something I'm not?

I used the ziti cli to start a 'server' that binds a service:

ziti ops verify traffic --prefix connection_test --mode server

I then used kill to kill this:

ps -ef | grep "verify traffic"
kill 153647

ZAC still shows two green bubbles:

About 15-30 seconds later, one green bubble:

then I cleaned up using:

ziti ops verify traffic --prefix connection_test --cleanup

janst · April 10, 2025, 12:05pm

let me quickly test that the same way you did
which versions are you using ?

TheLumberjack · April 10, 2025, 12:07pm

Well, this one is a "recent build from source" so I don't know the exact version, but I'm usually running "pretty new/recent" deployments. I'd guess this is 1.4+ for sure, it might be 1.5+... "pretty new"... However, I doubt this functionality has changed much recently (I reserve the right to be wrong).

plorenz · April 10, 2025, 1:46pm

The 1.2 release was focused on changing how online status was calculated. Instead of API session heartbeats, it's now focused on connect/sdk events. There's a controller config flag which indicates if the system should use api session heartbeats, connect events or both to manage the online status.

The 1.2 change has the full write-up.

Paul

janst · April 10, 2025, 1:50pm

Oooook so, I was now testing with controller/router V1.4.3 and ziti-edge-tunnel V1.5.4

I have now replicated your test and got the same result: the hasEdgeRouterConnection bubble went grey ~15 seconds after killing the Ziti process within my WSL. Additionally, I tested the same on a different machine, basically a lower-power device also running Linux, and got a similar result. In this case, it took ~40 seconds until the hasEdgeRouterConnection bubble went grey after killing the Ziti process on this machine.

But when I did the same test by cutting off power to the device instead of killing the process, it took ~10 minutes for the hasEdgeRouterConnection bubble to turn grey, meaning the routers needed that long to recognize that the machine was really offline.

I noticed the same behavior when testing within AWS by running an EC2 instance, installing the Ziti-edge-tunneler (V1.5.4), and then blocking all traffic using NACLs. Here, it also took roughly 10 minutes until hasEdgeRouterConnection changed from online/green to offline/grey.

I think what is happening here (and I am not a networking expert) is that when we kill the process under Linux, the kernel closes the socket that was owned by the now-killed process and sends a TCP FIN (?) to the remote, which is a Ziti router. But when the power is cut off, the network cable is unplugged, or the network just suddenly stops working, the kernel obviously cannot send that TCP connection termination, and thus the Ziti routers do not recognize (at least not within a short amount of time) that a client has gone offline.
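A minimal sketch of this hypothesis, using plain Python sockets on localhost rather than anything Ziti-specific. When the peer closes its socket (which is what the kernel does for a killed process), the reader sees end-of-stream right away. When the peer simply goes silent (standing in for a power cut or a NACL block), the reader learns nothing until its own timeout fires. The function names and the 0.5 s timeout are illustrative assumptions and say nothing about how the routers are actually implemented.

```python
import socket


def observe_peer(peer_action, timeout=0.5):
    """Connect a client to a local listener and report what the reader sees.

    peer_action(sock) runs on the client socket after the connection is up.
    Returns "closed" if the reader got end-of-stream (a FIN arrived),
    "timeout" if nothing arrived before the read timeout, else "data".
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(("127.0.0.1", port))
    conn, _ = server.accept()
    conn.settimeout(timeout)
    try:
        peer_action(client)
        try:
            data = conn.recv(1024)
        except socket.timeout:
            return "timeout"
        return "closed" if data == b"" else "data"
    finally:
        conn.close()
        client.close()
        server.close()


def clean_close(sock):
    # Like a killed process: the socket is closed, so a FIN is sent.
    sock.close()


def go_silent(sock):
    # Like a power cut: the peer sends nothing at all.
    pass


if __name__ == "__main__":
    print("clean close ->", observe_peer(clean_close))
    print("silent peer ->", observe_peer(go_silent))
```

In the silent case the only thing that ends the wait is the reader's own timer, which would be consistent with the much longer delay seen after a power cut.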

janst · April 10, 2025, 1:55pm

I saw that in the release notes, but I could not find where to set the setting described. The controller config section of the ziti docs does not mention the identityStatusConfig setting.

Also, I am running the whole setup in K8s and the Helm chart does not allow specifying that setting. But if you tell me where I can set it, I can test it by manually changing the respective ConfigMap in K8s.

qrkourier · April 10, 2025, 3:40pm

Based on this example controller configuration YAML (ziti/zititest/models/sdk-status-test/configs/ctrl.yml.tmpl at v1.5.4 · openziti/ziti · GitHub), I determined that the directive must be defined in edge.identityStatusConfig.

You may append arbitrary, additional configuration to the edge section of the Kubernetes controller's ConfigMap by defining the Helm chart's input value .Values.additionalConfigs.edge (a dict).

additionalConfigs:
  edge:
    identityStatusConfig:
      unknownTimeout: 1m
      scanInterval: 30s

janst · April 10, 2025, 4:40pm

thanks for the hint!

Just tested adding these options and repeated my test within AWS (EC2 instances with ziti-edge-tunneler, blocking all traffic using a NACL) and experienced the exact same behavior as before. It still took ~10 min until the hasEdgeRouterConnection property/bubble was false/grey, even with these options set:

    edge:
      identityStatusConfig:
        unknownTimeout: 1m
        scanInterval: 30s

qrkourier · April 10, 2025, 7:51pm

Can you confirm the Helm release was upgraded with the new values, just to ensure the template is placing the controller configuration directives in the expected location?

You reported your controller version is 1.4.3, and the new config directive was introduced in 1.2.0, so we just need to verify the controller is configured correctly.

kubectl get configmap ziti-controller-config --namespace=ziti --output=go-template='{{index .data "ziti-controller.yaml" }}'

or parse the YAML directly with yq

kubectl get configmap ziti-controller-config --namespace=ziti --output=go-template='{{index .data "ziti-controller.yaml" }}' \
| yq '.edge.identityStatusConfig'

janst · April 11, 2025, 6:35am

So, yes, I am using Helm chart version 1.2.2. Here's the controller deployment's annotation:
helm.sh/chart: ziti-controller-1.2.2

And the validation of the directive placement:

k get configmap/openziti-controller-config --output=go-template='{{index .data "ziti-controller.yaml" }}' | yq e '.edge.identityStatusConfig' -

unknownTimeout: 1m
scanInterval: 30s

qrkourier · April 11, 2025, 2:47pm

paging @plorenz back in because we proved the controller is now configured with the new directives in edge.identityStatusConfig, but the symptoms persist, right @janst? I didn't fully comprehend the problem, but wanted to remove any friction associated with the Helm chart, at least.

plorenz · April 11, 2025, 3:56pm

@janst if your hypothesis is correct, which I'm guessing it is, then the solution is to add configuration to the router to disconnect SDKs which have been idle for longer than some threshold. I added an issue to track this: Add SDK idle threshold configuration to router · Issue #2994 · openziti/ziti · GitHub
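As a rough illustration of what such an idle threshold could look like: record the last time each connection showed any activity and drop the ones that have been quiet for longer than the threshold. This is not the router's implementation, and the class and method names are hypothetical. It also assumes the client sends some traffic or heartbeat regularly enough that a healthy connection never looks idle.

```python
import time


class IdleReaper:
    """Tracks last activity per connection id (hypothetical, for illustration)."""

    def __init__(self, idle_threshold_s, clock=time.monotonic):
        self.idle_threshold_s = idle_threshold_s
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_seen = {}

    def touch(self, conn_id):
        """Record activity (data or heartbeat) on a connection."""
        self.last_seen[conn_id] = self.clock()

    def reap(self):
        """Forget and return the connections idle for longer than the threshold."""
        now = self.clock()
        expired = [
            conn_id
            for conn_id, seen in self.last_seen.items()
            if now - seen > self.idle_threshold_s
        ]
        for conn_id in expired:
            del self.last_seen[conn_id]
        return expired
```

A periodic task would call reap() and close whatever it returns, which bounds the time a silently dead client can stay marked as connected to roughly the threshold plus the scan period.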

janst · April 14, 2025, 8:21am

Yes, the symptoms persist even after setting the new directives you mentioned.
