livenessProbe for ziti-edge-tunnel et al

I had a DNS issue on one k8s node and found that ziti-edge-tunnel, running as a DaemonSet (ziti-edge-tunnel), was logging errors. That was expected, because the ztAPI address could not be resolved at all.
However, the pod never registered the failure. It would be far more noticeable for me and everyone else if we had a livenessProbe in place.

Today I was looking for some kind of tunnel status report from ziti-edge-tunnel and found the following:

  1. ziti-edge-tunnel tunnel_status is not working locally. I guess it tries to open some configuration file whose path may be hardcoded somewhere. See:
# ps -ef | grep ziti
root           8       1  0 Feb19 ?        00:04:41 ziti-edge-tunnel run --identity /ziti-edge-tunnel/ziti-edge-tunnel-identity.json
root         137     117  0 14:00 pts/0    00:00:00 grep --color=auto ziti
# ziti-edge-tunnel tunnel_status -h
ziti-edge-tunnel tunnel_status: Get Tunnel Status
usage: ziti-edge-tunnel tunnel_status
# ziti-edge-tunnel tunnel_status
failed to connect: -2/no such file or directory
  2. There is no local web server where the tunneler could expose a /health endpoint. IMHO, this would be the more cloud-native approach, and the one to follow.

  3. In general, all the helm charts are missing readiness/liveness probes, which would improve the way ziti runs on k8s.

Looking forward to your input on how to solve/implement this. Thank you!

Mario

Hello,

The tunnel_status subcommand opens a domain socket in /tmp/.ziti/. I could imagine exposing this subcommand and perhaps other functionality through an http interface, but we obviously don't have that today.

I'm not very handy with Kubernetes, but if I'm understanding the liveness documentation correctly, it seems like you could use ziti-edge-tunnel tunnel_status (perhaps piped to jq) as a liveness command in your zet container. The liveness command would run in the zet container and inform the kubelet of the health status.
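
A minimal sketch of such a liveness command, written as a shell function. It avoids jq and only looks at the exit status and the "failed to connect" message shown earlier in this thread. The function name check_tunnel_status and the ZET_STATUS_CMD override are hypothetical (they are not part of ziti-edge-tunnel), and it is an assumption that the subcommand exits non-zero when the tunneler is unreachable.

```shell
#!/bin/sh
# Sketch of a liveness check built on `ziti-edge-tunnel tunnel_status`.
# ZET_STATUS_CMD is a hypothetical override (not a ziti setting) so the
# function can be exercised without a running tunneler.
check_tunnel_status() {
    status_cmd="${ZET_STATUS_CMD:-ziti-edge-tunnel tunnel_status}"

    # Assumption: a non-zero exit status means the tunneler is unreachable.
    output="$($status_cmd 2>&1)" || return 1

    # The thread shows this message when the domain socket is missing,
    # so treat it as a failure even if the exit status was zero.
    case "$output" in
        *"failed to connect"*) return 1 ;;
    esac

    # Assumption: an empty reply is also unhealthy.
    [ -n "$output" ]
}
```

A probe script could define this function and end with a call to check_tunnel_status, so that the script's exit status becomes the probe result.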

I'd think you could alternatively use dig as a liveness command to request one of your ziti hostnames from zet's DNS server IP.
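
The dig idea could look roughly like the sketch below. It assumes dig is installed in the container, and both arguments (the DNS server IP of zet and one of your ziti hostnames) are deployment-specific values you would have to supply. The function name check_ziti_dns and the DIG_CMD override are hypothetical and exist only for illustration and testing.

```shell
#!/bin/sh
# Sketch: resolve one ziti hostname via the tunneler's DNS server.
# Usage: check_ziti_dns <dns-server-ip> <hostname>
# DIG_CMD is a hypothetical override used only for testing.
check_ziti_dns() {
    dns_ip="$1"
    hostname="$2"
    dig_cmd="${DIG_CMD:-dig}"

    # +short prints only the answer records; +time and +tries keep the
    # probe from waiting longer than a couple of seconds.
    answer="$($dig_cmd +short +time=2 +tries=1 "@$dns_ip" "$hostname" 2>/dev/null)" || return 1

    # No answer means the tunneler's DNS did not resolve the name.
    [ -n "$answer" ]
}
```

Because the hostname differs per deployment, a check like this fits better as a per-installation override than as a chart default.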

I like both of those ideas! Toward that end, here's an example of parsing the output with sed and jq: "tunnel_status output is not valid JSON" (Issue #494, openziti/ziti-tunnel-sdk-c on GitHub).

Hi @scareything,

Thank you for your input. The solution with a local Webserver would work with the current containers as they are right now.

As I said, ziti-edge-tunnel tunnel_status is not working locally with the current image (openziti/ziti-edge-tunnel). As for dig, it is a fair point, but that tool is not present in the image either. OTOH, this approach would be very deployment-specific, as not all tunnels get the same configuration.

Ahh, sorry I read your post too quickly. As a security precaution, ziti-edge-tunnel only creates the domain sockets if it can set the group of the socket directory (/tmp/.ziti) to the ziti group. So you're probably seeing these warnings when ziti-edge-tunnel starts:

WARN ziti-edge-tunnel:ziti-edge-tunnel.c:1608 make_socket_path() local 'ziti' group not found.
WARN ziti-edge-tunnel:ziti-edge-tunnel.c:1609 make_socket_path() please create the 'ziti' group by running these commands:
WARN ziti-edge-tunnel:ziti-edge-tunnel.c:1611 make_socket_path() sudo groupadd --system ziti
WARN ziti-edge-tunnel:ziti-edge-tunnel.c:1612 make_socket_path() users can then be added to the 'ziti' group with:
WARN ziti-edge-tunnel:ziti-edge-tunnel.c:1613 make_socket_path() sudo usermod --append --groups ziti <USER>
WARN ziti-edge-tunnel:ziti-edge-tunnel.c:1712 run_tunneler_loop() One or more socket servers did not properly start.

The ziti group does not currently exist in the ziti-edge-tunnel image, but maybe it should? I'm not sure if adding it in the Dockerfile is the right way, or if there's some way to expose host groups to the container. If memory serves, this is something that @qrkourier would have better opinions on.
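
To confirm this diagnosis from inside the container, a small preflight check along these lines might help. It is only a sketch: check_socket_prereqs, GROUP_FILE and SOCKET_DIR are hypothetical names, and it reads the local group file directly, so groups provided by other NSS sources would be missed.

```shell
#!/bin/sh
# Sketch: report why tunnel_status may fail to connect.
# GROUP_FILE and SOCKET_DIR are hypothetical overrides for testing.
check_socket_prereqs() {
    group_file="${GROUP_FILE:-/etc/group}"
    socket_dir="${SOCKET_DIR:-/tmp/.ziti}"

    # Look for a local 'ziti' group entry (local group file only).
    if ! grep -q '^ziti:' "$group_file" 2>/dev/null; then
        echo "missing group: ziti"
        return 1
    fi

    # The tunneler creates its domain sockets under this directory.
    if [ ! -d "$socket_dir" ]; then
        echo "missing socket directory: $socket_dir"
        return 1
    fi

    echo "ok"
}
```

If it reports the missing group, the groupadd --system ziti command from the warnings above is the fix the tunneler itself suggests.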

Sure! We can add the POSIX group ziti to these container images: openziti/ziti-edge-tunnel (runs ziti-edge-tunnel run) and openziti/ziti-host (runs ziti-edge-tunnel run-host).

Here's a pull request: "add group ziti to container images" by qrkourier (Pull Request #804, openziti/ziti-tunnel-sdk-c on GitHub).

I tested with this image: docker.io/kbinghamnetfoundry/ziti-edge-tunnel:0.22.21-groupadd-ziti (link to tag in Docker Hub).

Ok, the image now has a ziti group which allows ziti-edge-tunnel to set up the domain sockets for running external commands like tunnel_status. You can re-pull the :latest image or use the :0.22.22 tag to get the updated image. With this image you should be able to use tunnel_status as a liveness command. Let me know how it goes for you!

Hi!

The new image works for me as well! Thank you for that!

I pushed a PR that configures the livenessProbe for the ziti-edge-tunnel helm chart. Let me know if you agree with it: "ziti-edge-tunnel: Add livenessProbe" by mjtrangoni (Pull Request #171, openziti/helm-charts on GitHub).

Now that I can see the output, I expected to find metrics in it, but they look buggy. See:

"Metrics":{"Up":0,"Down":0}

What should we expect here?

Best regards,

Mario

Well, you should expect those up/down rates to move around as your ziti connection is used. But there's a bug in the ziti sdk used by ziti-edge-tunnel that prevents the rates from being updated.

The fix for the bug was pretty easy so I went ahead and made the fix. Please give us a couple days to bring this fix into the next release of ziti-edge-tunnel.
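
Once a fixed release is available, one way to see whether the rates actually move is to pull the two numbers out of the status output and compare successive readings. The sketch below assumes the "Metrics":{"Up":...,"Down":...} fragment appears exactly as quoted above. The function name extract_metrics is hypothetical, and if several such fragments share one line, only the last one is reported.

```shell
#!/bin/sh
# Sketch: print "<up> <down>" from tunnel_status-style text on stdin.
# Lines without a Metrics fragment produce no output.
extract_metrics() {
    sed -n 's/.*"Metrics":{"Up":\([0-9.]*\),"Down":\([0-9.]*\)}.*/\1 \2/p'
}
```

For example, piping the output of ziti-edge-tunnel tunnel_status into extract_metrics a few seconds apart while traffic is flowing should show changing values if the rates are being updated.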