I’m trying to set up Agent Remoting in my self-hosted Kubernetes deployment of zrok, but can’t seem to get it working.
The agent_controller identity creation works as expected, and I’ve set up the controller config (ctrl.yaml) with the required configuration:
agent_controller:
  z_id: my_z_id
  identity_path: /var/lib/.zrok/identities/agentremoting.json
But strangely, when I restart my controller deployment, the above identity gets deleted from my OpenZiti network. My suspicion is that some pre-delete hook triggers when the pod gets deleted and recreated during the restart. Is there a recommended way to restart the controller without triggering any cleanup?
For reference, here are the commands I’ve been using:
# Install zrok chart
helm install zrok .
# Create remote-agent identity
export ZROK_ADMIN_TOKEN=$(kubectl get secret zrok-admin-secret -n zrok -o jsonpath='{.data.admin-token}' | base64 -d) && zrok admin create identity remote-agent
# Create zrok-agent-identity secret
kubectl create secret generic zrok-agent-identity -n zrok \
  --from-file=agentremoting.json=remote-agent.json
# Manually update values.yaml with z_id then upgrade
helm upgrade zrok .
# Restart zrok controller
kubectl rollout restart deploy zrok
Note: I’ve configured the chart to pull the agentremoting.json file from a secret.
Thanks!
After some further investigation, I think I’ve got closer to a diagnosis.
In the zrok helm chart, we have a bootstrap-ziti.bash script that runs zrok admin bootstrap /etc/zrok/ctrl.yaml. If this command fails for any reason, the script runs zrok admin unbootstrap /etc/zrok/ctrl.yaml && exit 1. When we restart the zrok controller pod, the bootstrap init container fails on the first attempt because the ‘public’ identity already exists:
/zrok/controller/bootstrap.go:47","func":"github.com/openziti/zrok/controller.Bootstrap","level":"info","msg":"creating identity for public frontend access","time":"2025-11-03T17:03:00.831Z"}
panic: error creating 'public' identity: error for request 0fm9bd2Y0: COULD_NOT_VALIDATE: The supplied request contains an invalid document or no valid accept content were available, see cause, caused by: error in field name with value public: duplicate value 'public' in unique index on identities store
This then invokes unbootstrap to delete the zrok ‘public’ identity, but I suspect it is also deleting the agent_controller identity.
I’m assuming I can just go and delete the ‘public’ identity manually before restarting the zrok controller and that might work…
The bootstrapping and such you’re referring to is unique to the containerized deployment of zrok. @qrkourier is probably the right person to discuss the containerization, as he built it.
It’s probably not as simple as deleting the public identity… if you have a public frontend running (zrok access public), deleting the public identity will very likely break that frontend.
The container configuration probably needs some way to defeat the bootstrapping (to disable it, once it’s been done). The zrok controller itself has no sort of “cleanup hook” of any kind.
Not sure why zrok admin bootstrap is trying to create a public identity if it already exists. But you might add --skip-frontend to the zrok admin bootstrap flags, and that will stop it from trying to inspect the public identity at all.
Zero idea why your agent remoting identity is getting removed. The only command that would remove that identity is zrok admin unbootstrap. Again, I don’t know how the containerization is working here, but you might want to make sure that unbootstrap is not being invoked.
Deleting the ‘public’ identity manually before restarting the controller pod did resolve the issue, but I’m now encountering an authentication issue after running zrok agent enroll and restarting my agent:
> zrok agent start
[ 0.004] INFO zrok/agent.(*Agent).Run: started
[ 0.005] ERROR zrok/agent.(*Agent).Run: error reloading registry 'open /home/sdundas/.zrok/agent-registry.json: no such file or directory'
[ 0.005] INFO zrok/agent.(*Agent).gateway: started
[ 0.006] INFO zrok/agent.(*Agent).manager: started
[ 0.006] INFO zrok/agent.(*Agent).remoteAgent: listening for remote commands at 'CNwr1gGzMnD9'
[ 0.625] ERROR zrok/agent.(*Agent).remoteAgent: error listening for remote agent: error creating listener: failed to listen: no apiSession, authentication attempt failed: error for request MewvMd2Y9R: UNAUTHORIZED: The request could not be completed. The session is not authorized or the credentials are invalid, caused by: error for request : UNHANDLED: UNAUTHORIZED: The request could not be completed. The session is not authorized or the credentials are invalid
Maybe @qrkourier can shed some light regarding the containerization, but my understanding of how the chart works is as follows. There is a bootstrap-ziti.bash script configured by controller-secrets-configmap.yaml, and this bootstrap script is run by the controller init container. The controller bootstrap script has the following:
zrok admin bootstrap /etc/zrok/ctrl.yaml || {
  zrok admin unbootstrap /etc/zrok/ctrl.yaml
  exit 1
}
When you restart the controller pod using kubectl rollout restart deploy zrok you get this error:
/zrok/controller/bootstrap.go:47","func":"github.com/openziti/zrok/controller.Bootstrap","level":"info","msg":"creating identity for public frontend access","time":"2025-11-03T17:03:00.831Z"}
panic: error creating 'public' identity: error for request 0fm9bd2Y0: COULD_NOT_VALIDATE: The supplied request contains an invalid document or no valid accept content were available, see cause, caused by: error in field name with value public: duplicate value 'public' in unique index on identities store
This causes the bootstrap script to execute zrok admin unbootstrap /etc/zrok/ctrl.yaml, which deletes both the ‘public’ and ‘remote-agent’ identities. Kubernetes then restarts the failed pod (exit 1), and this time the zrok admin bootstrap command runs fine and creates the ‘public’ identity, but it doesn’t recreate the ‘remote-agent’ identity.
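To make the sequence easier to follow, here is a toy model of it in plain shell. It does not call zrok or ziti at all: identities are just empty files in a temp directory, and the bootstrap and unbootstrap functions are stand-ins that encode my assumptions about what the real commands do (bootstrap fails on a duplicate 'public', unbootstrap removes both identities).

```shell
#!/usr/bin/env bash
# Toy model only: identities are empty files in a temp dir; nothing here calls zrok or ziti.
store="$(mktemp -d)"

bootstrap() {
  # Assumed behaviour: creating 'public' fails if it already exists.
  if [ -e "$store/public" ]; then
    echo "bootstrap: duplicate value 'public'" >&2
    return 1
  fi
  touch "$store/public"
}

unbootstrap() {
  # Assumed behaviour: removes every identity zrok created, not just 'public'.
  rm -f "$store/public" "$store/remote-agent"
}

init_container() {
  # Same shape as the chart's bootstrap step: on failure, unbootstrap and fail.
  bootstrap || { unbootstrap; return 1; }
}

init_container                      # helm install: creates 'public'
touch "$store/remote-agent"         # stands in for: zrok admin create identity remote-agent
init_container || echo "restart attempt 1 failed, identities wiped"
init_container && echo "restart attempt 2 succeeded"
ls "$store"                         # lists whatever identities are left
```

If the real commands behave the way the stand-ins do, the final listing contains 'public' but no 'remote-agent', which matches what I see in my OpenZiti network after a restart.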
I’ll try adding --skip-frontend to the controller bootstrap script to see if that helps. I’m assuming the ‘public’ identity will be handled by the frontend bootstrap…?
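For reference, a minimal sketch of the change I have in mind for the controller bootstrap step. This is not the upstream script: the function name and the CTRL_CONFIG and ZROK_BIN variables are hypothetical, and I have not verified whether a fresh install still gets its 'public' identity when --skip-frontend is set. The only intent is to fail the init container without ever calling unbootstrap.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical replacement for the chart's bootstrap step (not the upstream script).
CTRL_CONFIG="${CTRL_CONFIG:-/etc/zrok/ctrl.yaml}"
ZROK_BIN="${ZROK_BIN:-zrok}"   # overridable so the function can be dry-run with a stub

bootstrap_without_cleanup() {
  if "$ZROK_BIN" admin bootstrap --skip-frontend "$CTRL_CONFIG"; then
    echo "bootstrap succeeded"
  else
    # Deliberately no 'zrok admin unbootstrap' here, so existing identities survive.
    echo "bootstrap failed; leaving existing identities in place" >&2
    return 1
  fi
}
```

The init container would call bootstrap_without_cleanup as its last step, so a failed bootstrap still fails the pod but leaves the identities in the OpenZiti network untouched.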