agent-registry.json is suddenly empty

Could you please help with the agent-registry.json file? Suddenly this file has lost its entries.
The only remaining entry is an unused one that we created for testing and forgot to remove. All of the useful entries have disappeared. Fortunately, we found a backup and restored the file.

What could cause such strange behavior? It is possible that for a while zrok (v1.0.4) was unable to connect to the zrok-controller/ziti-controller. It is also possible that the user copied the .zrok directory to a different PC and had two identical environments running. It is difficult to know what happened.

How can we prevent this from happening again? Could we remove write permissions and allow only read access?
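One user-side alternative to a read-only file is to keep timestamped copies of the registry whenever it still has entries, so a good copy is at hand if it is emptied again. This is a minimal sketch of a hypothetical helper, not part of zrok. It assumes the registry lives at ~/.zrok/agent-registry.json and assumes nothing about its schema beyond it being a JSON object.

```python
import json
import shutil
import time
from pathlib import Path
from typing import Any, Optional

# Assumed default location; adjust if the .zrok directory lives elsewhere.
DEFAULT_REGISTRY = Path.home() / ".zrok" / "agent-registry.json"


def has_entries(data: Any) -> bool:
    """Schema-agnostic check: the registry counts as populated if any
    top-level value is a non-empty list or object."""
    if not isinstance(data, dict):
        return False
    return any(isinstance(v, (list, dict)) and len(v) > 0 for v in data.values())


def backup_registry(registry: Path = DEFAULT_REGISTRY,
                    backup_dir: Optional[Path] = None) -> Optional[Path]:
    """Copy the registry to a timestamped file if it parses and is populated.
    Returns the backup path, or None if nothing was copied."""
    if not registry.exists():
        return None
    try:
        data = json.loads(registry.read_text())
    except json.JSONDecodeError:
        return None  # a corrupt file is not worth keeping
    if not has_entries(data):
        return None  # an emptied registry is exactly what we do not want to keep
    backup_dir = backup_dir or registry.parent / "registry-backups"
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{registry.stem}-{stamp}.json"
    shutil.copy2(registry, target)  # preserves the file's timestamps
    return target
```

It could be run periodically, for example from cron. To restore, a cautious approach is to stop the agent, copy the newest backup over agent-registry.json, and start the agent again so it reloads the entries; stopping first is a precaution on the assumption that a running agent may rewrite the file.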

We'd need actual details to help with this in any meaningful way.

The agent registry is just a convenience to re-start existing reserved shares and private access instances. Both of those things can easily be re-started manually if something happens to the agent registry. Manually re-starting them in the agent will just recreate the agent registry entries.

Unfortunately, the user (aged 76) is unfamiliar with zrok.
The main concern is that the entries disappeared.
Can we set read-only permission on this file? That way he could simply restart his PC to get everything working again.

You can try it... but that's not how it's designed to work.

This is the first time anyone has reported an issue with the agent registry. It's a very simple facility... it's already designed to restart reserved shares and private accesses when the agent is restarted.

If an entry is lost for some reason, simply run zrok share reserved or zrok access private again, and the agent will resume restarting those for you automatically.
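Re-running those commands can be scripted for a user who should not have to type them. This is a minimal sketch, not a zrok tool: it only builds and optionally runs the two commands named above, with share tokens the operator supplies (the tokens in the usage line are placeholders). It assumes a zrok agent is running so the CLI hands the request to the agent and returns; without the agent, these commands would stay in the foreground.

```python
import shlex
import subprocess
from typing import Iterable, List, Sequence


def rebuild_commands(reserved_shares: Iterable[str],
                     private_accesses: Iterable[str]) -> List[List[str]]:
    """One command per reserved share token and per private-access share token."""
    cmds = [["zrok", "share", "reserved", tok] for tok in reserved_shares]
    cmds += [["zrok", "access", "private", tok] for tok in private_accesses]
    return cmds


def run_all(cmds: Sequence[Sequence[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            # Assumes the running agent accepts the request and the CLI exits.
            subprocess.run(list(cmd), check=True)
```

For example, run_all(rebuild_commands(["myreservedshare"], ["theirshare"])) prints the commands first; passing dry_run=False executes them.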

I understand. But it's unrealistic for those of us who are over 76.
Someone has to drive 300 miles just to restore the file.

You have your own instance, yes? Turn on agent remoting and remotely control his agent.

Yes. I will do this. Thank you.

Let us know if you run into anything setting it up. The docs on that are here:

It's being used successfully in a couple of private zrok environments.

On the share side we have found the following error:

zrok[1987]: {"file":"/__w/zrok/zrok/agent/agent.go:132","func":"github.com/openziti/zrok/agent.(*Agent).ReloadRegistry","level":"error","msg":"error restarting private access '\u0026{share_name localhost:port false  0 0 []}': unable to start access: Post "https://zrok.controller.domain.name:port/api/v1/access": context deadline exceeded","time":"2025-07-12T22:48:52.981Z"}
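Because the agent logs each event as a JSON object, failed registry restarts like the one above can be picked out of the log automatically. This is a minimal sketch that keys on the fields visible in this one line (level, func, msg, time); other agent messages may be shaped differently. It assumes the input lines carry an optional syslog-style prefix such as zrok[1987]: before the JSON body.

```python
import json
import re
from typing import Any, Dict, Iterable, Iterator

# Everything from the first "{" to the last "}" on the line is treated as JSON.
_JSON_BODY = re.compile(r"\{.*\}\s*$")


def registry_restart_failures(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield error entries logged by ReloadRegistry, flagging timeouts."""
    for line in lines:
        m = _JSON_BODY.search(line)
        if not m:
            continue
        try:
            entry = json.loads(m.group(0))
        except json.JSONDecodeError:
            continue
        if entry.get("level") != "error":
            continue
        if not str(entry.get("func", "")).endswith("ReloadRegistry"):
            continue
        msg = str(entry.get("msg", ""))
        yield {
            "time": entry.get("time", ""),
            "msg": msg,
            "timeout": "context deadline exceeded" in msg,
        }
```

The lines could come from journalctl output for whichever unit runs the agent, or from sys.stdin; a run of timeout entries right after boot would point at connectivity rather than at the registry itself.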

It is possible that the ISP is too slow. Some requests take 5 to 8 seconds; the majority take about 1 second or less.

What is the allowed timeout for these requests?

Agent Remoting works like a charm. Thank you. 😀

The timeout for most requests is currently 30 seconds.

There is an open issue to find a way to keep the registry intact in the face of errors. We'll be getting to it in a week or two.

I noticed a curious thing: after sending kill to a zrok access process, the corresponding binding was removed from agent-registry.json.

For various reasons, Linux may send kill to a process. In such a situation, the corresponding frontend token will be removed.

The tokens are still disappearing in zrok 1.0.8

curl -s -H "X-TOKEN: secret" -XPOST -H "Content-Type: application/zrok.v1+json" -d '{"envZId": "envId"}' https://${ZROK_CTRL}/api/v1/agent/status

{"accesses":null,"shares":null}
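The same status query can be made from Python, which makes it easier to act on an empty result than eyeballing curl output. This is a minimal sketch mirroring the curl call above: the endpoint, headers, and envZId body come from that command, while the function names are hypothetical. The 30-second default timeout matches the controller timeout mentioned later in the thread.

```python
import json
import urllib.request
from typing import Any, Dict


def agent_status(ctrl: str, token: str, env_zid: str,
                 timeout: float = 30.0) -> Dict[str, Any]:
    """POST to /api/v1/agent/status, as the curl command does."""
    req = urllib.request.Request(
        f"https://{ctrl}/api/v1/agent/status",
        data=json.dumps({"envZId": env_zid}).encode(),
        headers={"X-TOKEN": token, "Content-Type": "application/zrok.v1+json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def is_empty_status(status: Dict[str, Any]) -> bool:
    """True when the agent reports no accesses and no shares (null or [])."""
    return not status.get("accesses") and not status.get("shares")
```

Since each query goes through agent remoting, polling it often has a cost, which comes up further down in the thread.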

Yes. Nothing has been changed yet with regard to this issue. When your shares or accesses encounter errors, they will end up being removed. There is an open issue for it that will get worked on as soon as we can get to it. When it’s fixed there will be an entry in the CHANGELOG.md about it.

It turns out not to be a simple change. It requires the agent to be able to manage processes in errored states.

I usually restart my shares with a script that stops all shares and then starts them again.

You mean the tokens will be lost at that point?

So after restarting the shares, I need to check all connected devices.

Unfortunately, after a check I need to restart the zrok controller to kill the agent remoting API sessions. Otherwise the ziti controller dies.

The zrok agent only stores reserved shares (or private accesses) in the agent registry (in order to restart them). In neither case is a “token lost”. A reserved share still exists. And a private access is always ephemeral, and its existence has nothing to do with the lifecycle of the share it’s attached to.

Not sure what you’re doing, but the issue I’m talking about is only relevant in the case where you’re starting an agent, it has shares or accesses listed in its agent registry, and when it goes to start those back up, they error out. In that case, they will be removed from the registry.
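To make the failure mode concrete, here is a toy model of the behavior described above. It is not zrok's code: on startup each registry entry is restarted, and any entry whose restart raises is dropped. The keep_failed option sketches the kind of behavior the open issue is after, where failed entries stay in the registry for a later retry instead of being forgotten.

```python
from typing import Callable, List, Tuple


def reload_registry(entries: List[str],
                    restart: Callable[[str], None],
                    keep_failed: bool = False) -> Tuple[List[str], List[str]]:
    """Return (registry_after_reload, failed_entries)."""
    kept: List[str] = []
    failed: List[str] = []
    for entry in entries:
        try:
            restart(entry)
            kept.append(entry)
        except Exception:
            failed.append(entry)
            if keep_failed:
                kept.append(entry)  # retain for a later retry
    return kept, failed
```

With the current behavior, an agent that starts while the network is down (every restart timing out) would end up with an empty registry, which matches what was reported at the top of the thread.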

If you’re describing something else, I’m not sure I understand.

I am talking about reserved shares.

You say that the lifecycles of an access and a share are different, so there is no relation between them.

It is possible that when the user started his PC, there was no internet connection yet because the network cable was unplugged. Users do this, and they do not care about the order: unplug the cable, then switch off the PC, or turn on the PC and then connect the cable. Any order is possible.

Another user never turns off his PC, but he may have network connectivity issues.
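Since the boot order can leave the PC offline when the agent starts, one user-side workaround is to delay starting the agent until the controller accepts a TCP connection. This is a minimal sketch of a hypothetical helper; host and port are placeholders for your controller, and it only checks TCP reachability, not that the API is healthy.

```python
import socket
import time


def wait_for_controller(host: str, port: int, attempts: int = 30,
                        delay: float = 10.0, connect_timeout: float = 5.0) -> bool:
    """Return True once a TCP connection to host:port succeeds,
    or False after `attempts` failed tries."""
    for attempt in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=connect_timeout):
                return True
        except OSError:  # refused, unreachable, or timed out
            if attempt < attempts - 1:
                time.sleep(delay)
    return False
```

A small wrapper that exits non-zero when this returns False could run before the agent, for example as a systemd ExecStartPre step, if that is how the agent is started on these machines.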

I am talking about disappearing frontend tokens, not the reserved ones.

I remember that I restarted the zrok-controller and then checked the agent status. The output was empty, meaning there were no frontend tokens:

{"accesses":null,"shares":null}

At this point I do not know what causes this behavior. The frontend tokens continue to disappear.

To check that everything works as it should, I wrote a small script. But the zrok-controller does not close agent-remoting API sessions, so the number of API sessions keeps increasing because of my regular monitoring of the agent status on all PCs. Sadly, a large number of agent-remoting API sessions eventually kills the ziti-controller.

Instead of regularly querying the agent status via api/v1/agent/status, it might be possible to look at the identityId and service.name using ziti edge list sessions. Is there a relation between zrok’s frontend token and the ziti session?

The idea is to somehow detect that a frontend token has been lost and to recreate the missing binding via agent remoting (api/v1/agent/access).
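Detection is more robust if it compares the share tokens you expect to be accessed against the agent status, rather than tracking frontend tokens, because a re-created private access gets a new frontendToken each time. This is a minimal sketch; the key holding the share token on each access entry is an assumption (share_field defaults to a hypothetical "token"), so check it against a real, non-empty status response.

```python
from typing import Any, Dict, Iterable, List


def missing_accesses(status: Dict[str, Any], expected_shares: Iterable[str],
                     share_field: str = "token") -> List[str]:
    """Return expected share tokens that have no private access in the status.

    share_field is an assumed key name; verify it against real output.
    """
    present = {a.get(share_field) for a in (status.get("accesses") or [])}
    return [tok for tok in expected_shares if tok not in present]
```

The returned share tokens are the ones to re-create through agent remoting. Given the API-session growth described above, polling infrequently seems prudent until the agent remoting change lands.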

The persistence of frontends and reserved shares works exactly the same in the zrok agent. If the zrok access fails to start when the agent starts, it will be removed from the agent registry. It’s the same issue I described with reserved shares.

There is a change coming to agent remoting that should improve the api session behavior. It will be released in v1.1.

This is a scary mystery. I was always able to recreate the frontend tokens using agent-remoting/curl.

Thank you for this advice.

So far the user’s agent is always available; the agent is able to create a Bind.

However other frontend tokens (Dial) can disappear.

I don’t think this is scary nor is it much of a mystery.

Yes, you can re-create the frontends. Private frontends will end up with a new frontendToken each time.