What is the `ziti-controller-init-container` doing?

Did it all again on a t4g using the exact same CloudFormation (CF) template as before. Started fine. Stopped fine. Edited the compose file and removed it... ran `docker compose up -d` again and it was fine... again.

The only thing that may be relevant: I see you mutate the compose file before the full `docker stop` has finished?

I'll watch your new video.

I'm trying it without doing that deliberately. I'm using a t2.micro now, also in us-east-1.

Thanks for the logs. I can see these lines in them:

```
adding /var/openziti/ziti-bin to the path
controller initialized. unsetting ZITI_USER/ZITI_PWD from env
panic: controllerConfig must provide [db] or [raft]
```

That makes me think the config file was incomplete when it was read... Definitely something goofy going on. Hopefully if you "let it settle" it'll be OK...

If that's the case, why does it go away when I remove the `access-control.init` file? I can delete it and the controller starts. I can touch it and it crashes again.

Once the controller fails, I can restart it all I want and it never comes back up. Only after I `rm` the file can I restart it, and then all is right again.

The funny part: the ziti-controller is the first container that needs to start. I would expect you could replicate this problem with no other containers in the compose file at all. As far as I know, the controller never reads or references the .init file the init container leaves behind. Quite honestly, I'm thinking this is some kind of bizarre Docker bug. Why it's so easy for you to reproduce while I can't reproduce it at all baffles me to no end.

I really don't know how to diagnose it, since I can't reproduce it. The error you hit, though, is a pretty clear indicator to me that the controller config file existed but (somehow) came back empty when it was read, which made the controller fail.
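If the file really is empty or truncated at read time, a pre-flight check right before the controller starts would turn the panic into a readable message. This is a minimal sketch: `check_ctrl_config` is a hypothetical helper, and it assumes the config is YAML with a top-level `db:` or `raft:` key. The panic above only tells us that one of those two has to be present.

```sh
#!/bin/sh
# Hypothetical pre-flight check for the controller config.
# Assumption: the config is YAML and storage is a top-level db: or raft: key.
# Exit codes: 1 = missing, 2 = empty, 3 = no db:/raft: key.
check_ctrl_config() {
  cfg="$1"
  if [ ! -e "$cfg" ]; then
    echo "config missing: $cfg" >&2
    return 1
  fi
  if [ ! -s "$cfg" ]; then
    # -s is false for zero-length files, i.e. the "read returned nothing" case
    echo "config is empty: $cfg" >&2
    return 2
  fi
  if ! grep -Eq '^(db|raft):' "$cfg"; then
    echo "config has no top-level db: or raft: key: $cfg" >&2
    return 3
  fi
  echo "config looks usable: $cfg ($(wc -c < "$cfg") bytes)"
}
```

Logging the byte count on every start would also show whether the file is shrinking between runs.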

A sleep might help if it's a race condition, I suppose...
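Instead of a fixed sleep, one option is to poll until the file exists and is non-empty, with a timeout. A sketch, assuming a hypothetical `wait_for_nonempty_file` helper and that you pass the controller config path yourself (I don't know the exact filename in the quickstart):

```sh
#!/bin/sh
# Hypothetical helper: wait until a file exists and is non-empty,
# giving up after a timeout (default 30 seconds).
wait_for_nonempty_file() {
  path="$1"
  timeout="${2:-30}"
  waited=0
  while [ ! -s "$path" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for $path" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

Keep in mind that a non-empty file can still be a partially written one, so this narrows the race window rather than closing it.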

One thing you can try next time this happens: start JUST the controller container with a bash prompt. From that prompt, run the controller with `run-controller.sh` and see whether it fails. If you can get into an interactive terminal with the controller failing, I'd like to see the config file.
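A sketch of what to capture from that shell before running `run-controller.sh`. Getting the shell could look like `docker compose run --rm --entrypoint /bin/bash ziti-controller`, assuming the compose service is named `ziti-controller` and the image ships bash. `report_file` and `diagnose_controller` are hypothetical helpers, and the config path is passed in explicitly because I'm not assuming its name.

```sh
#!/bin/sh
# Hypothetical diagnostic: show what the controller will find on disk.
report_file() {
  f="$1"
  if [ -e "$f" ]; then
    # tr strips the padding some wc implementations add
    echo "present: $f ($(wc -c < "$f" | tr -d ' ') bytes)"
  else
    echo "absent:  $f"
  fi
}

diagnose_controller() {
  ziti_home="$1"   # normally "${ZITI_HOME}"
  cfg="$2"         # controller config path (assumption: supply it yourself)
  report_file "$ziti_home/access-control.init"
  report_file "$cfg"
  if [ -s "$cfg" ]; then
    echo "----- $cfg -----"
    cat "$cfg"
  fi
}
```

Running it once when the controller is healthy and once when it is crashing would show whether the config is really empty at that moment.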

I don't know how to move forward here. :frowning:

Actually, it does: look at the entrypoint for the quickstart. `run-controller.sh` checks for the init file with an "if not exists" test:

```sh
if [ ! -f "${ZITI_HOME}/access-control.init" ]; then
  setupEnvironment
  persistEnvironmentValues
else
  echo "system has been initialized already. just starting the process"
fi
```

What I don't understand is why it crashes instead of just running the `echo` in the `else` branch.
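To see which branch runs, the gate can be reproduced on its own outside the container. In this minimal sketch, `setupEnvironment` and `persistEnvironmentValues` are stubs that only print their names, `init_gate` is a hypothetical wrapper, and `ZITI_HOME` is whatever directory you point it at; none of it is the real quickstart code.

```sh
#!/bin/sh
# Stubs standing in for the real quickstart functions.
setupEnvironment() { echo "setupEnvironment called"; }
persistEnvironmentValues() { echo "persistEnvironmentValues called"; }

# Same test as run-controller.sh: initialize only if the marker is absent.
init_gate() {
  if [ ! -f "${ZITI_HOME}/access-control.init" ]; then
    setupEnvironment
    persistEnvironmentValues
  else
    echo "system has been initialized already. just starting the process"
  fi
}
```

With the marker file present, only the `else` message prints. Without it, the two stubs run instead, so a crash with the marker present has to come from something after this block.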

Ugh. I stand corrected! :slight_smile: I wouldn't have expected that. I'll have to ask @gberl002 why that is. It might be a leftover, or it might be relevant to non-compose Docker...

Judging from your logs, the crash isn't due to the echo; the controller is simply panicking because reading the config file failed.


Oh wait, I see it calling `setupEnvironment`. Maybe it's trying to mutate a file it shouldn't be... But I don't see any mention of it in your logs. I don't see the "system has not been initialized" echo. I don't think it enters that block?

I think I might be able to deterministically reproduce it by launching a container and removing that file from the volume "outside" of the docker compose loop...

I'll keep poking at it to see if I can figure out why it still affects you and not me :slight_smile:


After looking at the overall state of the script, I do want to restructure it a bit so the controller no longer needs to look at the init file left by another container. I think the only path forward is for us to slightly restructure the run-controller script to not do that and try again, particularly since I can't actually reproduce the issue.

The one function call in that `if` block you were looking at that gives me pause, and makes me want to refactor this, is `createControllerConfig`. It would overwrite that config file...

However, if that function were called, you would see a bunch of output saying so, and we don't see any evidence of that yet.
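One way a refactor could make an overwrite harmless is to generate the config only when it is missing or empty, and to publish it with a rename so a concurrent reader never sees a half-written file. This is a sketch of that general write-then-rename pattern, not the actual quickstart change: `write_config_once` is hypothetical, and `generate_config` is a stand-in for whatever `createControllerConfig` really renders.

```sh
#!/bin/sh
# Stand-in for the real config renderer (assumption: emits YAML on stdout).
generate_config() {
  printf 'db: "%s/db/ctrl.db"\n' "$1"
}

# Write the config only if it is missing or empty; publish via atomic rename.
write_config_once() {
  target="$1"
  home="$2"
  if [ -s "$target" ]; then
    echo "keeping existing $target"
    return 0
  fi
  # Temp file in the same directory, so mv is a same-filesystem rename.
  tmp="$(mktemp "${target}.XXXXXX")" || return 1
  if ! generate_config "$home" > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  chmod 0644 "$tmp"        # mktemp creates files as 0600
  mv -f "$tmp" "$target"   # readers see the old file or the new one, never a partial one
  echo "wrote $target"
}
```

Since the target is only written when it's absent or empty, rerunning the script against an existing volume leaves a good config alone.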

I think the best we can offer is to refactor the script, put out a new rev, and "try again"...

That sounds good. I'm satisfied that I can at least reproduce it, and I now know the init container doesn't do anything I can't do myself after the environment is built. So, lacking any further constructive help to offer, I thank you and will go on to produce some more chaos elsewhere. :smiley:


Hey @jptechnical, I found some issues with the run-router.sh script causing the router configs to be overwritten. I am putting up a PR with the fix right now so it should be resolved soon.
