Newer router versions not starting in my environment

Greetings!
I have recently run into an issue with new routers that I deploy to AWS with Terraform and a custom AMI that I build on top of my company's foundational AMIs. First, let me say that I was able to work around it by locking my version to 1.1.15, so I am not in a critical situation, but I would love to know what changes I need to make in my environment so that I can keep running newer versions as I deploy more routers.

It looks to me like the router is trying to make a directory on startup, and it fails with a FATAL error. It is unclear to me whether it is trying to run mkdir and failing to find the mkdir command, or whether it is trying to create a directory whose path is empty, because I see dir=[] in the log. Is this a directory that I can create ahead of time in my AMI? Here is a snippet from the logs, from one FATAL error to the next:

Feb 26 01:20:02 ip-10-2-2-20 ziti[10986]: [4802.556]   FATAL ziti/ziti/router.run: {error=[mkdir : no such file or directory]} error starting
Feb 26 01:20:04 ip-10-2-2-20 ziti[10993]: [4804.976]    INFO ziti/ziti/router.run: {os=[linux] arch=[arm64] routerId=[Hi3lsWHMGw] version=[v1.3.3] revision=[2a62cc577e45] configFile=[/opt/openziti/er-use1-studios1-fsx1.yaml] go-version=[go1.23.4] build-date=[2025-01-27T19:27:43Z]} starting ziti router
Feb 26 01:20:04 ip-10-2-2-20 ziti[10993]: [4804.977]    INFO ziti/common/metrics.ConfigureGoroutinesPoolMetrics.GoroutinesPoolMetricsConfigF.func1.1: {maxWorkers=[32] idleTime=[30s] maxQueueSize=[1000] poolType=[pool.link.dialer] minWorkers=[0]} starting goroutine pool
Feb 26 01:20:04 ip-10-2-2-20 ziti[10993]: [4804.977]    INFO ziti/common/metrics.ConfigureGoroutinesPoolMetrics.GoroutinesPoolMetricsConfigF.func1.1: {maxWorkers=[1] poolType=[pool.rdm.handler] minWorkers=[1] idleTime=[30s] maxQueueSize=[1000]} starting goroutine pool
Feb 26 01:20:04 ip-10-2-2-20 ziti[10993]: [4804.977]    INFO ziti/common/metrics.ConfigureGoroutinesPoolMetrics.GoroutinesPoolMetricsConfigF.func1.1: {poolType=[pool.route.handler] maxQueueSize=[1000] minWorkers=[0] maxWorkers=[128] idleTime=[30s]} starting goroutine pool
Feb 26 01:20:04 ip-10-2-2-20 ziti[10993]: [4804.977]    INFO ziti/common/metrics.ConfigureGoroutinesPoolMetrics.GoroutinesPoolMetricsConfigF.func1.1: {minWorkers=[0] maxWorkers=[50] idleTime=[30s] poolType=[pool.terminator_validation] maxQueueSize=[1]} starting goroutine pool
Feb 26 01:20:04 ip-10-2-2-20 ziti[10993]: [4804.977]    INFO ziti/router/internal/edgerouter.(*Config).LoadConfigFromMap: cached data model file set to: /opt/openziti/er-use1-studios1-fsx1.yaml.proto.gzip
Feb 26 01:20:04 ip-10-2-2-20 ziti[10993]: [4804.977] WARNING ziti/router/internal/edgerouter.(*Config).LoadConfigFromMap: Invalid heartbeat interval [0] (min: 60, max: 10), setting to default [60]
Feb 26 01:20:04 ip-10-2-2-20 ziti[10993]: [4804.977]    INFO ziti/router/internal/edgerouter.(*Config).loadCsr: loaded csr info from configuration file at path [edge.csr]
Feb 26 01:20:04 ip-10-2-2-20 ziti[10993]: [4804.977]    INFO ziti/router/state.(*ManagerImpl).LoadRouterModel: router data model file does not exist [/opt/openziti/er-use1-studios1-fsx1.yaml.proto.gzip]
Feb 26 01:20:04 ip-10-2-2-20 ziti[10993]: [4804.977]    INFO ziti/router/forwarder.(*Scanner).run: started
Feb 26 01:20:04 ip-10-2-2-20 ziti[10993]: [4804.978]    INFO ziti/router/state.(*ManagerImpl).SetRouterDataModel: {index=[0]} replacing router data model
Feb 26 01:20:04 ip-10-2-2-20 ziti[10993]: [4804.978]    INFO ziti/router/state.(*ManagerImpl).SetRouterDataModel: {index=[0]} router data model replacement complete, old: 0x0, new: 0x4000237380
Feb 26 01:20:04 ip-10-2-2-20 ziti[10993]: [4804.978]   ERROR ziti/router.(*Router).Start: {dir=[] error=[mkdir : no such file or directory]} failed to initialize data directory
Feb 26 01:20:04 ip-10-2-2-20 ziti[10993]: [4804.978]   FATAL ziti/ziti/router.run: {error=[mkdir : no such file or directory]} error starting

Thanks!

Hi @greggw01, we will likely need @plorenz to weigh in on this one as he's been doing work in this area lately.

router data model file does not exist [/opt/openziti/er-use1-studios1-fsx1.yaml.proto.gzip]

That's one I haven't seen yet. I didn't see what version you were using. Can you describe what controller version you're on, whether the router is new or upgraded, and any other relevant information with respect to any versions or the steps you're taking?

Thanks

Hi @greggw01, I think this will be fixed with 1.4, which we're hoping to put out in pre-release today. It looks like it's trying to create the directory in which the endpoints file is stored. It's unclear to me why this is failing, because if you don't have it set, it should default to the directory in which the router config is stored.

You can try setting it explicitly:

ctrl:
  dataDir: /path/to/dir

In 1.4, this is changing to

ctrl:
  endpointsFile: /path/to/endpoints.file

Because this is HA functionality, which isn't GA yet, we're still fine-tuning config, events, etc.
OpenZiti 1.4 has a bunch of HA changes in those areas, as we're trying to get that all locked down for an HA RC in the near future.

I'll dig in a bit and see if I can figure out why it defaulted your dataDir config setting to blank.
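
As a point of reference, the message in the log matches what Go's os.MkdirAll returns for an empty path on Linux, which points to an empty directory value rather than a missing mkdir binary. Below is a minimal sketch of that behavior. Note that resolveDataDir and mkdirError are hypothetical helpers written for illustration, not the actual router code; resolveDataDir just applies the fallback described above (use the directory of the router config file when nothing is set).

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveDataDir is a hypothetical helper, not OpenZiti's implementation.
// It returns the configured directory, or falls back to the directory
// that holds the router config file when nothing is configured.
func resolveDataDir(configured, configFile string) string {
	if configured != "" {
		return configured
	}
	return filepath.Dir(configFile)
}

// mkdirError tries to create dir and returns the error text,
// or an empty string if the directory was created or already exists.
func mkdirError(dir string) string {
	if err := os.MkdirAll(dir, 0o700); err != nil {
		return err.Error()
	}
	return ""
}

func main() {
	// An empty path fails inside the Go runtime; no external mkdir
	// command is involved. On Linux this prints:
	//   mkdir : no such file or directory
	fmt.Println(mkdirError(""))

	// With the fallback applied, the path is never empty.
	// On Linux this prints: /opt/openziti
	fmt.Println(resolveDataDir("", "/opt/openziti/er-use1-studios1-fsx1.yaml"))
}
```

The doubled space in "mkdir : no such" is where the empty path would normally appear, which is consistent with the dir=[] field in the ERROR line.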

Let me know if this helped,
Paul

Hi Clint, I thought I sent this reply, but it was waiting here for me when I came back. Paul's response makes a lot of sense. For the time being, I am going to stick to version 1.1.15 for new deployments, and then after the changes are released, I will fix my environment to account for them. Maybe by then the team that runs our controller will have implemented an HA controller.

(The response I thought I sent.)
Our Security Platforms team recently upgraded the controller to 1.3.3. The router was a new deployment on 1.3.3. The first thing I tried was locking the version to 1.1.15, and that appears to have made the router group able to start.
