Unable to run HA Controller as a service

Hey All,

I've been attempting to set up some VMs for testing the HA controller functionality in OpenZiti. I have a reasonably automated process for generating a config file with the desired settings, and I'm able to at least start up a controller cluster.

However, my problem lies in running the controller application as a service, as can be done after bootstrapping a single controller. My current setup relies on running it manually in a scratch folder. If I run ziti controller run ./config.yml, everything boots up as expected. Obviously, though, this means my controller does not stay alive after I disconnect my SSH session.

First, I tried running the bootstrap process from the Controller Deployment Page (link). I had hoped it might pick up the environment variables I had set to generate my controller configuration, but it just seemed to power ahead and create a default single-controller setup.

Next, I tried migrating an existing setup, following the instructions from the documentation (link) except for Step 4, where I copied the raft folder instead of db.

However, when I run the command to enable the service, it fails to start. The logs show the following error:

Jul 23 14:58:26 ha-controller-1 systemd[1]: ziti-controller.service: Scheduled restart job, restart counter is at 411.
Jul 23 14:58:26 ha-controller-1 systemd[1]: Starting ziti-controller.service - OpenZiti Controller...
Jul 23 14:58:26 ha-controller-1 entrypoint.bash[9683]: realpath: missing operand
Jul 23 14:58:26 ha-controller-1 entrypoint.bash[9683]: Try 'realpath --help' for more information.
Jul 23 14:58:26 ha-controller-1 entrypoint.bash[9687]: realpath: missing operand
Jul 23 14:58:26 ha-controller-1 entrypoint.bash[9687]: Try 'realpath --help' for more information.
Jul 23 14:58:26 ha-controller-1 entrypoint.bash[9676]: ERROR: database file '' is not writable
Jul 23 14:58:26 ha-controller-1 entrypoint.bash[9676]: Provide a configuration in '/var/lib/private/ziti-controller' or generate with:
Jul 23 14:58:26 ha-controller-1 entrypoint.bash[9676]: * Set vars in'/opt/openziti/etc/controller/bootstrap.env'
Jul 23 14:58:26 ha-controller-1 entrypoint.bash[9676]: * Run '/opt/openziti/etc/controller/bootstrap.bash'
Jul 23 14:58:26 ha-controller-1 entrypoint.bash[9676]: * Run 'systemctl enable --now ziti-controller.service'
Jul 23 14:58:26 ha-controller-1 entrypoint.bash[9676]: WARN: set VERBOSE=1 or DEBUG=1 for more output
Jul 23 14:58:26 ha-controller-1 systemd[1]: ziti-controller.service: Control process exited, code=exited, status=1/FAILURE
Jul 23 14:58:26 ha-controller-1 systemd[1]: ziti-controller.service: Failed with result 'exit-code'.
Jul 23 14:58:26 ha-controller-1 systemd[1]: Failed to start ziti-controller.service - OpenZiti Controller.

I have checked, and my config file is definitely in the location which the service seems to be asking for:

$ sudo ls -la /var/lib/private/ziti-controller
total 28
drwxr-xr-x 4 64093 64093  4096 Jul 23 15:02 .
drwx------ 3 root  root   4096 Jul 23 14:36 ..
-rw-r--r-- 1 64093 64093 10790 Jul 23 14:35 config.yml
drwxr-xr-x 4 root  root   4096 Jul 23 15:02 pki
drwx------ 3 root  root   4096 Jul 23 15:02 raft
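Separately from the entrypoint error, the listing shows config.yml and the directory itself owned by UID 64093 (presumably the user allocated by DynamicUser=yes), while pki and raft are owned by root, and raft is mode 700. If the service runs as that dynamic user, it may not be able to read those directories even once it gets past the startup check. A minimal sketch for spotting such mismatches, assuming bash and GNU find (for -printf); not_owned_by is a hypothetical helper, not part of OpenZiti:

```shell
# not_owned_by DIR UID
# Lists every path under DIR (DIR included) whose owner is not UID,
# printing the owner name, octal mode, and path. Requires GNU find for -printf.
not_owned_by() {
  local dir="$1" uid="$2"
  find "$dir" ! -uid "$uid" -printf '%u %m %p\n'
}

# Equivalent one-off command for the state directory above
# (64093 is the UID shown in the listing):
#   sudo find /var/lib/private/ziti-controller ! -uid 64093 -printf '%u %m %p\n'
```

Any line printed for pki or raft would point to a file the service user does not own.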

I struggled to get my head around the HA documentation, so I'm not ruling out the possibility that I've overlooked something simple. Any advice or ideas would be greatly appreciated.

Thanks!

Why don't you share your controller service configuration?
Here is my working service example (note: WorkingDirectory should be the root folder containing your raft directory and your controller config.yaml):

nano /etc/systemd/system/ziti-controller.service

[Unit]
Description=OpenZiti Controller
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/ziti controller run /etc/ziti-ha/controller.yaml
Restart=always
RestartSec=5
User=root
WorkingDirectory=/etc/ziti-ha
StandardOutput=append:/var/log/ziti-controller.log
StandardError=append:/var/log/ziti-controller.log

[Install]
WantedBy=multi-user.target

If you can run the command manually, your service file is probably set up incorrectly.

Worst case, you can run ziti controller run ./config.yml & to keep it running as a background process until the next reboot.
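A plain & can still leave the process tied to the SSH session, since bash may forward SIGHUP to its background jobs when the session hangs up. A minimal sketch, assuming bash and coreutils nohup, that detaches the process and sends its output to a log file (start_detached is a hypothetical helper name):

```shell
# start_detached LOGFILE COMMAND [ARGS...]
# Starts COMMAND under nohup so it ignores SIGHUP when the SSH session ends,
# reads stdin from /dev/null, appends stdout and stderr to LOGFILE,
# and prints the PID of the background process.
start_detached() {
  local log="$1"
  shift
  nohup "$@" </dev/null >>"$log" 2>&1 &
  echo "$!"
}

# Example (assumes ziti is on PATH and config.yml is in the current directory):
#   start_detached ./ziti-controller.log ziti controller run ./config.yml
```

Like the plain &, this does not survive a reboot, so a working systemd unit is still the long-term answer.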

It took me a second to find it, as the path is different on Ubuntu.

This is the default file, presumably added when I installed the package; I've not changed it at all.
It has some extra options compared to yours, but nothing here looks out of the ordinary to me.

In particular, the working directory is the same one I copied all the files to while following the docs.

[Unit]
Description=OpenZiti Controller
After=network-online.target

[Service]
Type=simple

# manage the user and permissions for the service automatically
DynamicUser=yes

# this env file configures the service, including whether or not to perform bootstrapping
EnvironmentFile=/opt/openziti/etc/controller/service.env

# relative to /var/lib
StateDirectory=ziti-controller
WorkingDirectory=/var/lib/ziti-controller
ReadOnlyPaths=/opt/openziti/share/console

ExecStartPre=/opt/openziti/etc/controller/entrypoint.bash check config.yml
ExecStart=/opt/openziti/bin/ziti controller run config.yml ${ZITI_ARGS}

Restart=always
RestartSec=3

LimitNOFILE=65535
UMask=0007

[Install]

I suggest changing the user from DynamicUser to root, or to a ziti user with proper permissions on the ziti folder.

Which user did you use when you ran ziti manually?

An update on my efforts to resolve this:

I made some changes to my service file. First, I set the service to run only as root.

Other, similar threads suggested using absolute paths for everything, so I did that too.

I also created a new directory under /etc/ziti to get away from the weird folder-link behavior in the default directory.

My service file now looks like this:

[Unit]
Description=OpenZiti Controller
After=network-online.target

[Service]
Type=simple

User=root

# this env file configures the service, including whether or not to perform bootstrapping
EnvironmentFile=/opt/openziti/etc/controller/service.env

# relative to /var/lib
StateDirectory=/etc/ziti
WorkingDirectory=/etc/ziti
ReadOnlyPaths=/opt/openziti/share/console

ExecStartPre=/opt/openziti/etc/controller/entrypoint.bash check /etc/ziti/config.yml
ExecStart=/opt/openziti/bin/ziti controller run /etc/ziti/config.yml ${ZITI_ARGS}

Restart=always
RestartSec=3

LimitNOFILE=65535
UMask=0007

[Install]
WantedBy=multi-user.target
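One thing that stands out in this unit: systemd documents StateDirectory= as taking relative names that are created under /var/lib (as the "# relative to /var/lib" comment kept from the packaged unit says), so an absolute /etc/ziti is likely rejected with a warning rather than honored. It probably isn't the cause of the entrypoint error, but it is easy to check; systemd-analyze verify should also report parse warnings like this. A small hypothetical helper, assuming bash:

```shell
# absolute_state_dirs UNIT_FILE
# Prints each StateDirectory= entry in UNIT_FILE that is an absolute path.
# systemd expects these entries to be relative names (created under /var/lib).
absolute_state_dirs() {
  local unit="$1" line dir
  local -a dirs
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      StateDirectory=*)
        # The setting takes a whitespace-separated list of names.
        read -ra dirs <<<"${line#StateDirectory=}"
        for dir in "${dirs[@]}"; do
          case "$dir" in
            /*) echo "$dir" ;;
          esac
        done
        ;;
    esac
  done <"$unit"
}

# Example (use the path of the unit file you edited):
#   absolute_state_dirs /etc/systemd/system/ziti-controller.service
```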

I’ve checked, and the working directory has appropriate access permissions as well, so I don’t believe it could be that.

Nonetheless, I get the exact same error:

root@ha-controller-1:/etc/ziti# systemctl enable --now ziti-controller.service
Job for ziti-controller.service failed because the control process exited with error code.
See "systemctl status ziti-controller.service" and "journalctl -xeu ziti-controller.service" for details.
root@ha-controller-1:/etc/ziti# journalctl -xeu ziti-controller.service

--snip--

Sep 24 10:54:52 ha-controller-1 entrypoint.bash[22171]: realpath: missing operand
Sep 24 10:54:52 ha-controller-1 entrypoint.bash[22171]: Try 'realpath --help' for more information.
Sep 24 10:54:52 ha-controller-1 entrypoint.bash[22175]: realpath: missing operand
Sep 24 10:54:52 ha-controller-1 entrypoint.bash[22175]: Try 'realpath --help' for more information.
Sep 24 10:54:52 ha-controller-1 entrypoint.bash[22164]: ERROR: database file '' is not writable
Sep 24 10:54:52 ha-controller-1 entrypoint.bash[22164]: Provide a configuration in '/etc/ziti' or gene>
Sep 24 10:54:52 ha-controller-1 entrypoint.bash[22164]: * Set vars in'/opt/openziti/etc/controller/boo>
Sep 24 10:54:52 ha-controller-1 entrypoint.bash[22164]: * Run '/opt/openziti/etc/controller/bootstrap.>
Sep 24 10:54:52 ha-controller-1 entrypoint.bash[22164]: * Run 'systemctl enable --now ziti-controller.>
Sep 24 10:54:52 ha-controller-1 entrypoint.bash[22164]: WARN: set VERBOSE=1 or DEBUG=1 for more output
Sep 24 10:54:52 ha-controller-1 systemd[1]: ziti-controller.service: Control process exited, code=exit>

Previously, I was running ziti manually as a normal user. This time, to rule out any differences, I ran it as root, and it also ran just fine.

root@ha-controller-1:/etc/ziti# ziti controller run ./config.yml

--snip--

[   3.466] WARNING ziti/controller/server.(*Controller).checkEdgeInitialized: the controller has not been initialized, no default admin exists. Add this node to a cluster using 'ziti agent cluster add tls:ha-controller-1.ziti.test:6262' against an existing cluster member, or if this is the bootstrap node, run 'ziti agent cluster init' to configure the default admin and bootstrap the cluster

For completeness, my controller configuration file currently looks like this.

I generated it directly by sourcing my environment variables and then running ziti create config controller --clustered. The only thing I have changed since then is the console file location at the very end:

v: 3

#trace:
#  path: "ha-controller-1.ziti.test.trace"

#profile:
#  memory:
#    path: ctrl.memprof



cluster:
  dataDir:         "/etc/ziti/raft"


identity:
  cert:        "/etc/ziti/pki/ha-controller-1/certs/client.chain.pem"
  server_cert: "/etc/ziti/pki/ha-controller-1/certs/server.chain.pem"
  key:         "/etc/ziti/pki/ha-controller-1/keys/server.key"
  ca:          "/etc/ziti/pki/ha-controller-1/certs/ha-controller-1.chain.pem"
  #alt_server_certs:
  #  - server_cert:  ""
  #    server_key:   ""

# trust domains may be overridden by SPIFFE ID as URI SAN
#trustDomain: ziti.example.com

# additional trust domains allow for migrating to a new trust domain
#additionalTrustDomains: []

# Network Configuration
#
# Configure how the controller will establish and manage the overlay network, and routing operations on top of
# the network.
#
#network:

  # routeTimeoutSeconds controls the number of seconds the controller will wait for a route attempt to succeed.
  #routeTimeoutSeconds:  10

  # createCircuitRetries controls the number of retries that will be attempted to create a path (and terminate it)
  # for new circuits.
  #createCircuitRetries: 2

  # pendingLinkTimeoutSeconds controls how long we'll wait before creating a new link between routers where
  # there isn't an established link, but a link request has been sent
  #pendingLinkTimeoutSeconds: 10

  # Defines the period that the controller re-evaluates the performance of all of the circuits
  # running on the network.
  #
  #cycleSeconds:         15

  # Sets router minimum cost. Defaults to 10
  #minRouterCost: 10

  # Sets how often a new control channel connection can take over for a router with an existing control channel connection
  # Defaults to 1 minute
  #routerConnectChurnLimit: 1m

  # Sets the latency of link when it's first created. Will be overwritten as soon as latency from the link is actually
  # reported from the routers. Defaults to 65 seconds.
  #initialLinkLatency: 65s

  #smart:
    #
    # Defines the fractional upper limit of underperforming circuits that are candidates to be re-routed. If
    # smart routing detects 100 circuits that are underperforming, and `smart.rerouteFraction` is set to `0.02`,
    # then the upper limit of circuits that will be re-routed in this `cycleSeconds` period will be limited to
    # 2 (2% of 100).
    #
    #rerouteFraction:    0.02
    #
    # Defines the hard upper limit of underperforming circuits that are candidates to be re-routed. If smart
    # routing detects 100 circuits that are underperforming, and `smart.rerouteCap` is set to `1`, and
    # `smart.rerouteFraction` is set to `0.02`, then the upper limit of circuits that will be re-routed in this
    # `cycleSeconds` period will be limited to 1.
    #
    #rerouteCap:         4

# the endpoint that routers will connect to the controller over.
ctrl:
  options:
    advertiseAddress: tls:ha-controller-1.ziti.test:6262
  # (optional) settings
  # set the maximum number of connect requests that are buffered and waiting to be acknowledged (1 to 5000, default 1)
  #maxQueuedConnects:      1
  # the maximum number of connects that have  begun hello synchronization (1 to 1000, default 16)
  #maxOutstandingConnects: 16
  # the number of milliseconds to wait before a hello synchronization fails and closes the connection (30ms to 60000ms, default: 5000ms)
  #connectTimeoutMs:       5000
  listener:             tls:0.0.0.0:6262

#metrics:
#  influxdb:
#    url:                http://localhost:8086
#    database:           ziti

# xctrl_example
#
#example:
#  enabled:              false
#  delay:                5s

healthChecks:
  boltCheck:
    # How often to try entering a bolt read tx. Defaults to 30 seconds
    interval: 30s
    # When to time out the check. Defaults to 20 seconds
    timeout: 20s
    # How long to wait before starting the check. Defaults to 30 seconds
    initialDelay: 30s

# By having an 'edge' section defined, the ziti-controller will attempt to parse the edge configuration. Removing this
# section, commenting out, or altering the name of the section will cause the edge to not run.
edge:
  # This section represents the configuration of the Edge API that is served over HTTPS
  api:
    #(optional, default 90s) Alters how frequently heartbeat and last activity values are persisted
    # activityUpdateInterval: 90s
    #(optional, default 250) The number of API Sessions updated for last activity per transaction
    # activityUpdateBatchSize: 250
    # sessionTimeout - optional, default 30m
    # The number of minutes before an Edge API session will time out. Timeouts are reset by
    # API requests and connections that are maintained to Edge Routers
    sessionTimeout: 30m
    # address - required
    # The default address (host:port) to use for enrollment for the Client API. This value must match one of the addresses
    # defined in this Controller.WebListener.'s bindPoints.
    address: ha-controller-1.ziti.test:1280
  # This section is used to define options that are used during enrollment of Edge Routers and Ziti Edge Identities.
  enrollment:
    # signingCert - required
    # A Ziti Identity configuration section that specifically makes use of the cert and key fields to define
    # a signing certificate from the PKI that the Ziti environment is using to sign certificates. The signingCert.cert
    # will be added to the /.well-known CA store that is used to bootstrap trust with the Ziti Controller.
    signingCert:
      cert: /etc/ziti/pki/ha-controller-1/certs/ha-controller-1.cert
      key:  /etc/ziti/pki/ha-controller-1/keys/ha-controller-1.key
    # edgeIdentity - optional
    # A section for identity enrollment specific settings
    edgeIdentity:
      # duration - optional, default 180m
      # The length of time that a Ziti Edge Identity enrollment should remain valid. After
      # this duration, the enrollment will expire and no longer be usable.
      duration: 180m
    # edgeRouter - Optional
    # A section for edge router enrollment specific settings.
    edgeRouter:
      # duration - optional, default 180m
      # The length of time that a Ziti Edge Router enrollment should remain valid. After
      # this duration, the enrollment will expire and no longer be usable.
      duration: 180m

# web
# Defines webListeners that will be hosted by the controller. Each webListener can host many APIs and be bound to many
# bind points.
web:
  # name - required
  # Provides a name for this listener, used for logging output. Not required to be unique, but is highly suggested.
  - name: client-management
    # bindPoints - required
    # One or more bind points are required. A bind point specifies an interface (interface:port string) that defines
    # where on the host machine the webListener will listen and the address (host:port) that should be used to
    # publicly address the webListener(i.e. mydomain.com, localhost, 127.0.0.1). This public address may be used for
    # incoming address resolution as well as used in responses in the API.
    bindPoints:
      #interface - required
      # A host:port string on which network interface to listen on. 0.0.0.0 will listen on all interfaces
      - interface: ha-controller-1.ziti.test:1280
        # address - required
        # The public address that external incoming requests will be able to resolve. Used in request processing and
        # response content that requires full host:port/path addresses.
        address: ha-controller-1.ziti.test:1280
    # identity - optional
    # Allows the webListener to have a specific identity instead of defaulting to the root 'identity' section.
    identity:
      ca:          "/etc/ziti/pki/ha-controller-1/certs/ha-controller-1.chain.pem"
      key:         "/etc/ziti/pki/ha-controller-1/keys/server.key"
      server_cert: "/etc/ziti/pki/ha-controller-1/certs/server.chain.pem"
      cert:        "/etc/ziti/pki/ha-controller-1/certs/client.chain.pem"
      #alt_server_certs:
      #- server_cert: ""
      #  server_key:  ""

    # options - optional
    # Allows the specification of webListener level options - mainly dealing with HTTP/TLS settings. These options are
    # used for all http servers started by the current webListener.
    options:
      # idleTimeoutMs - optional, default 5000ms
      # The maximum amount of idle time in milliseconds allowed for pipelined HTTP requests. Setting this too high
      # can cause resources on the host to be consumed as clients remain connected and idle. Lowering this value
      # will cause clients to reconnect on subsequent HTTPs requests.
      idleTimeout: 5000ms  #http timeouts, new
      # readTimeoutMs - optional, default 5000ms
      # The maximum amount of time in milliseconds http servers will wait to read the first incoming requests. A higher
      # value risks consuming resources on the host with clients that are acting in bad faith or suffering from high latency
      # or packet loss. A lower value can risk losing connections to high latency/packet loss clients.
      readTimeout: 5000ms
      # writeTimeoutMs - optional, default 100000ms
      # The total maximum time in milliseconds that the http server will wait for a single request to be received and
      # responded to. A higher value can allow long-running requests to consume resources on the host. A lower value
      # can risk ending requests before the server has a chance to respond.
      writeTimeout: 100000ms
      # minTLSVersion - optional, default TLS1.2
      # The minimum version of TLS to support
      minTLSVersion: TLS1.2
      # maxTLSVersion - optional, default TLS1.3
      # The maximum version of TLS to support
      maxTLSVersion: TLS1.3
    # apis - required
    # Allows one or more APIs to be bound to this webListener
    apis:
      # binding - required
      # Specifies an API to bind to this webListener. Built-in APIs are
      #   - edge-management
      #   - edge-client
      #   - fabric-management
      - binding: edge-management
        # options - arg optional/required
        # This section is used to define values that are specified by the API they are associated with.
        # These settings are per API. The example below is for the 'edge-api' and contains both optional values and
        # required values.
        options: { }
      - binding: edge-client
        options: { }
      - binding: fabric
        options: { }
      - binding: edge-oidc
        options: { }
      - binding: zac
        options:
          location: /opt/openziti/share/console
          indexFile: index.html

I feel as if I’m missing something totally obvious in the docs or in my steps. If anyone has any ideas, they would be greatly appreciated.

Working on this some more, it seems the service was failing to start because the “entrypoint” check script could not resolve the database file path (the error shows an empty path, ''). I tried disabling bootstrapping through service.env, but nothing I changed in that file had any effect.
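The empty path in "ERROR: database file '' is not writable" (and the "realpath: missing operand" lines before it) suggests the check looks for a single-controller db: setting that a clustered config does not have; the config above uses cluster.dataDir instead. That is an inference from the log output, not from reading entrypoint.bash. A rough, hypothetical helper to tell the two layouts apart, assuming bash and a config with top-level keys at column 0 (it is a line-based check, not a YAML parser):

```shell
# config_mode FILE
# Prints "clustered" if FILE has a top-level "cluster:" section,
# "single" if it has a top-level "db:" key, and "unknown" otherwise.
config_mode() {
  local file="$1"
  if grep -q '^cluster:' "$file"; then
    echo clustered
  elif grep -q '^db:' "$file"; then
    echo single
  else
    echo unknown
  fi
}

# Example:
#   config_mode /etc/ziti/config.yml
```

For the config posted above, this prints clustered.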

Eventually, I just decided to comment out the ExecStartPre line in my service file. Since I’m not doing any bootstrapping anyway, I can’t see the harm; all it was doing was failing to read a variable properly and killing my whole service. I have no idea whether this is the intended path, but now it works, which is all I cared about.
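If you go with this workaround, a systemd drop-in keeps the change out of the packaged unit file, which a package upgrade may overwrite. The sketch below mirrors the working setup from this thread (root user, /etc/ziti, absolute config path) on top of the packaged unit. In a drop-in, an empty ExecStartPre= clears the inherited pre-start command, and ExecStart= must be cleared before it can be redefined. DynamicUser=yes also implies sandboxing such as ProtectSystem=strict, so it is switched off explicitly. print_override is a hypothetical helper and the /etc/ziti paths are this thread's layout; adjust both as needed.

```shell
# print_override
# Prints a systemd drop-in that reproduces this thread's workaround.
# The quoted heredoc keeps ${ZITI_ARGS} literal so systemd expands it at runtime.
print_override() {
  cat <<'EOF'
[Service]
# DynamicUser=yes implies ProtectSystem=strict, so disable it before using root.
DynamicUser=no
User=root
WorkingDirectory=/etc/ziti
# An empty assignment clears the inherited entrypoint.bash check.
ExecStartPre=
# ExecStart= must be cleared before it can be redefined in a drop-in.
ExecStart=
ExecStart=/opt/openziti/bin/ziti controller run /etc/ziti/config.yml ${ZITI_ARGS}
EOF
}

# Example:
#   sudo mkdir -p /etc/systemd/system/ziti-controller.service.d
#   print_override | sudo tee /etc/systemd/system/ziti-controller.service.d/override.conf
#   sudo systemctl daemon-reload
#   sudo systemctl restart ziti-controller.service
```

Reverting is then just a matter of deleting override.conf and running systemctl daemon-reload.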