Controller stuck on bootup

Hello, and Merry Christmas!
I'm new to OpenZiti. The project sounds quite interesting, and I'm trying to set up a network for my homelab.
I'm following these guides:
Setting Up Oracle Cloud To Host OpenZiti (I tried the latest Ubuntu, both the normal and minimal images)
Host OpenZiti Anywhere | OpenZiti
I got stuck in expressInstall at the "waiting for the controller to come online to allow the edge router to enroll" step (I'm setting this up with only an external IP, so I'm skipping export EXTERNAL_DNS="....").
I did some reading and debugging and found this forum post, which seems quite similar to my problem. The main issue is that for CQDet2803 the problem magically disappeared :expressionless:
When I run ziti ops verify-network --controller-config-file $HOME/.ziti/quickstart/$(hostname)/$(hostname).yaml
I get

INFO    Verifying controller config: /home/ubuntu/.ziti/quickstart/instance-20241225-1343/instance-20241225-1343.yaml
ERROR   controller advertise address at <my_public_ip>:8440 cannot be reached.
INFO    verifying 1 web entries
INFO    verifying 1 web bindPoints
ERROR   web entry[client-management], bindPoint[0] address at <my_public_ip>:8441 cannot be reached.

ERROR   One or more error. Review the output above for errors.

Running netstat -ano | grep 844 | grep LIST
I get

tcp6       0      0 :::8441                 :::*                    LISTEN      off (0.00/0/0)
tcp6       0      0 :::8440                 :::*                    LISTEN      off (0.00/0/0)

I assume it's an IPv4 vs. IPv6 issue, and I hope there is a way to force ziti to use IPv4, but through my googling I couldn't find anything.
I checked the $HOME/.ziti/quickstart/$(hostname)/$(hostname).yaml file, as I assume this is the main config file. All the addresses are in IPv4 format, so I have either 0.0.0.0:<port> or <my_public_ip>:<port>.
If I run ifconfig I can see both IPv4 and IPv6 addresses. My next step was to completely disable IPv6 on the machine (I don't think this is the right solution, but I'm a bit desperate; I also locked myself out of the instance on my first attempt 😅).
Any help would be highly appreciated.

Hi @lex529, Merry Christmas to you as well and welcome to the community and to OpenZiti (and zrok/BrowZer)!

Yes, that looks like an IPv6 issue. The listener is exclusively listening on TCP6, ::.

This is very strange. You're saying the machine HAS an IPv4 address? You can see a locally defined IPv4 address?

Can you share the two sections:

specifically the listener section:

ctrl:
  options:
    advertiseAddress: tls:ec2-3-18-113-172.us-east-2.compute.amazonaws.com:8440
  listener:             tls:0.0.0.0:8440

and the 'interface' section of the web

web:
  - name: new-address
    bindPoints:
      - interface: 0.0.0.0:8441

Are EITHER of those set to the advertised address, or are both 0.0.0.0? Does the controller log any errors (in $HOME/.ziti/quickstart/$(hostname)/$(hostname).log)?

I've never seen the ipv4 bind fail like this. :confused:
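
One way to test whether a tcp6 listener really is IPv6-only: on Linux, a socket that netstat shows as tcp6 on :::<port> is often a dual-stack socket that also accepts IPv4 connections, so the netstat line alone may not prove much. A minimal sketch, assuming bash and the coreutils timeout command are available (probe_tcp is a hypothetical helper name, not part of ziti):

```shell
# Probe a TCP port and print "open" or "closed".
# Assumption: bash (for its built-in /dev/tcp) and `timeout` are installed.
probe_tcp() {
  host=$1
  port=$2
  # An IPv4 literal such as 127.0.0.1 forces the connection over IPv4.
  if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo open
  else
    echo closed
  fi
}

# Example calls (not run here):
#   probe_tcp 127.0.0.1 8441          # on the controller host
#   probe_tcp <my_public_ip> 8441     # from another machine
```

If the local probe prints open while the remote one prints closed, the bind is probably fine and something between the internet and the process (cloud security list, host firewall) would be the next thing to look at.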

This is very strange. You're saying the machine HAS an IPv4 address? You can see a locally defined IPv4 address?

Yes, I'm SSHing into the machine via its IPv4 address, and I can see both IPv4 and IPv6 addresses when I run ifconfig

$ ifconfig
ens3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
        inet 10.0.201.163  netmask 255.255.248.0  broadcast 10.0.207.255
        inet6 fe80::200:17ff:fe02:d0f0  prefixlen 64  scopeid 0x20<link>
        ether 00:00:17:02:d0:f0  txqueuelen 1000  (Ethernet)
        RX packets 131168  bytes 396051969 (396.0 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 100176  bytes 41194310 (41.1 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 2758  bytes 527405 (527.4 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2758  bytes 527405 (527.4 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

The subnet on OCI isn't configured for IPv6.

Can you share the two sections:

# the endpoint that routers will connect to the controller over.
ctrl:
  options:
    advertiseAddress: tls:<my_public_ip>:8440
  # (optional) settings
  # set the maximum number of connect requests that are buffered and waiting to be acknowledged (1 to 5000, default 1)
  #maxQueuedConnects:      1
  # the maximum number of connects that have  begun hello synchronization (1 to 1000, default 16)
  #maxOutstandingConnects: 16
  # the number of milliseconds to wait before a hello synchronization fails and closes the connection (30ms to 60000ms, default: 5000ms)
  #connectTimeoutMs:       5000
  listener:             tls:0.0.0.0:8440
web:
  # name - required
  # Provides a name for this listener, used for logging output. Not required to be unique, but is highly suggested.
  - name: client-management
    # bindPoints - required
    # One or more bind points are required. A bind point specifies an interface (interface:port string) that defines
    # where on the host machine the webListener will listen and the address (host:port) that should be used to
    # publicly address the webListener(i.e. mydomain.com, localhost, 127.0.0.1). This public address may be used for
    # incoming address resolution as well as used in responses in the API.
    bindPoints:
      #interface - required
      # A host:port string on which network interface to listen on. 0.0.0.0 will listen on all interfaces
      - interface: 0.0.0.0:8441
        # address - required
        # The public address that external incoming requests will be able to resolve. Used in request processing and
        # response content that requires full host:port/path addresses.
        address: <my_public_ip>:8441

Does the controller log any errors

No errors; there are two warnings at the beginning:

[   0.074] WARNING ziti/controller/config.LoadConfig: this environment is using a default generated trust domain [spiffe://<some_guid>], it is recommended that a trust domain is specified in configuration via URI SANs or the 'trustDomain' field
[   0.075] WARNING ziti/controller/config.LoadConfig: this environment is using a default generated trust domain [spiffe://<some_guid>], it is recommended that if network components have enrolled that the generated trust domain be added to the configuration field 'additionalTrustDomains' array when configuring a explicit trust domain

But at the end it says that the server is listening on all interfaces:

INFO xweb/v2.(*Server).Start: starting ApiConfig to listen and serve tls on 0.0.0.0:8441 for server client-management with APIs: [edge-management edge-client fabric]
[   2.376]    INFO ziti/controller/network.(*Network).Run: started

Maybe I'm a bit too paranoid in removing the public IP from the config/logs; if it's needed, I can provide it.
I've also double-checked that it's the right IP. I copied the IP from the express install script output:

waiting for the controller to come online to allow the edge router to enroll
waiting for https://<my_public_ip>:8441

I then searched the $(hostname).log and $(hostname).yaml files to be 100% sure it's the same, and also used the same value from the clipboard to SSH in from a new terminal.

What happens if you replace the public IP in your two config sections with 0.0.0.0, do you get an IPv4 listener after that? That's something different between your config and mine. I wonder if somehow that public IP is the issue?

Could you try that, maybe?

I changed the IP in the two places, but when I tried to run the controller with ziti controller run $(hostname).yaml, I got a panic:

 INFO channel/v3.(*UnderlayDispatcher).Run: started
panic: error validating ApiConfig binding edge-client: could not find [edge.api.address] value [<my_public_ip>:8441] as a bind point any instance of ApiConfig [edge-client]

Afterwards I went into the .yaml file and changed the following section:

edge:
  # This section represents the configuration of the Edge API that is served over HTTPS
  api:
    #(optional, default 90s) Alters how frequently heartbeat and last activity values are persisted
    # activityUpdateInterval: 90s
    #(optional, default 250) The number of API Sessions updated for last activity per transaction
    # activityUpdateBatchSize: 250
    # sessionTimeout - optional, default 30m
    # The number of minutes before an Edge API session will time out. Timeouts are reset by
    # API requests and connections that are maintained to Edge Routers
    sessionTimeout: 30m
    # address - required
    # The default address (host:port) to use for enrollment for the Client API. This value must match one of the addresses
    # defined in this Controller.WebListener.'s bindPoints.
    # address: <my_public_ip>:8441
    address: 0.0.0.0:8441

Now, if I run cat "$(hostname).yaml" | grep "$(curl -s eth0.me)", the only places where the IP appears uncommented are in the cert/server_cert/key values.

Changing edge.api.address got the controller running again, with the following output:

[   1.918]    INFO channel/v3.(*UnderlayDispatcher).Run: started
[   2.485]    INFO xweb/v2.(*Server).Start: starting ApiConfig to listen and serve tls on 0.0.0.0:8441 for server client-management with APIs: [edge-management edge-client fabric]
[   2.486]    INFO ziti/controller/network.(*Network).Run: started

But when I run netstat -ano | grep 844 | grep LIST
I still get the bind on tcp6:

tcp6       0      0 :::8441                 :::*                    LISTEN      off (0.00/0/0)
tcp6       0      0 :::8440                 :::*                    LISTEN      off (0.00/0/0)

I've checked the full netstat output. One thing looked a bit strange (at least from the perspective of a noob): SSH is also listening on tcp6, but its connections are "advertised" over IPv4:

tcp6       0    724 10.0.201.163:22         <my_home_pc_ip>:61628     ESTABLISHED on (0.18/0/0)
tcp6       0      0 10.0.201.163:22         <my_home_pc_ip>:61656     ESTABLISHED keepalive (4365.43/0/0)

After a small break, and getting bored/pissed off 😅, I went in and, based on the above observation, changed all the IP references from 0.0.0.0 to 10.0.201.163 (my private IP), then started the controller again:

[   1.959]    INFO channel/v3.(*UnderlayDispatcher).Run: started
[   2.467]    INFO xweb/v2.(*Server).Start: starting ApiConfig to listen and serve tls on 10.0.201.163:8441 for server client-management with APIs: [edge-management edge-client fabric]
[   2.469]    INFO ziti/controller/network.(*Network).Run: started

I ran netstat -ano | grep 844 | grep LIST:

tcp        0      0 10.0.201.163:8440       0.0.0.0:*               LISTEN      off (0.00/0/0)
tcp        0      0 10.0.201.163:8441       0.0.0.0:*               LISTEN      off (0.00/0/0)

I think this is progress, and I had a small moment of joy 😆, but still nothing works (I tried making a request to https://<public_ip>:8441 and tried adding the admin console). I'm running Ubuntu 24.04.1 LTS.
Could this have to do with the default behavior of the OS when it comes to how it listens for connections?
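
The SSH lines above are a useful hint: a tcp6 socket whose local address is a plain IPv4 address means the kernel is serving IPv4 traffic over an IPv6 socket, which is ordinary dual-stack behavior on Linux. A small sketch that picks such lines out of netstat output (v4_on_tcp6 is a hypothetical helper name, and it assumes the column layout shown in this thread):

```shell
# Print tcp6 sockets whose local address is a dotted-quad IPv4 address.
# Input: `netstat -ano` style lines on stdin
# (columns: proto, recv-q, send-q, local, foreign, state, ...).
v4_on_tcp6() {
  awk '$1 == "tcp6" && $4 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+:[0-9]+$/ { print $4, $6 }'
}

# Example call (not run here):
#   netstat -ano | v4_on_tcp6
```

Any output here would suggest that tcp6 listeners on this host can carry IPv4 connections, so the :::8440 and :::8441 entries may not have been the real problem.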

The only place in your config file an IP might be referenced is in the advertised section. The other sections should all remain as 0.0.0.0 until we work through what's happening.

I reread the post today. You mentioned OCI and Ubuntu 24.04. Could firewalld somehow be in the mix and blocking ports, or perhaps SELinux is somehow enabled and interfering? Are you using an ARM-based distro or x64?
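
To check the host firewall angle, here is a rough sketch that looks for an ACCEPT rule for a given port in the INPUT chain. It assumes the instance uses iptables, which is not confirmed in this thread. has_accept_rule is a hypothetical helper; it only matches rules written with a single --dport, so port ranges and multiport rules would be missed, and it does not check whether the ACCEPT comes before any REJECT rule:

```shell
# Succeed if the rule list on stdin contains an ACCEPT rule naming this port.
# Input: output of `sudo iptables -S INPUT`.
has_accept_rule() {
  grep -E -e "--dport $1( |\$)" | grep -q -e "-j ACCEPT"
}

# Example calls (not run here):
#   sudo iptables -S INPUT | has_accept_rule 8440 && echo "8440 allowed"
#   sudo iptables -S INPUT | has_accept_rule 8441 && echo "8441 allowed"
```

Separately, systemctl is-active firewalld reports whether firewalld is running at all. Even with the host firewall open, the cloud-side ingress rules for the subnet would still need to allow 8440 and 8441.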