ERROR transport/v2/tls.(*sharedListener).processConn [tls:0.0.0.0:1280]: {remote=[172.20.0.4:38382] error=[tls: client didn't provide a certificate]} handshake failed

Hey, I'm trying to follow along with the Docker self-hosting guide (Self-hosting guide for Docker | Zrok) and I'm running into the error in the title in the ziti-quickstart logs. It's the first of several, but I'm assuming it's the source of my problems here and in the controller.

Not sure where to start for info, but my .env is

CADDY_DNS_PLUGIN=cloudflare
CADDY_DNS_PLUGIN_TOKEN=[Mytoken]
# https://acme-v02.api.letsencrypt.org/directory
# currently using staging api until setup
CADDY_ACME_API=https://acme-staging-v02.api.letsencrypt.org/directory
ZROK_DNS_ZONE=zrok.mydomain.dev

ZROK_USER_EMAIL=myemail@outlook.com
ZROK_USER_PWD=[userpwd]
ZITI_PWD=[zitipwd]
ZROK_ADMIN_TOKEN=[admintoken]

ZITI_CTRL_ADVERTISED_PORT=1280
ZITI_ROUTER_PORT=3022

My DNS records are on cloudflare, with the following entries

A       *.zrok     publicip    DNS only
A        zrok      publicip    DNS only

The API token that I used has the following settings:
Zone DNS Settings Edit
Zone Zone Settings Edit
Zone Zone Edit
Zone DNS Edit

Client IP operator is in (my server's public ip)

@qrkourier ping as requested; thanks for looking

Hi Ben! Welcome to the forum.

Here's an issue explaining why the client handshake error is not an error: spurious handshake failed errors are actually expected, not errors · Issue #2486 · openziti/ziti · GitHub

Since that's not a problem, what's a symptom of the problem you're experiencing? I assume something didn't work.

By the way, I ran through the Docker self-hosting guide on an arm64 VPS without issue, so we will find a way to get it working for you!

Ultimately, I can't enable the zrok service from a client machine, nor can I get into the admin panel. The error from the controller logs is error connecting to the ziti edge management api: Get "https://ziti.zrok.[mydomain].dev:1280/edge/management/v1/.well-known/est/cacerts": dial tcp 172.20.0.4:1280: connect: connection refused

I'll also note that nmap from a client machine shows ports 80, 443, 1280, and 3022 all open on the VM.

The zrok-frontend logs show a similar error: The connection to the server ziti.zrok.[mydomain].dev:1280 was refused - did you specify the right host or port?

Sorry, apparently I replied to myself instead of you, lol

Not a problem.

I see why you went looking for clues in the ziti-quickstart container's log. Is it even running?

Show container statuses, published and exposed ports, etc.

docker compose ps

Reveal how vars were interpreted by compose

docker compose config
zrok-caddy-1             zrok-caddy                           "caddy run --config …"   caddy             2 hours ago   Up 2 hours             0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp, 443/udp, 2019/tcp
zrok-ziti-quickstart-1   docker.io/openziti/ziti-cli:latest   "bash -euc 'ZITI_CMD…"   ziti-quickstart   2 hours ago   Up 2 hours (healthy)   0.0.0.0:1280->1280/tcp, 0.0.0.0:3022->3022/tcp
zrok-zrok-controller-1   zrok-zrok-controller                 "bootstrap-controlle…"   zrok-controller   2 hours ago   Up 2 hours             127.0.0.1:18080->18080/tcp
zrok-zrok-frontend-1     zrok-zrok-frontend                   "bootstrap-frontend.…"   zrok-frontend     2 hours ago   Up 2 hours             127.0.0.1:8080-8081->8080-8081/tcp
name: zrok
services:
  caddy:
    build:
      context: /home/ubuntu/zrok
      dockerfile: ./caddy.Dockerfile
      args:
        CADDY_DNS_PLUGIN: cloudflare
    environment:
      CADDY_ACME_API: https://acme-staging-v02.api.letsencrypt.org/directory
      CADDY_DNS_PLUGIN: cloudflare
      CADDY_DNS_PLUGIN_TOKEN: [cloudflaretoken]
      ZROK_CTRL_PORT: "18080"
      ZROK_DNS_ZONE: zrok.[mydomain].dev
      ZROK_FRONTEND_PORT: "8080"
      ZROK_OAUTH_PORT: "8081"
      ZROK_USER_EMAIL: [myemail]@outlook.com
    expose:
      - 80/tcp
      - 443/tcp
      - 443/udp
      - 2019/tcp
    networks:
      zrok-instance: null
    ports:
      - mode: ingress
        host_ip: 0.0.0.0
        target: 80
        published: "80"
        protocol: tcp
      - mode: ingress
        host_ip: 0.0.0.0
        target: 443
        published: "443"
        protocol: tcp
    restart: unless-stopped
    volumes:
      - type: volume
        source: caddy_data
        target: /data
        volume: {}
      - type: volume
        source: caddy_config
        target: /config
        volume: {}
  ziti-quickstart:
    command:
      - --
      - edge
      - quickstart
      - --home
      - /home/ziggy/quickstart
    depends_on:
      ziti-quickstart-init:
        condition: service_completed_successfully
        required: true
    entrypoint:
      - bash
      - -euc
      - |
        ZITI_CMD+=" --ctrl-address ziti.zrok.[mydomain].dev"\
        " --ctrl-port 1280"\
        " --router-address ziti.zrok.[mydomain].dev"\
        " --router-port 3022"\
        " --password [pw]"
        echo "DEBUG: run command is: ziti $${@} $${ZITI_CMD}"
        exec ziti "$${@}" $${ZITI_CMD}
    environment:
      HOME: /home/ziggy
      PFXLOG_NO_JSON: "true"
      ZITI_ROUTER_NAME: quickstart-router
    expose:
      - "1280"
      - "3022"
    healthcheck:
      test:
        - CMD
        - ziti
        - agent
        - stats
      timeout: 3s
      interval: 3s
      retries: 5
      start_period: 30s
    image: docker.io/openziti/ziti-cli:latest
    networks:
      zrok-instance:
        aliases:
          - ziti.zrok.[mydomain].dev
    ports:
      - mode: ingress
        host_ip: 0.0.0.0
        target: 1280
        published: "1280"
        protocol: tcp
      - mode: ingress
        host_ip: 0.0.0.0
        target: 3022
        published: "3022"
        protocol: tcp
    restart: unless-stopped
    user: "1000"
    volumes:
      - type: volume
        source: ziti_home
        target: /home/ziggy
        volume: {}
  ziti-quickstart-check:
    command:
      - echo
      - Ziti is cooking
    depends_on:
      ziti-quickstart:
        condition: service_healthy
        required: true
    image: busybox
    networks:
      default: null
  ziti-quickstart-init:
    command:
      - chown
      - -Rc
      - "1000"
      - /home/ziggy
    environment:
      HOME: /home/ziggy
    image: busybox
    networks:
      default: null
    user: root
    volumes:
      - type: volume
        source: ziti_home
        target: /home/ziggy
        volume: {}
  zrok-controller:
    build:
      context: /home/ubuntu/zrok
      dockerfile: ./zrok-controller.Dockerfile
      args:
        ZITI_CTRL_ADVERTISED_PORT: "1280"
        ZITI_PWD: [pw]
        ZROK_ADMIN_TOKEN: [token]
        ZROK_CLI_IMAGE: openziti/zrok
        ZROK_CLI_TAG: latest
        ZROK_CTRL_PORT: "18080"
        ZROK_DNS_ZONE: zrok.[mydomain].dev
    command:
      - zrok
      - controller
      - /etc/zrok-controller/config.yml
      - --verbose
    depends_on:
      zrok-permissions:
        condition: service_completed_successfully
        required: true
    environment:
      ZROK_ADMIN_TOKEN: [token]
      ZROK_API_ENDPOINT: http://zrok-controller:18080
      ZROK_USER_EMAIL: [email]@outlook.com
      ZROK_USER_PWD: [pwd]
    expose:
      - "18080"
    networks:
      zrok-instance:
        aliases:
          - zrok.zrok.[mydomain].dev
    ports:
      - mode: ingress
        host_ip: 127.0.0.1
        target: 18080
        published: "18080"
        protocol: tcp
    restart: unless-stopped
    user: "2171"
    volumes:
      - type: volume
        source: zrok_ctrl
        target: /var/lib/zrok-controller
        volume: {}
  zrok-frontend:
    build:
      context: /home/ubuntu/zrok
      dockerfile: zrok-frontend.Dockerfile
      args:
        ZROK_CLI_IMAGE: openziti/zrok
        ZROK_CLI_TAG: latest
        ZROK_DNS_ZONE: zrok.[mydomain].dev
        ZROK_FRONTEND_PORT: "8080"
        ZROK_OAUTH_GITHUB_CLIENT_ID: noop
        ZROK_OAUTH_GITHUB_CLIENT_SECRET: noop
        ZROK_OAUTH_GOOGLE_CLIENT_ID: noop
        ZROK_OAUTH_GOOGLE_CLIENT_SECRET: noop
        ZROK_OAUTH_HASH_KEY: noop
        ZROK_OAUTH_PORT: "8081"
    command:
      - zrok
      - access
      - public
      - /etc/zrok-frontend/config.yml
      - --verbose
    depends_on:
      zrok-permissions:
        condition: service_completed_successfully
        required: true
    environment:
      HOME: /var/lib/zrok-frontend
      ZITI_CTRL_ADVERTISED_PORT: "1280"
      ZITI_PWD: [zrokpwd]
      ZROK_ADMIN_TOKEN: [zrokadmin]
      ZROK_API_ENDPOINT: http://zrok-controller:18080
      ZROK_DNS_ZONE: zrok.[mydomain].dev
      ZROK_FRONTEND_PORT: "443"
      ZROK_FRONTEND_SCHEME: https
    expose:
      - "8080"
      - "8081"
    networks:
      zrok-instance: null
    ports:
      - mode: ingress
        host_ip: 127.0.0.1
        target: 8080
        published: "8080"
        protocol: tcp
      - mode: ingress
        host_ip: 127.0.0.1
        target: 8081
        published: "8081"
        protocol: tcp
    restart: unless-stopped
    user: "2171"
    volumes:
      - type: volume
        source: zrok_frontend
        target: /var/lib/zrok-frontend
        volume: {}
  zrok-permissions:
    command:
      - /bin/sh
      - -euxc
      - |
        chown -Rc 2171 /var/lib/zrok-*;
        chmod -Rc ug=rwX,o-rwx /var/lib/zrok-*;
    image: busybox
    networks:
      default: null
    volumes:
      - type: volume
        source: zrok_ctrl
        target: /var/lib/zrok-controller
        volume: {}
      - type: volume
        source: zrok_frontend
        target: /var/lib/zrok-frontend
        volume: {}
networks:
  default:
    name: zrok_default
  zrok-instance:
    name: zrok_zrok-instance
    driver: bridge
volumes:
  caddy_config:
    name: zrok_caddy_config
  caddy_data:
    name: zrok_caddy_data
  ziti_home:
    name: zrok_ziti_home
  zrok_ctrl:
    name: zrok_zrok_ctrl
  zrok_frontend:
    name: zrok_zrok_frontend

Oh, and I'm aware that zrok enable shouldn't work with the staging Let's Encrypt API, but would that be the cause of the other errors as well?

Also, the Caddyfile validation did have one warning:
http.auto_https server is listening only on the HTTP port, so no automatic HTTPS will be applied to this server {"server_name": "srv1", "http_port": 80}

Yeah, Caddy always says that when it initializes the plain HTTP listener.

True, but something else is happening way sooner if the zrok controller and frontend can't call the ziti controller on :1280.

Can you reproduce the "connection refused" (TCP closed) result with cURL?

docker compose exec zrok-controller curl -sk https://ziti.zrok.[mydomain].dev:1280/

That closed port from the zrok controller's perspective is symptomatic of a problem with the Docker bridge network's alias. Your docker compose config shows what I expected: the ziti-quickstart container is on the zrok-instance network and has an alias matching its FQDN, so it should work.

FYI the way this works is containers on the same Docker bridge look up the ziti controller (the ziti-quickstart container) by its alias in the Docker resolver, and everyone else looks it up in regular DNS, finds the VPS public IP, and gets forwarded to the container by Docker published port 1280.
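
A minimal sketch of how one might confirm the two lookup paths described above. The helper name addr_scope is hypothetical, and the commented usage assumes getent is available in the zrok-controller image and dig on the client machine, neither of which is confirmed in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: label an IPv4 address as "private" (RFC 1918, which
# includes Docker bridge addresses like 172.20.0.4) or "public" (e.g. the
# VPS address returned by regular DNS).
addr_scope() {
  case "$1" in
    10.*|192.168.*|172.1[6-9].*|172.2[0-9].*|172.3[01].*) echo private ;;
    *) echo public ;;
  esac
}

# Example usage (assumptions: getent exists in the image, dig on the client).
# Inside the bridge, the alias should resolve to a private address:
#   addr_scope "$(docker compose exec zrok-controller getent hosts ziti.zrok.[mydomain].dev | awk '{print $1}')"
# From a client machine, regular DNS should return the public address:
#   addr_scope "$(dig +short ziti.zrok.[mydomain].dev | head -n1)"
```

If the in-container lookup came back "public", that would suggest the alias is not being served by the Docker resolver.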

No, it's a successful JSON response

Hang on, I remember I had to restart everything one time because aliases didn't work for an unknown reason.

Non-destructive restart:

docker compose up --build --detach --force-recreate

Same panic in the zrok-controller logs. I tried something similar a bit earlier tonight, but also deleting the volumes, lol

Do you get the same good JSON response from the ziti-controller when you run cURL in the zrok-controller or zrok-frontend containers?

If it's crashing and you can't exec, you can use run instead.

docker compose run --rm --entrypoint= zrok-controller curl -sk https://ziti.zrok.[mydomain].dev:1280/
docker compose run --rm --entrypoint= zrok-frontend curl -sk https://ziti.zrok.[mydomain].dev:1280/

Yeah both of those produce good responses

Weird. OK, recap:

The zrok-controller can't dial the ziti-controller on 1280/tcp. It's looking up the ziti-controller's advertised address in Docker bridge DNS as 172.20.0.4 and gets connection refused. However, curl running in the same container can talk to it just fine.

:person_shrugging:

Let's double check that's actually the correct IP address from the Docker resolver for the ziti-controller (ziti-quickstart) container.

ip route get 172.20.0.4

This should show you a bridge interface name that is assigned a Docker host address for that bridge, e.g. 172.20.0.1, that's used to publish ports.

For this one you'll have to substitute the correct container name or id, which varies depending on compose project name, e.g., "zrok" in my case. Find it with docker ps|grep ziti-quickstart.

docker inspect zrok-ziti-quickstart-1 | jq '.[].NetworkSettings.Networks[].IPAddress'

EDIT: realized the IP may be different now that the containers were restarted. We're trying to confirm the IP of the ziti-quickstart container matches the one that zrok-controller is trying to dial when it gets "connection refused."
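
To make that comparison less error-prone, here is a rough sketch that pulls the dialed address out of a "dial tcp" log line so it can be compared with the docker inspect result. The function names dialed_ip and same_ip are made up for illustration, and the commented usage assumes jq is installed on the Docker host.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read log lines on stdin and print the IP address from
# any "dial tcp <ip>:<port>: ..." message, such as the zrok-controller panic.
dialed_ip() {
  sed -n 's/.*dial tcp \([0-9.]*\):[0-9]*: .*/\1/p'
}

# Hypothetical helper: report whether two addresses are identical.
same_ip() {
  if [ "$1" = "$2" ]; then echo match; else echo mismatch; fi
}

# Example usage (not run here; container name depends on the compose project):
#   want=$(docker inspect zrok-ziti-quickstart-1 | jq -r '.[].NetworkSettings.Networks[].IPAddress')
#   got=$(docker compose logs zrok-controller 2>&1 | dialed_ip | tail -n1)
#   same_ip "$want" "$got"
```

A "mismatch" would point at a stale address being dialed; a "match" means the resolver is returning the right container and the problem is elsewhere.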

IP came in as 172.20.0.5

172.20.0.5 dev br-9e2ca809fa9b src 172.20.0.1 uid 1001 cache

(which does match the log for newly restarted services)

panic: error connecting to the ziti edge management api: Get "https://ziti.zrok.[mydomain].dev:1280/edge/management/v1/.well-known/est/cacerts": dial tcp 172.20.0.5:1280: connect: connection refused

I'll try to think of something else after a sleep. I'm stumped.

I can't think of a way tcpdump would disagree, but we can check. Insert your bridge interface name.

sudo tcpdump -nnvi br-b55c5e65a6c2 '((tcp port 1280 or tcp port 1281) and (tcp[tcpflags] & (tcp-syn|tcp-rst) != 0)) or (icmp[icmptype] = 3 and icmp[icmpcode] = 3)'

Assuming there's no firewall stuff happening on this interface, you should see a SYN followed immediately by RST if the port is closed (I curl'd closed port 1281). This capture filter will also match firewall-related ICMP destination unreachable type responses, in case that's what's happening here.

05:28:58.773111 IP (tos 0x0, ttl 64, id 29697, offset 0, flags [DF], proto TCP (6), length 60)
    172.18.0.5.33428 > 172.18.0.3.1281: Flags [S], cksum 0x585b (incorrect -> 0xbf8c), seq 1967160681, win 64240, options [mss 1460,sackOK,TS val 1584028844 ecr 0,nop,wscale 7], length 0
05:28:58.773131 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    172.18.0.3.1281 > 172.18.0.5.33428: Flags [R.], cksum 0xdd63 (correct), seq 0, ack 1967160682, win 0, length 0
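
The way to read that capture can be sketched as a small filter over tcpdump's verbose output: a SYN answered by SYN-ACK means the port is open, and a SYN answered by RST means it is closed. The function name classify_handshake is hypothetical, and this is a simplification: it looks only at the first SYN and the next flagged packet, without tracking individual connections.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read tcpdump -v output on stdin and print
#   open          if the first SYN is followed by a SYN-ACK   (Flags [S.])
#   closed        if the first SYN is followed by a reset     (Flags [R] or [R.])
#   inconclusive  if neither pattern shows up
classify_handshake() {
  awk '
    match($0, /Flags \[[^]]*\]/) {
      # "Flags [" is 7 characters and the closing "]" is 1, so strip 8 in total.
      flags = substr($0, RSTART + 7, RLENGTH - 8)
      if (flags == "S") { seen_syn = 1; next }
      if (seen_syn && flags == "S.") { print "open"; done = 1; exit }
      if (seen_syn && flags ~ /^R/) { print "closed"; done = 1; exit }
    }
    END { if (!done) print "inconclusive" }
  '
}

# Example usage (not run here; insert your own bridge interface name):
#   sudo tcpdump -nnvi br-XXXXXXXXXXXX -c 4 'tcp port 1280' | classify_handshake
```

Because the capture filter above only keeps SYN and RST packets, a reset that shows up after a SYN-ACK is typically just a client tearing down a connection that did get established, so it is not treated as "closed" here.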

If still no clues then I'll have to try it on an Oracle VPS in case there's something different. Seems like a long shot.

EDIT: You'd mentioned using nmap to probe all the ports you opened in the VPS firewall. You can cURL to 1280 from the Docker host (the VPS), and you can curl to the public IP on 1280 just like we did inside the zrok-controller container? So, the only path that's not working is zrok-controller => ziti-controller (ziti-quickstart)?

sudo tcpdump -nnvi br-9e2ca809fa9b '((tcp port 1280 or tcp port 1281) and (tcp[tcpflags] & (tcp-syn|tcp-rst) != 0)) or (icmp[icmptype] = 3 and icmp[icmpcode] = 3)'
tcpdump: listening on br-9e2ca809fa9b, link-type EN10MB (Ethernet), snapshot length 262144 bytes

# parallel (Is this the one you wanted me to run btw?)
docker compose exec zrok-controller curl -sk https://ziti.zrok.[mydomain].dev:1280/

# produces
05:58:58.945297 IP (tos 0x0, ttl 64, id 9117, offset 0, flags [DF], proto TCP (6), length 60)
    172.20.0.3.60480 > 172.20.0.5.1280: Flags [S], cksum 0x585f (incorrect -> 0x51f8), seq 1018623748, win 64240, options [mss 1460,sackOK,TS val 1655184194 ecr 0,nop,wscale 7], length 0
05:58:58.945329 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    172.20.0.5.1280 > 172.20.0.3.60480: Flags [S.], cksum 0x585f (incorrect -> 0x5ca4), seq 4013216289, ack 1018623749, win 65160, options [mss 1460,sackOK,TS val 457446672 ecr 1655184194,nop,wscale 7], length 0
05:58:59.005807 IP (tos 0x0, ttl 64, id 9125, offset 0, flags [DF], proto TCP (6), length 52)
    172.20.0.3.60480 > 172.20.0.5.1280: Flags [R.], cksum 0x5857 (incorrect -> 0x6c61), seq 748, ack 6205, win 501, options [nop,nop,TS val 1655184254 ecr 457446732], length 0

So, the only path that's not working is zrok-controller => ziti-controller (ziti-quickstart)

zrok-frontend => ziti-controller is also showing the errors

The connection to the server ziti.zrok.[mydomain].dev:1280 was refused - did you specify the right host or port?

# bit further down
curl: (7) Failed to connect to zrok-controller port 18080: Connection refused