I think quickstart:0.30.1 broke something in the edge-tunnel connections

I backed up my docker compose volumes and rebuilt with 0.30.1 as suggested, but I could not get even the most basic service to work. I hammered at it for HOURS... no joy. For a lark, I redirected my docker volumes and started my old instance up to see if it worked. I had to re-enroll some identities, but I got it back up and it was NOT working. So I tried specifying the quickstart:0.30.0 image and WHAMMY... it worked. Then I changed it back to 0.30.1 and, once again, no joy.

Here is the tunneler output against 0.30.1, on a Linux host that is trying to SSH into an identity.

Aug 25 20:54:27 pve1 systemd[1]: Started ziti-edge-tunnel.service - Ziti Edge Tunnel.
... ignoring my DNS errors unless you want to see them.
Aug 25 20:55:08 pve1 ziti-edge-tunnel[213980]: (213980)[       40.745]   ERROR ziti-sdk:channel.c:860 on_channel_connect_internal() ch[0] failed to connect to ER[ziti.mydomain.com] [-3008/unknown node or service]
Aug 25 20:55:22 pve1 ziti-edge-tunnel[213980]: (213980)[       55.298]   ERROR ziti-sdk:channel.c:860 on_channel_connect_internal() ch[0] failed to connect to ER[ziti.mydomain.com] [-3001/temporary failure]
Aug 25 20:55:50 pve1 ziti-edge-tunnel[213980]: (213980)[       83.101]   ERROR ziti-sdk:channel.c:860 on_channel_connect_internal() ch[0] failed to connect to ER[ziti.mydomain.com] [-3008/unknown node or service]
Aug 25 20:56:25 pve1 ziti-edge-tunnel[213980]: (213980)[      118.466]   ERROR ziti-sdk:channel.c:860 on_channel_connect_internal() ch[0] failed to connect to ER[ziti.mydomain.com] [-3001/temporary failure]
Aug 25 20:57:53 pve1 ziti-edge-tunnel[213980]: (213980)[      205.899]   ERROR ziti-sdk:channel.c:860 on_channel_connect_internal() ch[0] failed to connect to ER[ziti.mydomain.com] [-3001/temporary failure]
Aug 25 20:59:37 pve1 ziti-edge-tunnel[213980]: (213980)[      310.089]   ERROR ziti-sdk:channel.c:860 on_channel_connect_internal() ch[0] failed to connect to ER[ziti.mydomain.com] [-3001/temporary failure]
Aug 25 20:59:37 pve1 ziti-edge-tunnel[213980]: (213980)[      310.089]   ERROR ziti-sdk:connect.c:281 on_channel_connected() ztx[0] ch[0] failed to connect [-3001/temporary failure]
Aug 25 20:59:47 pve1 ziti-edge-tunnel[213980]: (213980)[      320.086]    WARN ziti-sdk:connect.c:332 connect_timeout() conn[0.2/Connecting] connect timeout: no suitable edge router
Aug 25 20:59:47 pve1 ziti-edge-tunnel[213980]: (213980)[      320.086]   ERROR tunnel-cbs:ziti_tunnel_cbs.c:103 on_ziti_connect() ziti dial failed: operation did not complete in time

With the quickstart back on 0.30.0, there is nothing of interest in the tunneler output after it starts... services just work.

Aug 25 21:02:40 pve1 systemd[1]: Started ziti-edge-tunnel.service - Ziti Edge Tunnel.
... ignoring my DNS errors unless you want to see them.
nothing from here out... just works

Here is what I get if I try to connect to a device I haven't re-enrolled. The identity exists, but I'm guessing this error means it can't be found to connect to.

Aug 25 21:07:25 pve1 ziti-edge-tunnel[227246]: (227246)[      284.076]   ERROR ziti-sdk:connect.c:919 connect_reply_cb() conn[0.4/Connecting] failed to connect, reason=service 5PTrC1jpZkVtrrP8wat5fH has no terminators for instanceId pve2.jp
Aug 25 21:07:25 pve1 ziti-edge-tunnel[227246]: (227246)[      284.076]   ERROR tunnel-cbs:ziti_tunnel_cbs.c:103 on_ziti_connect() ziti dial failed: connection is closed

Which version of the ziti-edge-tunnel are you using?

ziti-edge-tunnel version

Let's make sure the problem isn't fixed by upgrading to tunneler >= 0.22.7 (link to latest release if you're not using DEB or RPM).
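If you want to script that check, here is a minimal sketch (not part of any OpenZiti tooling) that compares the output of `ziti-edge-tunnel version` against a minimum. It assumes `sort -V` (version sort, available in GNU coreutils) and that local builds append a suffix like `-local`, as in `v0.22.7-local`; the helper names `normalize_version` and `version_ge` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: check whether a tunneler version string meets a minimum version.
# Assumes `sort -V` (version sort) is available, e.g. from GNU coreutils.

# Normalize strings like "v0.22.7-local" to "0.22.7".
normalize_version() {
  local v="${1#v}"           # drop a leading "v"
  printf '%s\n' "${v%%-*}"   # drop any "-suffix" such as "-local"
}

# version_ge HAVE NEED: exit status 0 if HAVE >= NEED.
version_ge() {
  local have need lowest
  have="$(normalize_version "$1")"
  need="$(normalize_version "$2")"
  # The lower of the two sorts first; HAVE >= NEED when NEED is the lowest.
  lowest="$(printf '%s\n%s\n' "$have" "$need" | sort -V | head -n 1)"
  [ "$lowest" = "$need" ]
}
```

For example, `version_ge "$(ziti-edge-tunnel version)" 0.22.7 && echo "new enough"`. Version sort matters here because a plain string comparison would rank 0.22.10 below 0.22.7.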

I might have a similar issue. See Reduce number of Service Policies for Monitoring - #34 by Metz
I have version v0.22.7-local, and zssh also has an issue.

This is strange. I previously tested quickstart 0.30.0 startup successfully after it was released, but now I'm getting a ctrl plane certificate error for both quickstart 0.30.0 and 0.30.1. The problem seems to be that the controller's ctrl plane TLS server (typically 6262/tcp) is not presenting any certificates at all.
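A quick way to see this symptom from any host is to ask the listener for its certificates with `openssl s_client -showcerts` and count what comes back. This is a minimal sketch, not the cert chains checker script; it assumes the `openssl` CLI is installed, and the function names and the host/port you pass in are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: report how many certificates a TLS listener presents.
# Assumes the `openssl` CLI is installed; host and port are supplied by the caller.

# Count PEM certificate blocks in text read from stdin.
count_pem_certs() {
  grep -c -- '-----BEGIN CERTIFICATE-----'
}

# check_ctrl_plane HOST [PORT]: print the certificate count and
# return non-zero if the listener presented none (or was unreachable).
check_ctrl_plane() {
  local host="$1" port="${2:-6262}" n
  n="$(openssl s_client -connect "${host}:${port}" -showcerts </dev/null 2>/dev/null | count_pem_certs)"
  echo "${host}:${port} presented ${n} certificate(s)"
  [ "$n" -gt 0 ]
}
```

For example, `check_ctrl_plane ziti.mydomain.com 6262`. A healthy listener should report at least one certificate; a count of 0 matches the behavior described above, though it can also mean the port is simply unreachable.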

@TheLumberjack @berrabe FYI something's not right with the two latest quickstart releases.

I used the cert chains checker script to diagnose the problem with the ctrl plane listener's cert chain.

curl -sSf https://raw.githubusercontent.com/openziti/ziti/add-signer-grandparent-ziti-env/quickstart/check-repair-cert-chains.bash | bash

I confirmed 0.30.0 and 0.30.1 are not identical container images.

[
  "latest",
  "sha256:2f14bb593d33f924422fcf25e592e7fffad579681a34e8ef761ba87c2cd2a46f"
]
[
  "0.30.1",
  "sha256:2f14bb593d33f924422fcf25e592e7fffad579681a34e8ef761ba87c2cd2a46f"
]
[
  "0.30.0",
  "sha256:2958147ec2b67c445a74cf4d067d4dbbea73b3d8282941eb7caff38c36287476"
]

Summary:

  • quickstart >0.29.0 has a problem where the controller's ctrl plane (e.g., 6262/tcp) fails to present any server certificate.
  • I'm able to "repair" 0.29.0 by restarting the controller container after running:
curl -s https://raw.githubusercontent.com/openziti/ziti/add-signer-grandparent-ziti-env/quickstart/test/check-cert-chains.bash \
| bash -s -- --rebuild
INFO: backing up /persistent/pki/cas.pem to /persistent/pki/cas.pem.20230826195706.bak
INFO: backing up /persistent/ziti-controller.yaml to /persistent/ziti-controller.yaml.20230826195706.bak
The following changes were made: 
        * Rebuilt the controller CA bundle 
        * Replaced the controller edge intermediate CA cert in the client API's web listener identity.ca with the controller edge root CA cert 

Please restart the controller and all the main router and re-run this script without --rebuild.

(restart controller container)

curl -s https://raw.githubusercontent.com/openziti/ziti/add-signer-grandparent-ziti-env/quickstart/test/check-cert-chains.bash \
| bash -s 
INFO: router certificate chain verified.
INFO: ctrl_plane certificate chain verified.
INFO: edge_web certificate chain verified.

Verify Ziti control and data planes are functioning:

$ go test ./quickstart/test/quickstart_test.go 
ok      command-line-arguments  10.621s

This does not repair quickstart >0.29.0 because, in newer versions, the controller's ctrl plane TLS server doesn't present any certs. I haven't been able to diagnose that one yet.

Confirmed version below.

# ziti-edge-tunnel version
v0.22.7-local

Is there any more info on this issue?

Hi @jptechnical, yes, 0.30.1 changed some environment variable names, and a new docker compose file was pushed along with it. It comes down to ZITI_ROUTER_ADVERTISED_ADDRESS versus ZITI_ROUTER_ADVERTISED_HOST. If you add BOTH variables to the environment section of the docker-compose file, and BOTH to the .env file, you can flip back and forth like you're trying to do:

      - ZITI_ROUTER_ADVERTISED_ADDRESS=${ZITI_ROUTER_ADVERTISED_ADDRESS:-ziti-edge-router}
      - ZITI_ROUTER_ADVERTISED_HOST=${ZITI_ROUTER_ADVERTISED_HOST:-ziti-edge-router}

Alternatively, you can just use one or the other. 0.30.0 uses ZITI_ROUTER_ADVERTISED_HOST, while 0.30.1 uses ZITI_ROUTER_ADVERTISED_ADDRESS.
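If you'd rather not edit the .env file by hand, a small helper can copy whichever of the two names is set to the other. This is a minimal sketch, not part of the quickstart; it assumes plain `KEY=value` lines (no quotes, no `export` prefix), and the function names `env_get`, `env_append`, and `sync_router_address_vars` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: keep both router address variable names in a quickstart .env file so the
# same file works whether the image reads ZITI_ROUTER_ADVERTISED_HOST (0.30.0) or
# ZITI_ROUTER_ADVERTISED_ADDRESS (0.30.1).
# Assumes plain KEY=value lines; commented-out lines ("#KEY=...") are ignored.

# Print the value from the last uncommented KEY=value line for KEY in FILE.
env_get() {
  local file="$1" key="$2"
  grep -E "^${key}=" "$file" | tail -n 1 | cut -d= -f2-
}

# Append KEY=VALUE to FILE, first adding a newline if the file does not end with one.
env_append() {
  local file="$1" key="$2" value="$3"
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  printf '%s=%s\n' "$key" "$value" >> "$file"
}

# If only one of the two names has a non-empty value, copy it to the other.
sync_router_address_vars() {
  local file="$1" addr host
  addr="$(env_get "$file" ZITI_ROUTER_ADVERTISED_ADDRESS)"
  host="$(env_get "$file" ZITI_ROUTER_ADVERTISED_HOST)"
  if [ -n "$addr" ] && [ -z "$host" ]; then
    env_append "$file" ZITI_ROUTER_ADVERTISED_HOST "$addr"
  elif [ -z "$addr" ] && [ -n "$host" ]; then
    env_append "$file" ZITI_ROUTER_ADVERTISED_ADDRESS "$host"
  fi
}
```

Run it as `sync_router_address_vars ./.env`, and keep both `- ZITI_ROUTER_ADVERTISED_...` lines in the compose file's environment section so the values actually reach the container.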

Or you can just move to 0.30.1, get the latest compose file, and update your .env file (that's what I'd recommend).

Here's what I used locally in case that helps:

docker-compose.yml
version: '2.4'
services:
  ziti-controller:
    image: "${ZITI_IMAGE}:${ZITI_VERSION}"
    env_file:
      - ./.env
    ports:
      - ${ZITI_CTRL_EDGE_ADVERTISED_PORT:-1280}:${ZITI_CTRL_EDGE_ADVERTISED_PORT:-1280}
      - ${ZITI_CTRL_ADVERTISED_PORT:-6262}:${ZITI_CTRL_ADVERTISED_PORT:-6262}
    environment:
      - ZITI_CTRL_NAME=${ZITI_CTRL_NAME:-ziti-edge-controller}
      - ZITI_CTRL_EDGE_ADVERTISED_ADDRESS=${ZITI_CTRL_EDGE_ADVERTISED_ADDRESS:-ziti-edge-controller}
      - ZITI_CTRL_EDGE_ADVERTISED_PORT=${ZITI_CTRL_EDGE_ADVERTISED_PORT:-1280}
      - ZITI_CTRL_EDGE_IP_OVERRIDE=${ZITI_CTRL_EDGE_IP_OVERRIDE:-127.0.0.1}
      - ZITI_CTRL_ADVERTISED_PORT=${ZITI_CTRL_ADVERTISED_PORT:-6262}
      - ZITI_EDGE_IDENTITY_ENROLLMENT_DURATION=${ZITI_EDGE_IDENTITY_ENROLLMENT_DURATION}
      - ZITI_ROUTER_ENROLLMENT_DURATION=${ZITI_ROUTER_ENROLLMENT_DURATION}
      - ZITI_USER=${ZITI_USER:-admin}
      - ZITI_PWD=${ZITI_PWD}
    networks:
      ziti:
        aliases:
          - ziti-edge-controller
    volumes:
      - ziti-fs:/persistent
    entrypoint:
      - "/var/openziti/scripts/run-controller.sh"

  ziti-controller-init-container:
    image: "${ZITI_IMAGE}:${ZITI_VERSION}"
    depends_on:
      - ziti-controller
    environment:
      - ZITI_CTRL_EDGE_ADVERTISED_ADDRESS=${ZITI_CTRL_EDGE_ADVERTISED_ADDRESS:-ziti-edge-controller}
      - ZITI_CTRL_EDGE_ADVERTISED_PORT=${ZITI_CTRL_EDGE_ADVERTISED_PORT:-1280}
    env_file:
      - ./.env
    networks:
      ziti:
    volumes:
      - ziti-fs:/persistent
    entrypoint:
      - "/var/openziti/scripts/run-with-ziti-cli.sh"
    command:
      - "/var/openziti/scripts/access-control.sh"

  ziti-edge-router:
    image: "${ZITI_IMAGE}:${ZITI_VERSION}"
    env_file:
      - ./.env
    depends_on:
      - ziti-controller
    ports:
      - ${ZITI_ROUTER_PORT:-3022}:${ZITI_ROUTER_PORT:-3022}
      - ${ZITI_ROUTER_LISTENER_BIND_PORT:-10080}:${ZITI_ROUTER_LISTENER_BIND_PORT:-10080}
    environment:
      - ZITI_CTRL_ADVERTISED_ADDRESS=${ZITI_CTRL_ADVERTISED_ADDRESS:-ziti-controller}
      - ZITI_CTRL_ADVERTISED_PORT=${ZITI_CTRL_ADVERTISED_PORT:-6262}
      - ZITI_CTRL_EDGE_ADVERTISED_ADDRESS=${ZITI_CTRL_EDGE_ADVERTISED_ADDRESS:-ziti-edge-controller}
      - ZITI_CTRL_EDGE_ADVERTISED_PORT=${ZITI_CTRL_EDGE_ADVERTISED_PORT:-1280}
      - ZITI_ROUTER_NAME=${ZITI_ROUTER_NAME:-ziti-edge-router}
      - ZITI_ROUTER_ADVERTISED_ADDRESS=${ZITI_ROUTER_ADVERTISED_ADDRESS:-ziti-edge-router}
      - ZITI_ROUTER_ADVERTISED_HOST=${ZITI_ROUTER_ADVERTISED_HOST:-ziti-edge-router}
      - ZITI_ROUTER_PORT=${ZITI_ROUTER_PORT:-3022}
      - ZITI_ROUTER_LISTENER_BIND_PORT=${ZITI_ROUTER_LISTENER_BIND_PORT:-10080}
      - ZITI_ROUTER_ROLES=public
    networks:
      - ziti
    volumes:
      - ziti-fs:/persistent
    entrypoint: /bin/bash
    command: "/var/openziti/scripts/run-router.sh edge"

  ziti-console:
    image: openziti/zac
    working_dir: /usr/src/app
    environment:
      - ZAC_SERVER_CERT_CHAIN=/persistent/pki/${ZITI_CTRL_EDGE_ADVERTISED_ADDRESS:-ziti-edge-controller}-intermediate/certs/${ZITI_CTRL_EDGE_ADVERTISED_ADDRESS:-ziti-edge-controller}-server.cert
      - ZAC_SERVER_KEY=/persistent/pki/${ZITI_CTRL_EDGE_ADVERTISED_ADDRESS:-ziti-edge-controller}-intermediate/keys/${ZITI_CTRL_EDGE_ADVERTISED_ADDRESS:-ziti-edge-controller}-server.key
      - ZITI_CTRL_EDGE_ADVERTISED_ADDRESS=${ZITI_CTRL_EDGE_ADVERTISED_ADDRESS:-ziti-edge-controller}
      - ZITI_CTRL_EDGE_ADVERTISED_PORT=${ZITI_CTRL_EDGE_ADVERTISED_PORT:-1280}
      - ZITI_CTRL_NAME=${ZITI_CTRL_NAME:-ziti-edge-controller}
      - PORTTLS=8443
    depends_on:
      - ziti-controller
    ports:
      - 8443:8443
    volumes:
      - ziti-fs:/persistent
    networks:
      - ziti

networks:
  ziti:

volumes:
  ziti-fs:
my .env file
# OpenZiti Variables
ZITI_IMAGE=openziti/quickstart
ZITI_VERSION=0.30.1

# the user and password to use
# Leave password blank to have a unique value generated or set the password explicitly
ZITI_USER=admin
ZITI_PWD=admin

# controller name, address/port information
ZITI_CTRL_NAME=ziti-controller
ZITI_CTRL_EDGE_ADVERTISED_ADDRESS=ctrl.home.pi
ZITI_CTRL_ADVERTISED_ADDRESS=ctrl.home.pi
ZITI_CTRL_ADVERTISED_HOST=ctrl.home.pi
#ZITI_CTRL_EDGE_IP_OVERRIDE=10.10.10.10
ZITI_CTRL_EDGE_ADVERTISED_PORT=8441
ZITI_CTRL_ADVERTISED_PORT=8440

# The duration of the enrollment period (in minutes), default if not set. shown - 7days
ZITI_EDGE_IDENTITY_ENROLLMENT_DURATION=10080
ZITI_ROUTER_ENROLLMENT_DURATION=10080

# router address/port information
#ZITI_ROUTER_NAME=ziti-edge-router
ZITI_ROUTER_ADVERTISED_ADDRESS=er.home.pi
ZITI_ROUTER_ADVERTISED_HOST=er.home.pi
ZITI_ROUTER_PORT=8442
#ZITI_ROUTER_IP_OVERRIDE=10.10.10.10
ZITI_ROUTER_LISTENER_BIND_PORT=8444
#ZITI_ROUTER_ROLES=public
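With this many host and port variables, it is easy to paste a hostname into a `_PORT` entry. A quick sanity check that every uncommented variable ending in `_PORT` holds a number from 1 to 65535 catches that kind of slip before the containers start. This is a minimal sketch, assuming plain `KEY=value` lines; `check_env_ports` is a hypothetical helper, not part of the quickstart.

```bash
#!/usr/bin/env bash
# Sketch: flag *_PORT entries in an .env file whose value is not a valid TCP port.
# Assumes plain KEY=value lines; blank lines and "#" comments are skipped.

# check_env_ports FILE: print each bad entry; return non-zero if any were found.
check_env_ports() {
  local file="$1" key val bad=0
  # "|| [ -n "$key" ]" also processes a last line that has no trailing newline.
  while IFS='=' read -r key val || [ -n "$key" ]; do
    case "$key" in
      ''|\#*) continue ;;
    esac
    if [[ "$key" == *_PORT ]]; then
      if ! [[ "$val" =~ ^[0-9]+$ ]] || [ "$val" -lt 1 ] || [ "$val" -gt 65535 ]; then
        echo "invalid port: ${key}=${val}"
        bad=1
      fi
    fi
  done < "$file"
  return "$bad"
}
```

For example, `check_env_ports ./.env || echo "fix the .env before running docker compose up"`.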

With that compose file, I was able to successfully use zssh to connect back to my own machine... So I'm pretty sure that's what happened here.

I don't expect any more changes like this for a long time... I think we're through the bumpy change period.

Cheers!

Excellent, that's what I was hoping.

One last question related to this: once I rebuild my resources using a new clean install from the quickstart, is there any reason I can't use this instance for a small production workload?

To clarify, I'm looking to use this on my internal network and a couple of small point-to-point pairings out in the wild to be able to get some real time experience with it. But what I don't want to do is run into another spot where I need to rebuild it or do a major overhaul to the certificates or anything like that.

I'll be honest, this little weekend of issues forced me to evaluate some other options, even though I really didn't want to. I know there is stability in the commercial product, but I'm really looking for a self-hosted solution, with the commercial product for cases where I need extra support that I can't provide on my own.

Certainly no reason. While the original goal of the quickstart was, and remains, to be an educational vehicle, the defaults are all sensible and in line with best practices. The PKI is arguably overly complex, but in practice plenty of people run the quickstart in their production setups. Also note that the quickstart's main benefit is generating the PKI and initial configuration, and in practice these change little. I don't see any problems at all.

Thanks for your candid feedback. I'll be honest, I don't blame you. Hopefully, through our responsiveness, we've earned your trust that even when we hit a rocky patch, the team responds and fixes problems quickly. I honestly believe there have been very few "rough patches" like this with our quickstarts in the past. That's not to say there have been none; it does still happen, but generally they are resolved quickly. We're also undergoing an effort to add even more testing around these sorts of things so that we learn from our past and don't repeat mistakes.

I also commented in another thread on how to back up and restore an instance. You'll want to read over there for those few steps: Backup / restore in hosted server - #4 by TheLumberjack

Cheers

Thanks @TheLumberjack, I have been super-wowed by the responsiveness. The timing was just not great... I was ready to start dogfooding it, had an immediate need for a really simple P2P RDP connection, and had it all set up and half-deployed.

In the end, the extra effort was worth it... because I got to do a side-by-side comparison with a couple of WireGuard setups... and while they were vastly simpler, I started bumping into the walls way faster than I expected. Ziti is simply a far more capable solution.

Anyway, thanks for the transparency, and for the kindness in the face of some ranting. You continue to deliver.

Thank you for the update. I will test this later today.

I appreciate your great support. Thank you for that.

I also went through a steep learning curve over the last few days, which helped me gain a lot of knowledge.

A few suggestions for more transparency:

  • Have a channel where critical changes are announced (Quick Start version 0.30.1 needs tunnel version 0.22.6, env names changed, ...). This will help with updating the system without crashing it.
  • Use a different container for the PKI, e.g., an init container for the PKI, router, and controller setup. Ideally via Ansible, so that I would be able to do the init from outside the environment.
  • Have a page that shows compatible versions (tunnel 0.22.6 will work with ctrl 0.30.1, ...).

Thanks for the feedback. We appreciate when people take the time to provide it!

Thanks. I tried to do that via Discourse itself with a "psa" but sounds like that missed the mark. I'd be interested in hearing what sort of ideas you have for this sort of communication.

That's something to consider. thx

I don't think we state it anywhere but we do strive for full backward compatibility of older clients with newer networks. Older SDKs/tunnelers shouldn't need to be touched when upgrading a network. I'll bring this comment to the team and see if we can codify that stance somewhere.

The PSA definitely got our attention, so that was a win, and there's a good opportunity to iterate on it.

I think where it fell short was that it gave a scenario, a script, and a possible fix. I believe it would work better with details on what the issue is, what to look for, and how to fix it. If you started with a TL;DR and a bird's-eye view of the problem and the least-effort solution, that would give enough info to decide whether this affects me and how much I want to dig in.

As regards the script, you definitely have a specific style of scripting that I haven't seen much. It's not a criticism: I love seeing other people's code, and I've already learned a lot by looking up some of your switches to common commands. Still, it wouldn't hurt to be more verbose, use long-form options, or briefly introduce the intent.

I know all of this means more work... but in the end I think it will save you more time.

@TheLumberjack
Sorry that I missed that, but I did not know what PSA meant.

A Public Service Announcement is fine, now that I know what it means.

Could you create a dedicated category for those PSAs and give write access only to OpenZiti maintainers? Then I could track that category.

Currently I'm getting a notification for every message. That helps with learning, but I might disable some of them in the future.

Neat idea! I'll look into that to see if discourse supports it! Thanks

You could just make it a pinned topic, then we can subscribe to it when we want to get notices. Otherwise it stays at the top.

However, the cynic in me says that nobody reads the pinned posts... so the situation may not actually be any different than it is now. :person_shrugging:

I looked into it. The best I could do is make a "category" named something like "Important News" or "Public Notice" or ... something along those lines. Clearly, "PSA" is too "english-centric" (perhaps too US-centric even), that's my bad entirely. Something less-easily confused is needed.

I'll end up making a topic if needed. I have pinned topics in the past (thought I pinned this one?) but I like the idea of a category that people can opt into for important news/notifications/messages.

And add a link to the Discourse category in the Quickstart or Troubleshooting documentation (Start Cooking With Ziti | OpenZiti).
I believe that's the starting point for a lot of people, so they would know where to subscribe.