Reduce number of Service Policies for Monitoring

This is always an edge router policy issue or the edge router isn't online. I notice you have made two policies with the same name in your example. I expect you know that won't work so I'm just pointing it out.

If there are no routers online or the edge router policy isn't right, seeing an error like that isn't surprising.

Remote debugging like this is hard but I'm sure we'll get it eventually.

I did several reinstalls, changed the identity naming, ...
Always the same issue.

Also, I do not know why https://github.com/openziti/ziti/blob/release-next/quickstart/docker/image/access-control.sh does not run during the first start of the quickstart docker setup. It worked a few days ago. Now I have to run the commands manually afterwards.

I checked the commands you used with mine. For me they are the same. I believe the issue is more on the main install than on the commands.

Hmmm. Can you try a "docker compose pull" and make sure you have the latest version? There was a race condition last week that I think was fixed.

It's also worth using "docker compose down -v" to dump everything between runs. I tend to do that a lot, so maybe there's something in our flows that's causing issues.

Is this the sort of thing you can put on GitHub (or share some other way, like using zrok :slight_smile: ) so that I could try to replicate the issue you're seeing?
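The reset suggested above (remove volumes, pull fresh images, start again) could be wrapped in a small script. This is only a sketch: the `run` helper and the `APPLY` variable are hypothetical names introduced here, and the script prints the commands by default instead of executing them, since `down -v` deletes the compose volumes.

```shell
#!/usr/bin/env bash
# Dry run by default: commands are printed, not executed.
# Set APPLY=1 to really run them (this removes the compose volumes).
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  else
    printf '+ %s\n' "$*"
  fi
}

reset_quickstart() {
  run docker compose down -v   # stop containers and remove their volumes
  run docker compose pull      # fetch the latest published images
  run docker compose up        # start the environment again
}

reset_quickstart
```

Run it from the directory that holds the compose file. Printing first makes it easy to check what will be removed before setting `APPLY=1`.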

Yes, I was on an old version. I ran "docker-compose down -v" and deleted ziti-fs.

Now I can't enroll on the client, and I get the following error message on the controller:

ziti-controller_1   | [  12.242]   ERROR transport/v2/tls.(*sharedListener).processConn [tls:0.0.0.0:1280]: {error=[local error: tls: bad record MAC]} handshake failed
ziti-controller_1   | [  12.647]   ERROR transport/v2/tls.(*sharedListener).processConn [tls:0.0.0.0:1280]: {error=[local error: tls: bad record MAC]} handshake failed

The edge-router is connected. I can reach port 1280 from the client but get the following on the client:

Aug 21 20:17:21 grafana-ziti ziti-edge-tunnel.sh[128099]: (128099)[        0.000]    INFO ziti-sdk:utils.c:188 ziti_log_set_level() set log level: root=3/INFO
Aug 21 20:17:21 grafana-ziti ziti-edge-tunnel.sh[128099]: (128099)[        0.000]    INFO ziti-sdk:utils.c:188 ziti_log_set_level() set log level: root=3/INFO
Aug 21 20:17:21 grafana-ziti ziti-edge-tunnel.sh[128099]: (128099)[        0.000]    INFO ziti-sdk:ziti_enroll.c:90 ziti_enroll() Ziti C SDK version 0.33.4 @27bac90(HEAD) starting enrollment at (202>
Aug 21 20:17:21 grafana-ziti ziti-edge-tunnel.sh[128099]: (128099)[        0.076]   ERROR ziti-sdk:ziti_ctrl.c:154 ctrl_resp_cb() ctrl[ziti.test] request failed: -103(software caused connection a>
Aug 21 20:17:21 grafana-ziti ziti-edge-tunnel.sh[128099]: (128099)[        0.076]   ERROR ziti-sdk:ziti_enroll.c:234 enroll_cb() failed to enroll with controller: https://ziti.test:1280 CONTROLLE>
Aug 21 20:17:21 grafana-ziti ziti-edge-tunnel.sh[128099]: (128099)[        0.076]   ERROR ziti-edge-tunnel:ziti-edge-tunnel.c:2137 enroll_cb() enrollment failed: CONTROLLER_UNAVAILABLE(-3)

Maybe I should try tomorrow with an older version.

Oh. I'm sorry... I think I fixed this bug just today but I forgot to publish the container images. I'll do that now.

Ok, I kicked off the publish job. This was an issue with the latest tunnelers and our quickstart. It should publish soon. You'll have to docker pull again. I'm out of pocket for a while but I'll check back in later if you follow up. Cheers

Two servers with Ubuntu 20.04 are connected. Four servers with Ubuntu 22.04 are getting:

Aug 22 07:37:16 vmi1397946.contaboserver.net ziti-edge-tunnel.sh[321733]: /opt/openziti/bin/ziti-edge-tunnel: error while loading shared libraries: libssl.so.1.1: cannot open shared object file: No such file or directory
Aug 22 07:37:16 vmi1397946.contaboserver.net ziti-edge-tunnel.sh[321729]: ERROR: failed to enroll server.ziti.jwt in /opt/openziti/etc/identities
Aug 22 07:37:16 vmi1397946.contaboserver.net systemd[1]: ziti-edge-tunnel.service: Control process exited, code=exited, status=1/FAILURE

On Ubuntu 20.04 I'm running version v0.22.5-local of ziti-edge-tunnel.
On Ubuntu 22.04:

/opt/openziti/bin/ziti-edge-tunnel version
/opt/openziti/bin/ziti-edge-tunnel: error while loading shared libraries: libssl.so.1.1: cannot open shared object file: No such file or directory
sudo apt show ziti-edge-tunnel
Package: ziti-edge-tunnel
Version: 0.22.5

Looks like the new version does not work correctly on Ubuntu 22.04.3 LTS

After rolling back to the latest available apt release 0.21.5 the tunnel is up.

The script https://github.com/openziti/ziti/blob/release-next/quickstart/docker/image/access-control.sh was not executed. I had to add the router policies myself.

When I try to connect I now get this error message on the edge router. So it looks like the client connection to the controller was fixed, but not the connection to the router.

ziti-edge-router_1  | [2753.730]   ERROR transport/v2/tls.(*sharedListener).processConn [tls:0.0.0.0:3022]: {error=[local error: tls: bad record MAC]} handshake failed

@Metz I apologize, it's been a bumpy week for the quickstart. The changes to openssl highlighted a couple of issues that we/I didn't know about. I think they are all sorted now though. I had to go update zssh as well, which I have done. There is a new 0.0.17 out there; you'll likely want to use that.

You should be able to docker compose pull and get:

docker image ls | grep quick
openziti/quickstart                          latest            002a3ba6cbc2   About an hour ago   308MB

Also, with respect to the tunneler running on Ubuntu 22: how are you installing the tunneler, and are you reinstalling it? We recently changed some things in there as well to use a new script that should detect your OS and install the proper version. You didn't upgrade the OS on those machines recently, did you? We had an issue where a user had Ubuntu 20 installed, updated the OS to 22, and had to run apt --reinstall install ziti-edge-tunnel due to how Ubuntu tracks packages. That led @qrkourier to file this issue to track it: "vend a DEB repo management package?" (Issue #713, openziti/ziti-tunnel-sdk-c on GitHub).

The quickstart has been updated, and zssh has also been updated. Let's get to the bottom of your ziti-edge-tunnel issue next. Thanks

Make sure the ziti-edge-tunnel package repo is appropriate for your Ubuntu version. This script overwrites your package repo configuration with the correct Ubuntu LTS codename-based package repo: https://get.openziti.io/tun/scripts/install-ubuntu.bash.

Then run sudo apt-get update && sudo apt install --reinstall ziti-edge-tunnel.

@qrkourier yes. I used the correct package repo.

- name: Add ziti apt key by id from keyserver
  shell: "{{ item }}"
  with_items:
    - curl -sSLf https://get.openziti.io/tun/package-repos.gpg | sudo gpg --dearmor --output /usr/share/keyrings/openziti.gpg
    - sudo chmod 644 /usr/share/keyrings/openziti.gpg

- name: ziti repository | apt source
  ansible.builtin.apt_repository:
    repo: "deb [signed-by=/usr/share/keyrings/openziti.gpg] https://packages.openziti.org/zitipax-openziti-deb-stable focal main"
    state: present

- name: Update APT package cache
  apt: update_cache=yes cache_valid_time=3600

- name: Install packages for Ziti Edge Tunnel
  apt:
    state: latest
    name:
    - ziti-edge-tunnel

sudo apt update
Hit:8 https://packages.openziti.org/zitipax-openziti-deb-stable focal InRelease
sudo apt show ziti-edge-tunnel -a
Package: ziti-edge-tunnel
Version: 0.22.5
Priority: optional
Section: devel
Maintainer: support@netfoundry.io
Installed-Size: 4,898 kB
Depends: debconf, iproute2, sed, systemd, libatomic1, libssl3 | libssl1.1 | libssl1.0.0, login, passwd, policykit-1, zlib1g
Download-Size: 1,773 kB
APT-Sources: https://packages.openziti.org/zitipax-openziti-deb-stable focal/main amd64 Packages
Description: ziti-tunnel-sdk-c built using CMake

Package: ziti-edge-tunnel
Version: 0.21.5
Priority: optional
Section: devel
Maintainer: support@netfoundry.io
Installed-Size: 5,362 kB
Depends: debconf, iproute2, sed, systemd, libatomic1, libssl3 | libssl1.1 | libssl1.0.0, login, passwd, policykit-1, zlib1g
Download-Size: 1,984 kB
APT-Manual-Installed: yes
APT-Sources: https://packages.openziti.org/zitipax-openziti-deb-stable focal/main amd64 Packages
Description: ziti-tunnel-sdk-c built using CMake

0.21.5 is working. 0.22.5 is not working, and 0.22.0 - 0.22.4 are not available.

@TheLumberjack all 4 servers were newly installed about a month ago. They are running on two different cloud providers and use the providers' standard image.

Do you have different playbooks for focal and jammy?

If I'm following this correctly it looks like you're pulling from focal on your 20.04 hosts (which should be pulling focal packages), and also on your 22.04 hosts (which need to pull from jammy)?

The binaries in the .deb (and .rpm) packages are built to use the shared libraries that are available on the target distro. 20.04 provides libssl1.1 and 22.04 provides libssl3.

btw the ziti-edge-tunnel binaries that are available from our GitHub releases list are linked with static libssl libraries (version 3), giving a single "linux" binary that has broad distro compatibility.
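To make the focal/jammy point concrete, here is a minimal sketch of deriving the apt source line from the host's release codename instead of hard-coding focal. The repo URL and keyring path are the ones from the playbook above; the function names are hypothetical, and it assumes a repo exists for the detected codename (the thread only confirms focal and jammy).

```shell
#!/usr/bin/env bash
# Print the apt source line for a given Ubuntu codename (focal, jammy, ...).
openziti_repo_line() {
  local codename="$1"
  printf 'deb [signed-by=/usr/share/keyrings/openziti.gpg] https://packages.openziti.org/zitipax-openziti-deb-stable %s main\n' "$codename"
}

# Read the codename of the running host from /etc/os-release.
# Sourced in a subshell so its variables do not leak into this script.
detect_codename() {
  ( . /etc/os-release 2>/dev/null && printf '%s\n' "${VERSION_CODENAME:-}" )
}

codename="$(detect_codename)"
if [ -n "$codename" ]; then
  openziti_repo_line "$codename"
else
  echo "could not detect a release codename" >&2
fi
```

In an Ansible playbook the same idea would mean replacing the literal focal in the repo line with a per-host value, so that 20.04 and 22.04 hosts each pull packages built against their own libssl.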

@scareything
You are right. I missed that. I need to change the Ansible script to respect the OS version.

@TheLumberjack
Now the service is back up.

Prometheus can reach 3 of 5 servers. I will have a look tomorrow at why the others do not respond.

Looks like zssh is not happy with my private certificates. Is there a way to allow self-signed certificates?

WARN 	failed to get service: no apiSession, authentication attempt failed: Post "https://ziti.intranet:1280/authenticate?method=cert": x509: certificate signed by unknown authority
FATAL	service not found: zssh

I just tried it out locally and of course it "works for me" :confused:. I am thinking that the json file might just be out of date? This sort of exception happens when you have a valid-looking identity file, but the network has been torn down since the PKI is regenerated each time.


So here's what I just did...

  • fetched the compose file: curl -so docker-compose.yaml https://get.openziti.io/dock/docker-compose.yml
  • fetched the default .env file: curl -so .env https://get.openziti.io/dock/.env
  • used my pi hole (my local DNS nameserver) to make ctrl.home.pi and er.home.pi so that these names are now resolvable on my local network
  • edited the .env file, and updated:
    • ZITI_PWD=admin
    • ZITI_CTRL_EDGE_ADVERTISED_ADDRESS=ctrl.home.pi
    • ZITI_CTRL_ADVERTISED_ADDRESS=ctrl.home.pi
    • ZITI_ROUTER_ADVERTISED_ADDRESS=er.home.pi
  • started the environment: docker compose up
  • logged into ziti
  • copy/pasted the zssh quickstart commands from the zssh readme
  • used curl to download a ziti-edge-tunnel v0.22.5 and then unzipped it
  • used ziti-edge-tunnel to enroll: ./ziti-edge-tunnel enroll -j zsshSvcServer.jwt -i zsshSvcServer.json
  • ran the tunneler: sudo ./ziti-edge-tunnel run -i ./zsshSvcServer.json
  • used zssh to enroll the client: zssh enroll zsshSvcClient.jwt
  • ran zssh: zssh -d -s zsshSvc -c zsshSvcClient.json cd@zsshSvcServer -i c:\users\clint\.ssh\id_ed25519

And here's me doing all that as a gif.... (except for the editing of .env)

EDIT: seems like the gif is too big/won't play. I can record a video if you need/want but those are the commands I used to make sure it's working properly (I ssh'ed from windows into my wsl instance using OpenZiti)
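The .env edits from the steps above could also be scripted rather than done by hand. This is a sketch, not part of the quickstart: `set_env_var` is a hypothetical helper, it assumes the file uses plain KEY=value lines, and the demo writes to a throwaway file (with a placeholder starting value) instead of a real .env.

```shell
#!/usr/bin/env bash
# Set KEY=value in an env file: drop any existing KEY= line, then append the new one.
set_env_var() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)"
  # Keep every line that does not already assign this key.
  if [ -f "$file" ]; then
    grep -v "^${key}=" "$file" > "$tmp"
  fi
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back through the original path so its permissions are preserved.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Demo on a temporary file; "changeme" is only a placeholder starting value.
demo_env="$(mktemp)"
printf 'ZITI_PWD=changeme\n' > "$demo_env"
set_env_var "$demo_env" ZITI_PWD admin
set_env_var "$demo_env" ZITI_CTRL_EDGE_ADVERTISED_ADDRESS ctrl.home.pi
set_env_var "$demo_env" ZITI_CTRL_ADVERTISED_ADDRESS ctrl.home.pi
set_env_var "$demo_env" ZITI_ROUTER_ADVERTISED_ADDRESS er.home.pi
cat "$demo_env"
```

Dropping and re-appending the key moves it to the end of the file, which is harmless for simple KEY=value files but does not preserve the original ordering.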

I got all clients for the prometheus server up and running.

Then I restarted the docker image to allow prometheus to get the metrics from port 2112. After that the console showed, for managed identities, that the edge router connection is not available. The API session was available.

So I did a complete reinstall (deleted ziti-fs, ...)

After that I added identities, services, service policies and the following router policies

ziti edge create edge-router-policy all-endpoints-public-routers --edge-router-roles "#public" --identity-roles "#all"
ziti edge create service-edge-router-policy all-routers-all-services --edge-router-roles "#all" --service-roles "#all"

But it's still the same: API session available, edge router not connected.

docker-compose logs does not show an error.

zssh shows:

ERROR	dial tcp: lookup a719c5655b1e: no such host
FATAL	error when dialing service name zssh. unable to dial service 'zssh': no edge routers connected in time

The error log from the tunnel on the prometheus server:

8858.297]    WARN ziti-sdk:connect.c:332 connect_timeout() conn[0.740/Connecting] connect timeout: no suitable edge router
8858.297]   ERROR tunnel-cbs:ziti_tunnel_cbs.c:103 on_ziti_connect() ziti dial failed: operation did not complete in time
8893.882]   ERROR ziti-sdk:channel.c:860 on_channel_connect_internal() ch[3] failed to connect to ER[ziti-edge-router] [-3001/temporary failure]
8893.882]   ERROR ziti-sdk:connect.c:281 on_channel_connected() ztx[0] ch[3] failed to connect [-3001/temporary failure]

ERROR dial tcp: lookup a719c5655b1e: no such host

This error indicates to me that one or more edge routers are advertising the docker hostname value as the location to connect to the router, and not some external-to-docker name. This will of course work fine for containers in/on the same docker network but fails when connecting from outside of it.

connect timeout: no suitable edge router

This is consistent with the hostname advertised by the routers: although you have an edge router, this identity couldn't connect to it.
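As a quick way to spot this in a log line, here is a heuristic sketch. It relies on Docker's default behavior of using the first 12 hex characters of the container ID as the container hostname, as in a719c5655b1e above. The function names are hypothetical, and a match is only a hint, since a real host could in principle carry such a name.

```shell
#!/usr/bin/env bash
# Heuristic: Docker's default container hostname is the first 12 hex
# characters of the container ID, e.g. a719c5655b1e.
looks_like_container_id() {
  [[ "$1" =~ ^[0-9a-f]{12}$ ]]
}

check_advertised_host() {
  local host="$1"
  if looks_like_container_id "$host"; then
    echo "$host: looks like a Docker container ID; clients outside Docker cannot resolve it"
  else
    echo "$host: does not look like a container ID"
  fi
}

check_advertised_host a719c5655b1e
check_advertised_host er.home.pi
```

If the router's advertised address matches, the router is most likely advertising its container hostname rather than a name that clients outside the docker network can resolve.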

How can I help you best here? Would it be best if I showed you how to use a compose file to create a network? Would you like me to look at your compose and .env files in DM so we can get to the bottom of the issue? What would help you most? I'm sure we can get through this. I could facilitate a call if that would be better/easier.

I rolled back to docker version (controller and router) v0.29.0 and ziti-edge-tunnel v0.21.5-local and the router connection is back. I also get connectivity through the network, with the exception of prometheus (see above in this thread), which is resolved in 0.30.1.
This shows that my docker-compose and also the ansible scripts are working.

Looks like I have the same issue with 0.30.1 as described here: "I think quickstart:0.30.1 broke something in the edge-tunnel connections"

Adding the new environment variable did the trick.
Version 0.30.1 up and running with working connections.

Thank you.

Great to hear. I almost commented here a link to that other thread, but I saw you had commented in there and I hoped you'd see it. :slight_smile:

Thanks for the confirmation. Does this mean you have all your targets scrapable now??? If so that's great news!

Extra "Prometheus-related" stuffs...

Last year we did a PoC with Prometheus, demonstrating how to "scrape anything anywhere". I gave a presentation on it at DeveloperWeek Europe, if you're interested.

What I did was add our Golang SDK into the server itself so that you didn't need a tunneling application. It needs to be refreshed, but if you were interested in using it, I could probably bring it up to date and maybe make some upgrades along the way.

I need to republish the blog post on our new blog, but you can also read about it if you like. It's a three-part series.

Yes, all targets from the node exporter are scrapable now. zssh is working and prometheus is also connected to the ziti metrics.
:grinning:

Thank you for the prometheus video and the documentation. At the moment I will stay with the tunnel. Using prometheus with the ziti sdk built in would mean that I need to do regular updates to keep up with new prometheus versions. I'll move that to a later stage.

fantastic @Metz. secure scraping is a nice use case. if you are able to tell us, what was previously securing the connection between your prometheus server and the node-exporter nodes it was scraping from, even if as simple as permitted IPs? or, if this is new, what would have been the likely alternative to securing the prometheus connection via openziti?