Helm Port Mappings

Going off of the documentation here, I want to configure my Helm deployments to match the accepted standard.

Would these chart values be correct?

Controller:

ctrlPlane:
  advertisedPort: 8440

clientApi:
  advertisedPort: 8441

Router:

edge:
  advertisedPort: 8442

Also, what would the benefit be of exposing the ctrlPlane publicly?

You can use port 443 for everything if you're using a LoadBalancer to route requests by SNI, and you must use 443 for everything if you're using the K8s Ingress resource.

Still using K3S? It's been a minute since I installed Ziti on that distro, but it's about time for a refresher. I'll do that. As I recall, Traefik was the default ingress controller and I only needed to specify service type LoadBalancer to get hostname routing with passthrough TLS, which is a requirement for Ziti APIs because they conduct client certificate authentication.
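
For illustration, SNI routing with passthrough TLS can be expressed with Traefik's IngressRouteTCP custom resource. This is only a sketch: the hostname, namespace, and Service name are placeholders, and it assumes Traefik v2's CRDs are installed (they are by default on K3S).

    kubectl apply -f - <<'EOF'
    # route by SNI and pass the TLS session through untouched, so the Ziti
    # controller can still perform client certificate authentication
    apiVersion: traefik.containo.us/v1alpha1
    kind: IngressRouteTCP
    metadata:
      name: ziti-controller-client
      namespace: ziti
    spec:
      entryPoints:
        - websecure
      routes:
        - match: HostSNI(`ziti-controller.domain.com`)
          services:
            - name: ziti-controller-client
              port: 443
      tls:
        passthrough: true
    EOF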

I am using K3S, and the default ingress is traefik.

However, my ingress is not publicly exposed.
My thought was to assign the ziti resources/ports their own LoadBalancers, have those be the only public-facing entities, and then route everything to traefik internally.

Yes, that sounds like the simplest way to go with K3S's built-in Traefik. For the controller's Helm chart input values, use port 443 for all advertisedPort properties, and ensure each advertisedHost property is set to a domain name that resolves to the ExternalIP assigned to Traefik by ServiceLB, K3S's LoadBalancer provider.
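
Concretely, a minimal sketch of a controller install with those values (the hostname is a placeholder; the keys are the same advertisedHost/advertisedPort properties from your values above):

    helm upgrade --install "ziti-controller" "openziti/ziti-controller" \
        --namespace "ziti" --create-namespace \
        --set clientApi.advertisedHost="ziti-controller.domain.com" \
        --set clientApi.advertisedPort="443" \
        --set ctrlPlane.advertisedHost="ziti-controller.domain.com" \
        --set ctrlPlane.advertisedPort="443"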

I wrote this "Minimal Installation" section with K3S in mind because that's what I used to explore the first few versions of the controller chart when @marvkis first contributed it.

To make sure I understand the wiring:

  • the controller has a LoadBalancer, and is exposed publicly on port 443
  • the controller specifies a host, like traefik.domain.com, which needs a DNS record that is publicly available, like: traefik.domain.com A 192.168.1.100, where 192.168.1.100 is the LoadBalancer IP address for traefik, which is internal-only

Do I have all of that correct?

The thing I am not quite seeing: say I have a web service handled by traefik, accessible internally via a private DNS record at service.domain.com. How will openziti resolve that DNS record to traefik?

Those two bullet points sound correct except for the "which is internal-only" part of the second one. ServiceLB, K3S's LoadBalancer provider, should assign an ExternalIP to any Service of type LoadBalancer in the cluster, and it will become visible in the "EXTERNAL-IP" column of kubectl get services output. That's the IP address to which the controller's advertisedHost domain name must resolve.
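
A quick way to check that wiring (a sketch; the Service name and namespace are the K3S defaults, and the domain is a placeholder):

    # print Traefik's EXTERNAL-IP, then confirm the advertised host resolves to it
    kubectl get service traefik --namespace kube-system \
        --output jsonpath='{.status.loadBalancer.ingress[0].ip}'
    dig +short ziti-controller.domain.com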

A little more colour to my configuration. All of my web services are ClusterIP only (for the most part).
Traefik has an EXTERNAL-IP, manually assigned via MetalLB.

Currently, the mechanic is to connect via a traditional VPN, which then gives access to split-horizon DNS.
This allows users to connect to traefik, which is internal-only (only accessible inside the VPN).

Addressing my first bullet point, I can give the ziti controller an EXTERNAL-IP by assigning it an IP address from MetalLB.
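
For reference, requesting a specific MetalLB address looks something like this. It's only a sketch: it assumes the MetalLB v0.13+ annotation, and the Service name, selector, ports, and address are placeholders.

    kubectl apply -f - <<'EOF'
    apiVersion: v1
    kind: Service
    metadata:
      name: ziti-controller-client
      namespace: ziti
      annotations:
        # ask MetalLB for a specific address from its pool
        metallb.universe.tf/loadBalancerIPs: 192.168.1.101
    spec:
      type: LoadBalancer
      selector:
        app.kubernetes.io/name: ziti-controller
      ports:
        - name: https
          port: 443
          # assumption: 1280 is the controller container's client API port
          targetPort: 1280
    EOF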

I am not exactly certain what to do with ziti, given that my traefik ingress is not exposed to the world.

Did I understand correctly that the external IPs assigned to Traefik by MetalLB are only reachable by VPN clients, and that they resolve those IPs using a nameserver provided by the VPN?

Yes. Traefik is the only service that has an external IP address. All of the services handled by Traefik are ClusterIP. Finally, yes, there is an internal, VPN-provided DNS server that allows the web services to be resolved.

And you want your Ziti Controller to be exposed outside the VPN so that it provides an alternative path to the cluster services?

Yes.
My thought was to keep the VPN as a backup of sorts, and use ziti for more granular control.

From what I understand, ziti provides a sort of DNS hijacking where it can intercept things like service.tld.com, so it would be able to pass things through to traefik, which would then provide the service.

@qrkourier Does that make sense?

Yes, Ziti tunnellers are apps for each OS that provide a split-horizon DNS server for resolving Ziti service addresses and a transparent proxy for intercepting the matching traffic, and you can bypass DNS entirely by embedding a Ziti SDK in the application. You'd have highly granular control through Ziti service policies that control which Ziti identities are permitted to bind or dial which Ziti services.
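
As a sketch of that policy model (every name below is a placeholder, and the Traefik cluster address is an assumption): intercept service.domain.com on client endpoints, host it from an identity that can reach Traefik inside the cluster, and gate both sides with service policies.

    # intercept config: which address and ports client tunnellers capture
    ziti edge create config "web-intercept" intercept.v1 \
        '{"protocols":["tcp"],"addresses":["service.domain.com"],"portRanges":[{"low":443,"high":443}]}'
    # host config: where the hosting tunneller forwards the intercepted traffic
    ziti edge create config "web-host" host.v1 \
        '{"protocol":"tcp","address":"traefik.kube-system.svc.cluster.local","port":443}'
    ziti edge create service "web" --configs "web-intercept,web-host"
    # only identities with these roles may host (Bind) or access (Dial) the service
    ziti edge create service-policy "web-bind" Bind --service-roles '@web' --identity-roles '#hosts'
    ziti edge create service-policy "web-dial" Dial --service-roles '@web' --identity-roles '#clients'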

To recap your K8s-behind-a-VPN setup: your K8s cluster uses MetalLB to provide ExternalIP addresses to Services of type LoadBalancer, and Traefik is your Ingress controller and creates one such Service for handling Ingresses. The ExternalIP addresses provided by MetalLB to Services like Traefik's are only reachable via the VPN.

You'd need a different ExternalIP provider that exposes Services outside the VPN if you want to self-host the Ziti controller separately from the VPN.

Still, if the Ziti controller and Ziti router(s) are hosted elsewhere, access to and from cluster workloads themselves can be managed with Ziti tunnellers and SDKs. It's only the Ziti controller and router(s) that need to be reachable without the VPN.

@qrkourier Wonderful, thank you!

I have made two DNS A records, one for ziti-controller.domain.com and one for ziti-router.domain.com.

If I run the controller on port 8441, and the router on port 8443 (as it is in the guides), is that the prescribed way to do it?

@qrkourier I went ahead with that scenario :slight_smile:

When I try to install the router, I receive this error block:

[   0.003]    INFO ziti/ziti/router.run: {configFile=[/etc/ziti/config/ziti-router.yaml] version=[v0.29.0] go-version=[go1.20.5] os=[linux] arch=[amd64] build-date=[2023-07-13T15:53:37Z] routerId=[Q86P4Mz8c] revision=[3ca2dd2f4e7b]} starting ziti-router
[   0.003]    INFO fabric/metrics.GoroutinesPoolMetricsConfigF.func1.1: {minWorkers=[0] maxWorkers=[32] idleTime=[30s] maxQueueSize=[1000] poolType=[pool.link.dialer]} starting goroutine pool
[   0.003]    INFO fabric/router/forwarder.(*Faulter).run: started
[   0.003]    INFO fabric/metrics.GoroutinesPoolMetricsConfigF.func1.1: {idleTime=[30s] poolType=[pool.route.handler] minWorkers=[0] maxWorkers=[128] maxQueueSize=[1000]} starting goroutine pool
[   0.003]    INFO fabric/router/forwarder.(*Scanner).run: started
[   0.003] WARNING edge/router/internal/edgerouter.(*Config).LoadConfigFromMap: Invalid heartbeat interval [0] (min: 60, max: 10), setting to default [60]
[   0.003] WARNING edge/router/internal/edgerouter.parseEdgeListenerOptions: port in [listeners[0].options.advertise] must equal port in [listeners[0].address] for edge binding but did not. Got [443] [3022]
[   0.003]    INFO fabric/router.(*Router).showOptions: ctrl = {"OutQueueSize":4,"MaxQueuedConnects":1,"MaxOutstandingConnects":16,"ConnectTimeout":5000000000,"DelayRxStart":false,"WriteTimeout":0}
[   0.003]    INFO fabric/router.(*Router).showOptions: metrics = {"ReportInterval":60000000000,"IntervalAgeThreshold":0,"MessageQueueSize":10}
[   0.003]    INFO fabric/metrics.GoroutinesPoolMetricsConfigF.func1.1: {minWorkers=[0] maxWorkers=[32] poolType=[pool.link.dialer] idleTime=[30s] maxQueueSize=[5000]} starting goroutine pool
[   0.003]    INFO fabric/router.(*Router).initializeHealthChecks: starting health check with ctrl ping initially after 15s, then every 30s, timing out after 15s
[   0.003]    INFO fabric/router.(*Router).startXlinkDialers: started Xlink dialer with binding [transport]
[   0.003]    INFO fabric/metrics.GoroutinesPoolMetricsConfigF.func1.1: {idleTime=[10s] maxQueueSize=[1] poolType=[pool.listener.link] minWorkers=[1] maxWorkers=[16]} starting goroutine pool
[   0.004]    INFO fabric/router.(*Router).startXlinkListeners: started Xlink listener with binding [transport] advertising [tls:ziti-router.domain.com:8443]
[   0.004]    INFO edge/router/xgress_edge.(*listener).Listen: {address=[tls:0.0.0.0:3022]} starting channel listener
[   0.004]    INFO fabric/metrics.GoroutinesPoolMetricsConfigF.func1.1: {poolType=[pool.listener.xgress_edge] minWorkers=[1] idleTime=[10s] maxQueueSize=[1] maxWorkers=[16]} starting goroutine pool
[   0.004]    INFO fabric/router.(*Router).startXgressListeners: created xgress listener [edge] at [tls:0.0.0.0:3022]
[   0.004]    INFO fabric/router.(*Router).startXgressListeners: created xgress listener [tunnel] at []
[   0.004]    INFO fabric/router.(*Router).getInitialCtrlEndpoints: controller endpoints file [/etc/ziti/config/endpoints] doesn't exist. Using initial endpoints from config
[   0.004]    INFO fabric/router.(*Router).startControlPlane: router configured with 1 controller endpoints
[   0.004]    INFO edge/router/xgress_edge.(*Acceptor).Run: starting
[   0.004]    INFO fabric/router/env.(*networkControllers).UpdateControllerEndpoints: {endpoint=[map[tls:ziti-controller.domain.com:8441:{}]]} adding new ctrl endpoint
[   0.004]    INFO fabric/router/env.(*networkControllers).connectToControllerWithBackoff: {endpoint=[tls:ziti-controller.domain.com:8441]} starting connection attempts
[   0.004]    INFO edge/router/fabric.(*StateManagerImpl).StartHeartbeat: heartbeat starting
[   0.004]    INFO edge/router/xgress_edge_tunnel.(*tunneler).Start: {mode=[host]} creating interceptor
[   0.004]    INFO edge/router/xgress_edge.(*CertExpirationChecker).Run: waiting 8615h55m45.745529914s to renew certificates
[   0.004] WARNING edge/tunnel/dns.flushDnsCaches: {error=[exec: "resolvectl": executable file not found in $PATH]} unable to find systemd-resolve or resolvectl in path, consider adding a dns flush to your restart process
[   0.089]   ERROR fabric/router/env.(*networkControllers).connectToControllerWithBackoff.func2: {endpoint=[tls:ziti-controller.domain.com:8441] error=[error connecting ctrl (could not negotiate connection with PUBLIC.IP.ADDRESS.HERE:8441, invalid header)]} unable to connect controller
[   0.174]   ERROR fabric/router/env.(*networkControllers).connectToControllerWithBackoff.func2: {endpoint=[tls:ziti-controller.domain.com:8441] error=[error connecting ctrl (could not negotiate connection with PUBLIC.IP.ADDRESS.HERE:8441, invalid header)]} unable to connect controller
[   0.355]   ERROR fabric/router/env.(*networkControllers).connectToControllerWithBackoff.func2: {error=[error connecting ctrl (could not negotiate connection with PUBLIC.IP.ADDRESS.HERE:8441, invalid header)] endpoint=[tls:ziti-controller.domain.com:8441]} unable to connect controller
[   0.370]    INFO edge/tunnel/intercept.SetDnsInterceptIpRange: dns intercept IP range: 100.64.0.1 - 100.127.255.254
[   0.481]   ERROR fabric/router/env.(*networkControllers).connectToControllerWithBackoff.func2: {error=[error connecting ctrl (could not negotiate connection with PUBLIC.IP.ADDRESS.HERE:8441, invalid header)] endpoint=[tls:ziti-controller.domain.com:8441]} unable to connect controller
[   0.648]   ERROR fabric/router/env.(*networkControllers).connectToControllerWithBackoff.func2: {endpoint=[tls:ziti-controller.domain.com:8441] error=[error connecting ctrl (could not negotiate connection with PUBLIC.IP.ADDRESS.HERE:8441, invalid header)]} unable to connect controller
[   0.868]   ERROR fabric/router/env.(*networkControllers).connectToControllerWithBackoff.func2: {endpoint=[tls:ziti-controller.domain.com:8441] error=[error connecting ctrl (could not negotiate connection with PUBLIC.IP.ADDRESS.HERE:8441, invalid header)]} unable to connect controller
[   1.269]   ERROR fabric/router/env.(*networkControllers).connectToControllerWithBackoff.func2: {endpoint=[tls:ziti-controller.domain.com:8441] error=[error connecting ctrl (could not negotiate connection with PUBLIC.IP.ADDRESS.HERE:8441, invalid header)]} unable to connect controller
[   1.823]   ERROR fabric/router/env.(*networkControllers).connectToControllerWithBackoff.func2: {endpoint=[tls:ziti-controller.domain.com:8441] error=[error connecting ctrl (could not negotiate connection with PUBLIC.IP.ADDRESS.HERE:8441, invalid header)]} unable to connect controller
[   3.112]   ERROR fabric/router/env.(*networkControllers).connectToControllerWithBackoff.func2: {endpoint=[tls:ziti-controller.domain.com:8441] error=[error connecting ctrl (could not negotiate connection with PUBLIC.IP.ADDRESS.HERE:8441, invalid header)]} unable to connect controller
[   4.419]   ERROR fabric/router/env.(*networkControllers).connectToControllerWithBackoff.func2: {error=[error connecting ctrl (could not negotiate connection with PUBLIC.IP.ADDRESS.HERE:8441, invalid header)] endpoint=[tls:ziti-controller.domain.com:8441]} unable to connect controller
[   5.972]   ERROR fabric/router/env.(*networkControllers).connectToControllerWithBackoff.func2: {endpoint=[tls:ziti-controller.domain.com:8441] error=[error connecting ctrl (could not negotiate connection with PUBLIC.IP.ADDRESS.HERE:8441, invalid header)]} unable to connect controller
[   9.406]   ERROR fabric/router/env.(*networkControllers).connectToControllerWithBackoff.func2: {endpoint=[tls:ziti-controller.domain.com:8441] error=[error connecting ctrl (could not negotiate connection with PUBLIC.IP.ADDRESS.HERE:8441, invalid header)]} unable to connect controller
[  14.568]   ERROR fabric/router/env.(*networkControllers).connectToControllerWithBackoff.func2: {endpoint=[tls:ziti-controller.domain.com:8441] error=[error connecting ctrl (could not negotiate connection with PUBLIC.IP.ADDRESS.HERE:8441, invalid header)]} unable to connect controller
[  15.371]   FATAL fabric/router.(*Router).startControlPlane.func1: unable to connect to any controllers before timeout

Do you know why the router complains about an invalid header?

Do you have a Traefik Ingress for the controller port 8441? Ingress is typically only 443, but Traefik may allow non-standard ports. The port scheme used in the non-k8s quickstart is not applicable to k8s, and there's no reason to use distinct ports if you're using Ingress or LoadBalancer because the domain name distinguishes the service, so the standard port 443 is typically best.

Is the router running in the same K8s cluster as the controller?

Is the controller reachable on the public IP and port 8441? You can verify the controller URL is valid with curl.
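
For example (the URL is a placeholder, and --insecure skips server certificate verification for a quick reachability check):

    curl --insecure https://ziti-controller.domain.com:8441/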

I do not have 8441 assigned to traefik.

I have the standard websecure configuration:

  websecure:
    port: 8443
    expose: true
    exposedPort: 443
    protocol: TCP

I have the controller and the router running in the same cluster, both in the ziti namespace.

Running curl returns this:

{"data":{"apiVersions":{"edge":{"v1":{"apiBaseUrls":["https://ziti-controller.domain.com:8441/edge/client/v1"],"path":"/edge/client/v1"}},"edge-client":{"v1":{"apiBaseUrls":["https://ziti-controller.domain.com:8441/edge/client/v1"],"path":"/edge/client/v1"}},"edge-management":{"v1":{"apiBaseUrls":["https://ziti-controller.domain.com:8441/edge/management/v1"],"path":"/edge/management/v1"}}},"buildDate":"2023-07-13T15:53:37Z","revision":"3ca2dd2f4e7b","runtimeVersion":"go1.20.5","version":"v0.29.0"},"meta":{}}

Using port 443 for everything is the thing I want to avoid. Currently, using a traditional VPN, there is no access whatsoever to traefik unless a user is connected.
I want the same behaviour with ziti, where unless a user is connected and has an allow rule assigned to them, there is no access.

That's a good result from the cURL test on port 8441. Is the VPN required to obtain that result with cURL?

I want the same behaviour with ziti

The same behavior can be achieved with Ziti, but the controller's port 8441 and router's port 8443 must be reachable by the Ziti identities without Ziti or the VPN. Those hardened listeners must be on the untrusted regular network, not a protected network.

I'm assuming you still want to be able to have two options for sheltering the Kubernetes workloads: VPN or Ziti, but not nesting Ziti inside the VPN, correct?

The VPN is not required to reach the controller.
I actually verified with an external curl just to be sure.
It is exposed to the world.

You have it correct :slight_smile: I want two separate options, not nested.

Thank you for clarifying there's no nesting of Ziti inside the VPN.

You verified the controller is reachable without the VPN on port 8441 and obtained a normal greeting from the root / API endpoint. However, when the router attempts to connect to the controller's ctrl.endpoint (the address of the control plane), it repeatedly fails with "invalid header" errors until the connection attempts time out.

You clarified they're both running in the same K8s cluster, so the router may be configured to reach the controller's ctrl.endpoint via the cluster IP service, e.g. {{ controller's ctrl service name }}.{{ controller's namespace }}.svc.cluster.local.

Here's an example Helm install command for the router that specifies a cluster IP service for the ctrl.endpoint (from the miniziti.bash script). I've disabled the link listener in this example because I believe you have one router. If you add a second router then you need at least one of them to have a reachable link listener. That's how they form the Ziti mesh fabric.

    helm upgrade --install "ziti-router" "openziti/ziti-router" \
        --namespace "ziti" \
        --set-file enrollmentJwt="/tmp/router.jwt" \
        --set edge.advertisedHost="ziti-router.domain.com" \
        --set linkListeners.transport.service.enabled="false" \
        --set ctrl.endpoint="ziti-controller-ctrl.ziti.svc:443"

You can find the K8s service named like "ctrl" by selecting the controller's K8s services:

$ kubectl get svc --selector=app.kubernetes.io/name=ziti-controller
NAME                     TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
ziti-controller-client   ClusterIP   10.97.46.68    <none>        443/TCP   15m
ziti-controller-ctrl     ClusterIP   10.97.235.15   <none>        443/TCP   15m

Even before re-running the helm upgrade command, you can inspect the current value of the router's ctrl.endpoint to ensure it is reachable by the router.

$ helm get values ziti-router
USER-SUPPLIED VALUES:
ctrl:
  endpoint: ziti-controller-ctrl.miniziti.svc:443