Helm Port Mappings

Sure thing!

It does, unfortunately:

kubectl get pods --selector app.kubernetes.io/component=ziti-controller --output jsonpath="{.items[0].metadata.name}" -n ziti | xargs -IPOD kubectl -n ziti exec POD --container ziti-controller -- zitiLogin

error: unable to authenticate to https://ziti-controller-client.ziti.svc.cluster.local:8441/edge/management/v1. Status code: 401 Unauthorized, Server returned: {
    "error": {
        "code": "INVALID_AUTH",
        "message": "The authentication request failed",
        "requestId": "GFEcwHV4-"
    },
    "meta": {
        "apiEnrollmentVersion": "0.0.1",
        "apiVersion": "0.0.1"
    }
}
command terminated with exit code 1

I did shell in and verify that the admin password was correctly set in ZITI_ADMIN_PASSWORD.

For your reference, this is the relevant section of the values.yaml for the controller chart installation:

ctrlPlane:
  containerPort: 6262
  advertisedHost: ziti-controller.domain.com
  advertisedPort: 8440
  service:
    enabled: true
    type: LoadBalancer  # this can be cluster-internal unless there are routers outside the cluster
    annotations:
      metallb.universe.tf/address-pool: ziti-controller
  ingress:
    enabled: false
    annotations:
  alternativeIssuer:
  dnsNames: []

clientApi:
  containerPort: 1280
  advertisedHost: ziti-controller.domain.com
  advertisedPort: 8441
  service:
    enabled: true
    type: LoadBalancer  # this can be cluster-internal unless there are routers outside the cluster
    annotations:
      metallb.universe.tf/address-pool: ziti-controller-client
  ingress:
    enabled: false
    annotations:
  dnsNames: []

When I try to do a ziti edge login, the controller logs show this:

{"_context":"tls:0.0.0.0:1280","error":"remote error: tls: bad certificate","file":"github.com/openziti/transport/v2@v2.0.143/tls/listener.go:257","func":"github.com/openziti/transport/v2/tls.(*sharedListener).processConn","level":"error","msg":"handshake failed","remote":"192.168.1.1:37068","time":"2024-10-21T22:38:57.244Z"}

I did notice when I pull up https://ziti-controller.domain.com:8441/zac

Is this something to do with a self-signed certificate?

This is a misleading message because it's not really an error, but expected. Here's the GH issue for improving or eliminating the log message.

What did you notice when you pulled up that address?

Pardon, apparently that did not finish.

I noticed that it does not have an "authentic" certificate.

In the pod, if I try to use curl to verify any endpoints, it needs the -k flag.

I was wondering if that is causing the error.

You may configure an alternative server certificate for the console. I don't believe it's relevant to your login issue. If you run into problems with the newly supported alternative certificates feature in the ziti-controller chart, post a new topic here in Discourse.

Regarding certificate verification for management operations with the CLI, including ziti edge login: the CLI trusts the default management API certificate through the trust-on-first-use (TOFU) mechanism built into ziti edge login, which prompts the user to accept the certificate and stores it in ~/.config/ziti/. As an alternative to TOFU, you can explicitly trust the root CA of the management API's certificate with the ziti edge login --ca option.
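
Here's a rough sketch of the two trust options side by side. The controller address is the one from this thread; the admin username and the CA bundle path /tmp/ziti-ca.crt are placeholders I made up for illustration, so substitute your own. Both functions prompt for the password because none is passed on the command line.

```shell
# Sketch only: CTRL, ZITI_USER, and CA_FILE are placeholder values, not chart settings.
CTRL="ziti-controller.domain.com:8441"
ZITI_USER="admin"
CA_FILE="/tmp/ziti-ca.crt"   # hypothetical path to the root CA bundle

# Option 1: trust on first use; the CLI prompts you to accept the server certificate.
login_tofu() {
  ziti edge login "$CTRL" --username "$ZITI_USER"
}

# Option 2: explicitly trust the root CA instead of answering the TOFU prompt.
login_with_ca() {
  ziti edge login "$CTRL" --username "$ZITI_USER" --ca "$CA_FILE"
}
```

Call login_tofu or login_with_ca after defining them. Neither option changes which password the controller accepts, so a 401 INVALID_AUTH will persist either way.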


It's unclear why the stored password isn't valid. I'm not aware of any conditions that could lead to this situation. Maybe an incorrect database snapshot was restored?

If you have only one admin identity, that identity has only password authentication enabled, and you do not have the correct password, then your only option is to perform a database recovery operation after adding a new admin user. Here's the procedure. It's long and, hopefully, precise enough. If this works well for you, I'll develop this draft as documentation for the ops guides.

  1. Obtain a database snapshot.

    kubectl get pods --selector app.kubernetes.io/name=ziti-controller --output jsonpath="{.items[0].metadata.name}" \
    | xargs -IPOD kubectl --container ziti-controller exec POD -- \
    ziti agent controller snapshot-db
    
    /persistent/ctrl.db-20241022-154414
    
  2. Add a new identity with admin privilege, substituting your desired username for ZITI_ADMIN_USER and the desired password for ZITI_ADMIN_PASSWORD. This username must be unique. Optionally, you may later use the new admin's credential to reset the default admin's password.

    kubectl get pods --selector app.kubernetes.io/name=ziti-controller --output jsonpath="{.items[0].metadata.name}" \
    | xargs -IPOD kubectl -c ziti-controller exec POD -- \
    ziti ops db add-debug-admin /persistent/ctrl.db-20241022-154414 ZITI_ADMIN_USER ZITI_ADMIN_PASSWORD
    
    {"file":"github.com/openziti/storage@v0.3.2/boltz/migration.go:99","func":"github.com/openziti/ziti/controller/db.RunMigrations.(*migrationManager).Migrate.func1","level":"info","msg":"edge datastore is up to date at version 37","time":"2024-10-22T15:46:59.792Z"}
    
    added debug admin with username 'admin2'
    
  3. Restore the database snapshot.

    Scale the ziti-controller deployment to zero.

    kubectl scale deployment --selector app.kubernetes.io/name=ziti-controller --replicas=0 
    
    deployment.apps/ziti-controller scaled
    

    Create a Job resource that mounts the controller's PVC, saving the manifest to a file such as /tmp/busybox.yml (the path the delete step below assumes). NOTE: substitute your namespace for MY_ZITI_NAMESPACE.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: ziti-controller-maintenance
      namespace: MY_ZITI_NAMESPACE
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: ziti-controller-maintenance
            image: busybox
            command: ["/bin/sh"]
            args: ["-c", "sleep 3600"]
            volumeMounts:
            - mountPath: /persistent
              name: ziti-controller-persistence
          volumes:
          - name: ziti-controller-persistence
            persistentVolumeClaim:
              claimName: ziti-controller
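
    The manifest still has to be applied before the exec commands that follow will find a pod. A minimal sketch, assuming you saved it as /tmp/busybox.yml (the path used by the delete step later on) and that MY_ZITI_NAMESPACE is replaced with your namespace:

```shell
# Sketch only: the file path mirrors the later delete step; the namespace is a placeholder.
MANIFEST="/tmp/busybox.yml"
NAMESPACE="MY_ZITI_NAMESPACE"

start_maintenance_job() {
  # Create the Job from the saved manifest.
  kubectl apply -f "$MANIFEST"
  # Wait until the Job's pod is Ready so that kubectl exec can reach it.
  # If this reports no matching resources, the pod has not been created yet; run it again.
  kubectl wait --namespace "$NAMESPACE" --for=condition=Ready pod \
    --selector=batch.kubernetes.io/job-name=ziti-controller-maintenance --timeout=120s
}
```

    Run start_maintenance_job once, then continue with the restore step.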
    

    Restore the snapshot you modified earlier.

    kubectl get pods --selector=batch.kubernetes.io/job-name=ziti-controller-maintenance --output jsonpath="{.items[0].metadata.name}" \
    | xargs -IPOD kubectl exec POD -- \
    cp -v /persistent/ctrl.db{-20241022-154414,}
    
    '/persistent/ctrl.db-20241022-154414' -> '/persistent/ctrl.db'
    

    Set ownership on the restored snapshot.

    kubectl get pods --selector=batch.kubernetes.io/job-name=ziti-controller-maintenance --output jsonpath="{.items[0].metadata.name}" \
    | xargs -IPOD kubectl exec POD -- chown -c 2171:2171 /persistent/ctrl.db

    changed ownership of '/persistent/ctrl.db' to 2171:2171
    

    Delete the maintenance job to ensure the PVC is available.

    kubectl delete -f /tmp/busybox.yml
    
    job.batch "ziti-controller-maintenance" deleted
    

    Scale up the controller deployment.

    kubectl scale deployment --selector app.kubernetes.io/name=ziti-controller --replicas=1
    
    deployment.apps/ziti-controller scaled
    

Apparently something was off with the PVC, because I uninstalled everything, deleted the PVC, reinstalled everything, and all is well!

Circling back to your suggestion of installing the ziti-host helm chart.
I did that, and I was able to connect to a test nginx instance :slight_smile:

With this, "If this works then it points toward a configuration problem with the Router, or possibly a bug.", what are you thinking the issue is?

Onto more fun things, now I need to know how to route traffic to traefik :slight_smile:

I tried creating a simple service, but it seems as though it is not passing through the DNS name.

How can I create a service that takes service.domain.com and passes that through to a local IP address like 192.168.1.100?

You initialized a new controller database with a new default admin password, right?

That confirms the Ziti network is functioning, but the router is not hosting the assigned service for some reason.

A quick review of the thread reminds me that we confirmed the service edge router policy (SERP) was permissive (#all/#all), but some time has passed, and the router error you reported, "failed to dial fabric," is precisely the error I expect to see if the hosting or dialing router's tunneler has not been granted permission to use the service by a SERP.

The initial symptom was "no terminators" and that's been solved by deploying a second tunneler with permission to host/bind the service, but the router may still be logging "failed to dial fabric," which would allow us to investigate the root cause. A router's tunneler certainly will host a service if it has permission!

Things to check:

  1. SERP is #all/#all
  2. policy-advisor (as before) shows the router's tunneler has bind permission
  3. the router's tunneler was enabled when it was created or subsequently updated to enable the tunneler and it has a tunnel binding mode "host" in its configuration

Here's an overview of creating a Ziti service. You wish to map a fictitious domain name to a real IP address, so I assume you will use a Ziti tunneler on both ends: intercept (client) and host (server).

  1. Create an intercept.v1 config. This defines the fictitious domain names that will exist in the client tunneler's nameserver.
  2. Create a host.v1 config. This defines the address of the target server as a domain name or IP address, port, protocol.
  3. Create a service with the two configs above with a role like "acme-services."
  4. Create a Dial service policy with roles like "acme-clients"/"acme-services." This grants client tunneler identities permission to dial the service.
  5. Create a Bind service policy with roles like "acme-hosts"/"acme-services." This grants hosting tunneler identities permission to bind (host) the service.
  6. Create some client identities with role "acme-clients."
  7. Create some hosting identities with role "acme-hosts."
  8. Deploy the client identities on devices where "service.domain.com" should be intercepted.
  9. Deploy the hosting identities on devices that have access to the target address from the host.v1 config.

You can extend this pattern by adding more services with role "acme-services" if the service should be dialed and hosted by the same set of identities.
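
The steps above can be sketched with the ziti CLI roughly as follows. Every name here (acme-intercept, acme-host, acme-web, the policy and identity names) and port 443 are placeholders I picked for illustration; only service.domain.com and 192.168.1.100 come from your question. It assumes you are already logged in with ziti edge login.

```shell
# Sketch only: all names, the port, and the roles are illustrative placeholders.
create_acme_service() {
  # 1. Intercept config: the domain name the client tunneler's nameserver answers for.
  ziti edge create config acme-intercept intercept.v1 \
    '{"protocols":["tcp"],"addresses":["service.domain.com"],"portRanges":[{"low":443,"high":443}]}'
  # 2. Host config: where the hosting tunneler forwards the traffic.
  ziti edge create config acme-host host.v1 \
    '{"protocol":"tcp","address":"192.168.1.100","port":443}'
  # 3. Service tying both configs together, tagged with the service role.
  ziti edge create service acme-web --configs acme-intercept,acme-host \
    --role-attributes acme-services
  # 4. Dial policy: client identities may dial services with the service role.
  ziti edge create service-policy acme-dial Dial \
    --identity-roles '#acme-clients' --service-roles '#acme-services'
  # 5. Bind policy: hosting identities may bind (host) those services.
  ziti edge create service-policy acme-bind Bind \
    --identity-roles '#acme-hosts' --service-roles '#acme-services'
  # 6. A client identity; the JWT file is used to enroll the client tunneler.
  ziti edge create identity acme-client1 --role-attributes acme-clients \
    --jwt-output-file acme-client1.jwt
  # 7. A hosting identity; the JWT file is used to enroll the hosting tunneler.
  ziti edge create identity acme-host1 --role-attributes acme-hosts \
    --jwt-output-file acme-host1.jwt
}
```

Call create_acme_service once, then enroll each JWT on the matching device (steps 8 and 9). If your target listens on a different port, change it in both configs.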

Link to doc for more details

I have working services! Thank you!

I did notice that when making a video call with element/matrix, the ziti-host pod shows these errors, and the call fails to connect:

(7)[      644.696]    WARN ziti-sdk:conn_bridge.c:284 on_ziti_data() br[0.37] write failed: -32(broken pipe)
(7)[      644.696]    WARN ziti-sdk:connect.c:820 flush_to_client() conn[0.37/yGV2wERR/CloseWrite] client indicated error[-32] accepting data (0 bytes buffered)
(7)[      645.078]    WARN ziti-sdk:conn_bridge.c:284 on_ziti_data() br[0.38] write failed: -32(broken pipe)
(7)[      645.078]    WARN ziti-sdk:connect.c:820 flush_to_client() conn[0.38/T9KXpdp_/CloseWrite] client indicated error[-32] accepting data (0 bytes buffered)
(7)[      645.078]    WARN ziti-sdk:conn_bridge.c:284 on_ziti_data() br[0.39] write failed: -32(broken pipe)
(7)[      645.078]    WARN ziti-sdk:connect.c:820 flush_to_client() conn[0.39/3d8U5Lt9/Closed] client indicated error[-32] accepting data (0 bytes buffered)

Yes, the admin password was re-initialised on startup. Apparently something was locked on.

With debugging the router, going step by step:

ziti edge list service-edge-router-policies

╭────────────────────────┬─────────────┬───────────────┬───────────────────╮
│ ID                     │ NAME        │ SERVICE ROLES │ EDGE ROUTER ROLES │
├────────────────────────┼─────────────┼───────────────┼───────────────────┤
│ UUID-goes-here         │ all-routers │ #all          │ #all              │
╰────────────────────────┴─────────────┴───────────────┴───────────────────╯
results: 1-1 of 1

ziti edge policy-advisor identities --quiet edge-router

ERROR: edge-router 
  - Identity does not have access to any services. Adjust service policies.

I have this for the router's tunneler section in the values file: helm-charts/charts/ziti-router/values.yaml at main · openziti/helm-charts · GitHub

tunnel:
  # -- run mode for the router's built-in tunnel component: host, tproxy, proxy, or none
  mode: none

Should mode be set to host there?
Also, if so, why is the default in the values file none?

Run ziti edge update identity "edge-router" to grant whichever role you have granted the ziti-host identity from the bind service policy

It should be host to enable the router's tunneler to host services. It defaults to none because the router panics and won't start if the tunneler was not enabled when the router was created. You can enable it later with ziti edge update edge-router "edge-router" --tunneler-enabled.
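
Put together, the CLI side of this might look like the sketch below. The role name acme-hosts is a placeholder for whatever identity role your Bind service policy actually names, and the router and its identity are both assumed to be called edge-router, as in your policy-advisor output.

```shell
# Sketch only: "acme-hosts" stands in for the identity role in your Bind service policy.
enable_router_hosting() {
  # Enable the router's built-in tunneler.
  ziti edge update edge-router "edge-router" --tunneler-enabled
  # Give the router's identity the role matched by the Bind service policy.
  ziti edge update identity "edge-router" --role-attributes acme-hosts
  # Re-run the advisor to confirm the identity now has access to the service.
  ziti edge policy-advisor identities --quiet edge-router
}
```

Call enable_router_hosting after logging in. As far as I know, --role-attributes replaces the identity's existing list rather than appending to it, so include any roles it already has. The Helm value tunnel.mode still needs to be host for the router to act on the permission.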

Ah.
Swapping that value from none to host allowed everything to work with the default router :slight_smile:

I think I figured out why video and voice calls are failing in matrix, but I do not have a solution to it yet.

Matrix does 1:1 calls peer-to-peer so they are end-to-end encrypted.

With a traditional VPN, both peers are on the same network, so they can connect.
Given that matrix is isolated with OpenZiti, the peer-to-peer connection fails.

Do you have any thoughts on how to remedy this?

Is this an example of the feature you want to use via OpenZiti?

Eventually, yes.
Unfortunately, Element Call is still experimental.

Currently with matrix, there are two methods for calls:

  • 1:1/peer-to-peer encrypted calls
  • group calls via jitsi

I am trying to have the functionality of the 1:1 calls, which is handled via webrtc.

I played with Jitsi meet and zrok and found that it functions over a single TCP port (link to blog post with example), so it's doable with a zrok share (which uses OpenZiti under the hood), but I didn't explore how well it would perform with more than two parties or other things that would strain the resources.

We really need to understand the P2P requirements in terms of directional network flows, e.g., proto://src_host:src_port => proto://dst_host:dst_port, like a firewall rule.

Once we know that we can probably identify a combination of Ziti service configs that will work.

Are you mainly trying to get the 1:1 Matrix calls working vs. group/conference?

I did some research and found this comment: Add a section about calls in the docs · Issue #614 · matrix-org/matrix.org · GitHub

Which includes this ASCII diagram: https://raw.githubusercontent.com/vector-im/element-android/develop/docs/voip_signaling.md

Yes, currently I am just trying to get the 1:1 calls to function.

Also, I think I may have discovered a bug.

On android (GrapheneOS), when connected to Ziti Mobile Edge, using Private DNS fails.
I discovered a similar bug with tailscale here: Tailscale breaks DNS-over-TLS on Android · Issue #915 · tailscale/tailscale · GitHub

I tested it with dns.adguard-dns.com.

If I disconnect from openziti, DNS-Over-TLS succeeds.
For more information, I do not have "Block connections without VPN" enabled.

One more thing :slight_smile:

I tried splitting the management service by setting managementApi.service.enabled to true in the controller chart.

I configured traefik as the ingressClassName, and set the managementApi.advertisedHost as well.

If I leave the advertisedPort as 443, the controller logs show this when I try to access the console (at https://zac.domain.com):

{"_context":"tls:0.0.0.0:1281","error":"remote error: tls: bad certificate","file":"github.com/openziti/transport/v2@v2.0.143/tls/listener.go:257","func":"github.com/openziti/transport/v2/tls.(*sharedListener).processConn","level":"error","msg":"handshake failed","remote":"10.0.0.160:43488","time":"2024-10-30T02:23:57.483Z"}

If I change the advertisedPort to 80, the logs show this:

{"_context":"tls:0.0.0.0:1281","error":"tls: first record does not look like a TLS handshake","file":"github.com/openziti/transport/v2@v2.0.143/tls/listener.go:257","func":"github.com/openziti/transport/v2/tls.(*sharedListener).processConn","level":"error","msg":"handshake failed","remote":"10.0.0.160:47994","time":"2024-10-30T02:25:13.615Z"}

I see three alternatives.

  1. Learn how Matrix 1:1 calls perform client exchange and write Ziti service configs that match. For example, if Matrix assumes the client's private IP address is reachable by peers, and presents that address to the other party to connect the WebRTC stream, then a Ziti service intercept config is needed for each party's private IP to make them both reachable by Ziti peers.
  2. Configure Matrix to use a Ziti intercept as client ID. This may require modifying or extending the Matrix software if it's not configurable.
  3. Host a TURN server to relay the WebRTC stream. Either the TURN server is public or reachable via Ziti for both parties.

As you expected, this boolean triggers logic in the controller chart to split the client and management APIs to separate web listeners so you can control access. By default, the management operations are available in the client API's web listener.

Your input values should resemble these:

managementApi:
  advertisedHost: zac.domain.com  # any domain name that resolves to Traefik external IP; distinct from clientApi
  advertisedPort: 443  # where clients will connect to Traefik Ingress
  containerPort: 1281  # any available container port; distinct from clientApi
  ingress:
    annotations: {}  # any annotations required to configure Traefik Ingress controller for passthrough TLS
    enabled: true
    ingressClassName: traefik
  service:
    enabled: true
    type: ClusterIP

The URL for the console with the advertised host and port you configured is printed when you upgrade the Helm release with the new input values.

โฏ helm -n ziti upgrade ziti-controller openziti/ziti-controller --values ziti-controller.yaml
Release "ziti-controller" has been upgraded. Happy Helming!
NAME: ziti-controller
LAST DEPLOYED: Wed Oct 30 11:47:38 2024
NAMESPACE: ziti
STATUS: deployed
REVISION: 10
TEST SUITE: None
NOTES:
Your release ziti-controller was upgraded.


You have chart version 1.1.1 and app version 1.1.15.

To learn more about the release, try:

  $ helm status ziti-controller -n ziti
  $ helm get all ziti-controller -n ziti

This deployment provides an OpenZiti controller to manage an OpenZiti network.

Visit the console in a web browser: https://zac.domain.com:443/zac/

You can print this information at any time.

helm status ziti-controller -n ziti

I had a feeling I would need to host a TURN server.
I will look into that, thank you :slight_smile:

That is basically what I have, but the console is not reachable through traefik, and it errors with the logs I pasted above.

I also tried doing a kubectl port-forward directly to the pod, and it closes the connection.