Issue: JWT Validation Fails for Edge Router Without Helm

I am encountering an issue with the JWT validation process when deploying an Edge Router using Kubernetes manifests instead of Helm.

  1. Using Helm:
    The Edge Router is successfully created and registered in GKE when using the following Helm command:
helm upgrade --install nf-edge-router-ztna-test \
--namespace ztna-configconnector \
openziti/ziti-router \
--set-file enrollmentJwt=/Users/unixjon/test-prod-from-code.jwt \
--set linkListeners.transport.service.enabled=false \
--set edge.advertisedHost=nf-edge-router-ztna-test-edge.test.svc.cluster.local \
--set ctrl.endpoint="XXXXXXX-XXX-XXXX-XXXX-XXXX.XXXX.io:443" \
--values /Users/unixjon/router.yml

  2. Using Manifests:
    However, when I extract the manifests and pass the same JWT (/Users/unixjon/test-prod-from-code.jwt) as part of the deployment, the enrollment process fails with the following error:

{"file":"github.com/openziti/ziti/ziti/router/enrollgw.go:68","func":"github.com/openziti/ziti/ziti/router.enrollGw","level":"fatal","msg":"enrollment failure: (enrollment failed received HTTP status [400 Bad Request]: {\"error\":{\"code\":\"INVALID_ENROLLMENT_TOKEN\",\"message\":\"The supplied token is not valid\",\"requestId\":\"ohHMd5PQX\"},\"meta\":{\"apiEnrollmentVersion\":\"0.0.1\",\"apiVersion\":\"0.0.1\"}}\n)","time":"2024-12-07T15:01:41.890Z"}

Additional Context

  • The JWT used in both scenarios is identical.
  • Helm successfully validates the JWT and completes the enrollment process.
  • The deployment manifests are configured to include the JWT, but the validation fails with the error INVALID_ENROLLMENT_TOKEN.

Steps to Reproduce

  1. Deploy the Edge Router using Helm as shown above — works successfully.
  2. Generate manifests from Helm and deploy the Edge Router using the same JWT — results in the error above.

Expected Behavior

The JWT should validate successfully, and the Edge Router should be registered regardless of whether Helm or Kubernetes manifests are used.

Hi @unixjon, welcome to the community and to OpenZiti (and zrok/BrowZer)!

I don't have much seat time with our Kubernetes stuff, but one thing you wrote catches my eye:

The JWT used in both scenarios is identical.

If this is the case, it would make sense that the second attempt would fail. The JWTs are "one-time use": once you use one, it's no longer usable, and you'll get an error like the one you're seeing.

The flow should be:

  • create a router with the openziti controller, saving the enrollment token (the JWT) somewhere safe
  • enroll the router with the enrollment token (JWT)
  • start the router
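For reference, outside of Kubernetes that flow looks roughly like this with the ziti CLI (file names are illustrative, and the router config file is assumed to already exist; in the Kubernetes deployments here, the container entrypoint performs the enroll step when ZITI_BOOTSTRAP_ENROLLMENT=true):

# create the router and save its one-time enrollment token (JWT)
ziti edge create edge-router router1 --jwt-output-file ./router1.jwt

# enroll with the token -- this consumes the one-time token
ziti router enroll ./router1.yml --jwt ./router1.jwt

# start the router with the enrolled identity
ziti router run ./router1.yml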

When you said "the token is identical" -- did you perhaps mean that literally?

Thank you for the warm welcome!

To clarify, when I mentioned that the JWTs are the same, I mean that I am using the exact same JWT in the environment where it is failing. Interestingly, if I use the same JWT via Helm after the failure with the manifest-based deployment method, everything works fine.

To provide more context, the Edge Routers are initially created via the NetFoundry console. However, I am deploying the Edge Router using OpenZiti because I need to use it within Kubernetes.

I'm confident there's a way to use the router chart with your NF network, and it looks like you found the correct ctrl endpoint and correctly understand how to use the one-time token.

Deploying the router without Helm may require you to modify the manifest. You may wish to compare your manifest with the renderings produced by helm template.

As written, the chart expects a plaintext value in the env var ZITI_ENROLL_TOKEN that is the content, not the path, of the router's one-time token. The chart obtains this string from the chart input .Values.enrollmentJwt.

Note: the examples demonstrate obtaining the content with helm's --set-file option, which reads a file and sets the value to its content.
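For example (placeholder values), the difference looks like this:

# --set-file reads the file and passes its CONTENT as the value of enrollmentJwt
helm template my-router openziti/ziti-router \
  --set-file enrollmentJwt=./router.jwt \
  --set ctrl.endpoint={fqdn}:{port} \
  --set edge.advertisedHost={router-fqdn}

# by contrast, --set enrollmentJwt=./router.jwt would pass the literal path string,
# which the controller rejects as an invalid token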

Thank you for your response.

I want to clarify that the manifests I am using (and where the deployment fails) were actually generated using helm template. Here's an example of the spec section of the Deployment I'm applying:

spec:
  volumes:
    - name: config-data
      persistentVolumeClaim:
        claimName: staging-ztna-er-pvc-uxismy
    - name: ziti-router-config
      configMap:
        name: staging-ztna-er-config-uxismy
        defaultMode: 292
    - name: kube-api-access-x2fvd
      projected:
        sources:
          - serviceAccountToken:
              expirationSeconds: 3607
              path: token
          - configMap:
              name: kube-root-ca.crt
              items:
                - key: ca.crt
                  path: ca.crt
          - downwardAPI:
              items:
                - path: namespace
                  fieldRef:
                    apiVersion: v1
                    fieldPath: metadata.namespace
        defaultMode: 420
  containers:
    - name: ziti-router
      image: docker.io/openziti/ziti-router:1.2.2
      command:
        - /entrypoint.bash
      args:
        - run
        - /etc/ziti/config/ziti-router.yaml
      env:
        - name: DEBUG
          value: '1'
        - name: ZITI_ENROLL_TOKEN
          value: >-
            eyJhbGciOiJSUzI1NiIsImtpZCI6IjBlYjg4MzA0ODcxYTk5YTg0ZTc0MjY5NjZjYzZmODMyNjJjMThjMzkiLCJ0eXAiOiJKV1QifQ...
        - name: ZITI_BOOTSTRAP
          value: 'true'
        - name: ZITI_BOOTSTRAP_ENROLLMENT
          value: 'true'
        - name: ZITI_BOOTSTRAP_CONFIG
          value: 'false'
        - name: ZITI_AUTO_RENEW_CERTS
          value: 'true'
        - name: ZITI_HOME
          value: /etc/ziti/config
        - name: ZITI_ROUTER_NAME
          value: staging-ztna-er-uxismy
      ...

Despite following the suggested ZITI_ENROLL_TOKEN setup by passing the token content (not the path) as an environment variable, the enrollment process still fails with the same error:

{
  "error": {
    "code": "INVALID_ENROLLMENT_TOKEN",
    "message": "The supplied token is not valid",
    "requestId": "ohHMd5PQX"
  },
  "meta": {
    "apiEnrollmentVersion": "0.0.1",
    "apiVersion": "0.0.1"
  }
}

Since the same token works fine when used directly with Helm, it seems there might be a subtle difference in how the token is being processed when applying the manifest. Do you have any suggestions on additional areas to investigate or any known issues with this approach?

Thanks in advance for your support!

I referenced the router deployment guide for Kubernetes and deployed the Ziti router, deviating from the guide by running helm template and kubectl apply. That path worked for me, so I believe the invalid token error you encountered is legitimate.

Here's how I deployed the router. In this example, *.192.168.49.2.sslip.io is a DNS wildcard record resolving to the external load balancer address, 192.168.49.2, of my test cluster.

ziti edge create edge-router "router1" \
  --tunneler-enabled \
  --jwt-output-file ./router1.jwt

helm template \
  "ziti-router1" \
  openziti/ziti-router \
    --set-file enrollmentJwt=./router1.jwt \
    --set ctrl.endpoint=miniziti-controller.192.168.49.2.sslip.io:443 \
    --set edge.advertisedHost=router1.192.168.49.2.sslip.io \
| tee ./ziti-router1.yml

kubectl apply -f ./ziti-router1.yml

Let's confirm your router's enrollment token is a valid JWT, is an "edge router one-time token" (erott) and is not expired.

You can paste the token in jwt.io for inspection. Here's mine:

{
    "header": {
        "alg": "RS256",
        "kid": "6523a8808fed727dbcfb43e6d6b1ca47f9ff9fae",
        "typ": "JWT"
    },
    "payload": {
        "iss": "https://miniziti-controller.192.168.49.2.sslip.io:443",
        "sub": "D6V8UnrJ9",
        "aud": [
            ""
        ],
        "exp": 1733771599,
        "jti": "4d90d498-ae62-46d9-af97-f1963aeb562f",
        "em": "erott",
        "ctrls": null
    },
    "analysis": {
        "signature_valid": true,
        "enrollment_method": "one-time token for a router",
        "expiration": "valid until 2024-12-09T14:13:19 (2 hours)"
    }
}

I used this Python script to check the token signature.
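If you'd rather not paste the token into a website, here's a minimal Python sketch (not the script linked above) that decodes the header and payload and checks the em and exp claims; it does not verify the signature:

#!/usr/bin/env python3
# Decode an enrollment JWT and sanity-check its claims (no signature verification).
import base64, json, sys, time

def b64url_decode(segment):
    # JWT segments are base64url without padding; restore padding before decoding
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

token = open(sys.argv[1]).read().strip()
header_b64, payload_b64, _signature = token.split(".")
header = json.loads(b64url_decode(header_b64))
payload = json.loads(b64url_decode(payload_b64))
print(json.dumps({"header": header, "payload": payload}, indent=2))
if payload.get("em") != "erott":
    print("WARNING: not an edge router one-time token (em != 'erott')")
if payload.get("exp", 0) < time.time():
    print("WARNING: token is expired")

Run it like: python3 inspect_token.py ./router1.jwt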

EDIT: Here's the entire manifest, ziti-router1.yml.

---
# Source: ziti-router/templates/configmap.yaml
# Chart name:ziti-router
apiVersion: v1
kind: ConfigMap
metadata:
  name: ziti-router1-config
  labels:
    helm.sh/chart: ziti-router-1.1.3
    app.kubernetes.io/name: ziti-router
    app.kubernetes.io/instance: ziti-router1
    app.kubernetes.io/version: "1.1.15"
    app.kubernetes.io/managed-by: Helm
data:
  ziti-router.yaml: |2-
    v: 3

    identity:
      # expected filename defined in SetZitiRouterIdentityCert()
      cert:        /etc/ziti/config/ziti-router1.cert
      # expected filename defined in SetZitiRouterIdentityServerCert()
      server_cert: /etc/ziti/config/ziti-router1.server.chain.cert
      # expected filename defined in SetZitiRouterIdentityKey()
      key:         /etc/ziti/config/ziti-router1.key
      # expected filename defined in SetZitiRouterIdentityCA()
      ca:          /etc/ziti/config/ziti-router1.cas

    ha:
      enabled: false

    ctrl:
      # router control plane API (:6262)
      endpoint:    tls:miniziti-controller.192.168.49.2.sslip.io:443
    link:
      dialers:
        - binding: transport
      # When 'transport' is disabled this means we are a 'private' router, i.e.,
      # not providing incoming links to other routers. Private routers still
      # join the mesh, but only form outgoing links.
      listeners:
        - binding:          transport
          bind:             tls:0.0.0.0:3022
          advertise:        tls:router1.192.168.49.2.sslip.io:443
          options: {}
    listeners:
    # bindings of edge and tunnel requires an "edge" section below
      - binding: edge
        address: tls:0.0.0.0:3022
        options:
          advertise: router1.192.168.49.2.sslip.io:443

    edge:
      csr:
        sans:
          dns:
            - localhost
            - router1.192.168.49.2.sslip.io
            - router1.192.168.49.2.sslip.io  # end if .Values.csr.sans.noDefaults
          ip:
            - 127.0.0.1 # end if .Values.csr.sans.noDefaults
          email:
          uri:
    forwarder:
        latencyProbeInterval: 10
        linkDialQueueLength: 1000
        linkDialWorkerCount: 32
        rateLimitedQueueLength: 5000
        rateLimitedWorkerCount: 64
        xgressDialQueueLength: 1000
        xgressDialWorkerCount: 128
---
# Source: ziti-router/templates/pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ziti-router1
  namespace: "miniziti"
  labels:
    helm.sh/chart: ziti-router-1.1.3
    app.kubernetes.io/name: ziti-router
    app.kubernetes.io/instance: ziti-router1
    app.kubernetes.io/version: "1.1.15"
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: "ziti-router"
spec:
  accessModes:
    - "ReadWriteOnce"
  resources:
    requests:
      storage: "50Mi"
---
# Source: ziti-router/templates/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: ziti-router1-edge
  labels:
    helm.sh/chart: ziti-router-1.1.3
    app.kubernetes.io/name: ziti-router
    app.kubernetes.io/instance: ziti-router1
    app.kubernetes.io/version: "1.1.15"
    app.kubernetes.io/managed-by: Helm
spec:
  type: ClusterIP
  ports:
    - port: 443
      targetPort: 3022
      protocol: TCP
      name: edge
  selector:
    app.kubernetes.io/name: ziti-router
    app.kubernetes.io/instance: ziti-router1
    app.kubernetes.io/component: "ziti-router"
---
# Source: ziti-router/templates/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: ziti-router1-transport
  labels:
    helm.sh/chart: ziti-router-1.1.3
    app.kubernetes.io/name: ziti-router
    app.kubernetes.io/instance: ziti-router1
    app.kubernetes.io/version: "1.1.15"
    app.kubernetes.io/managed-by: Helm
spec:
  type: ClusterIP
  ports:
    - port: 443
      targetPort: 3022
      protocol: TCP
      name: transport
  selector:
    app.kubernetes.io/name: ziti-router
    app.kubernetes.io/instance: ziti-router1
    app.kubernetes.io/component: "ziti-router"
---
# Source: ziti-router/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ziti-router1
  labels:
    helm.sh/chart: ziti-router-1.1.3
    app.kubernetes.io/name: ziti-router
    app.kubernetes.io/instance: ziti-router1
    app.kubernetes.io/version: "1.1.15"
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: "ziti-router"
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app.kubernetes.io/name: ziti-router
      app.kubernetes.io/instance: ziti-router1
      app.kubernetes.io/component: "ziti-router"
  template:
    metadata:
      annotations:
        configmap-checksum: f1487bb4d9b3630e5adbf3d78cef00658fede863696dd87962808820ce6c4fca
      labels:
        app.kubernetes.io/name: ziti-router
        app.kubernetes.io/instance: ziti-router1
        app.kubernetes.io/component: "ziti-router"
    spec:
      securityContext:
        fsGroup: 2171
      hostNetwork: false
      dnsPolicy: ClusterFirstWithHostNet
      dnsConfig: {}
      containers:
        - name: ziti-router
          securityContext: null
          image: docker.io/openziti/ziti-router:1.1.15
          imagePullPolicy: Always
          ports: null
          resources: {}
          command:
            - "/entrypoint.bash"
          args:
            - run
            - '/etc/ziti/config/ziti-router.yaml'
          env:
            - name: ZITI_ENROLL_TOKEN
              value: "eyJhbGciOiJSUzI1NiIsImtpZCI6IjY1MjNhODgwOGZlZDcyN2RiY2ZiNDNlNmQ2YjFjYTQ3ZjlmZjlmYWUiLCJ0eXAiOiJKV1QifQ.eyJpc3MiOiJodHRwczovL21pbml6aXRpLWNvbnRyb2xsZXIuMTkyLjE2OC40OS4yLnNzbGlwLmlvOjQ0MyIsInN1YiI6IkQ2VjhVbnJKOSIsImF1ZCI6WyIiXSwiZXhwIjoxNzMzNzcxNTk5LCJqdGkiOiI0ZDkwZDQ5OC1hZTYyLTQ2ZDktYWY5Ny1mMTk2M2FlYjU2MmYiLCJlbSI6ImVyb3R0IiwiY3RybHMiOm51bGx9.yrpTf3yp9wLTUloaqw2h-ioKRc79pevtxqVh0jViwjDMyKriFtrGRJGfo88Zwj8kQ6HItgqtoByWhw_LLy5uNo5VV81HjXaqePeIOQ3Ls0E2ju0Ry6aIHzIJWk-BDn9ni0_H2EYmInSZE5hK28Kfg8RQ3MyKeO8OPJGmfGu87khFWvGcgetOtUKSbK52AbGeYLXciiE4oLTpv9gkVZAtdnPHkYvpZBITWeNh5RF2cs7TgQWZECi7JmVrNNnOeV2hJhRTOvcYN9yweKi6n9GjE2h9Y2fXXorVrBArW8dxYgScCWYuwnIDXUpIjAj5CXtlphBHw0hHB6UBzozVxnhXGgIh_mlAyqfwInNhilc6fC1x6FZMH1sjdQnyX4v8onY6RAsAHwwa54-qmfeHcp_IHVWm7c_ldwD3b_ZycNPX_GH5lXtQ0C2DDifVByKO_Jg572RrqYNehrmmyGA9kOdXiXihe7roNfd_IRtbtxYRXGpxmwbgrdvCgYSd-s0vLR_D1RvWt7FwGqq48S0Kuwo9KqU2ebVlfMdpDewe3arxmQHED6TVGdd4w4ABzarGuK5hffQezynaR5BpEU0SMt5YwLyB-H8X3LMXXEts6ciSfp11QM-ecRFNDwwOHKCuwUKNl3F-ge0mmv1TJgC0tICKIA83YMCNR2EXd1lOMSg2cvA"
            # must be true or enroll() will not be called
            - name: ZITI_BOOTSTRAP
              value: "true"
            # -- enroll with controller if "true," overwrite if "force"; requires ZITI_BOOTSTRAP=true
            - name: ZITI_BOOTSTRAP_ENROLLMENT
              value: "true"
            # suppress generating a config.yml because in K8s we mount a configMap rendered by Helm
            - name: ZITI_BOOTSTRAP_CONFIG
              value: "false"
            # entrypoint will append --extend to run command if "true"
            - name: ZITI_AUTO_RENEW_CERTS
              value: "true"
            # used by entrypoint's enroll() function to predict the path to the enrolled identity's cert
            - name: ZITI_HOME
              value: "/etc/ziti/config"
            - name: ZITI_ROUTER_NAME
              value: "ziti-router1"
          volumeMounts:
            - mountPath: /etc/ziti/config
              name: config-data
              readOnly: false
            - mountPath: /etc/ziti/config/ziti-router.yaml
              name: ziti-router-config
              subPath: ziti-router.yaml # project the read-only config into the writeable volume to allow router to write ./endpoints state file in same dir as config
          # deployment condition ready and receive traffic when this passes
          readinessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - ziti agent stats
            initialDelaySeconds: 10
            periodSeconds: 10
          # delete pod if this fails
          livenessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - ziti agent stats
            initialDelaySeconds: 10
            periodSeconds: 10
      volumes:
        - name: ziti-router-config
          configMap:
            name: ziti-router1-config
            defaultMode: 0444
        - name: config-data
          persistentVolumeClaim:
            claimName: ziti-router1
---

---
# Source: ziti-router/templates/pre-upgrade-serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ziti-router1-hook-serviceaccount
  namespace: miniziti
  annotations:
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-weight": "-8"
    "helm.sh/hook-delete-policy": before-hook-creation, hook-succeeded
---
# Source: ziti-router/templates/pre-upgrade-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ziti-router1-pre-upgrade-hook
  labels:
    helm.sh/chart: ziti-router-1.1.3
    app.kubernetes.io/name: ziti-router
    app.kubernetes.io/instance: ziti-router1
    app.kubernetes.io/version: "1.1.15"
    app.kubernetes.io/managed-by: Helm
  annotations:
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-weight": "-8"
    "helm.sh/hook-delete-policy": before-hook-creation, hook-succeeded
data:
  migrate-identity.bash: |-
    #!/usr/bin/env bash
    set -o errexit
    set -o nounset
    set -o pipefail
    set -o xtrace

    # - copy the private key from the hook-managed secret to the persistent volume
    # - rename router identity files to match the ziti config generator's conventions
    # - remove the hook-managed secret containing the private key, signaling the migration is complete

    trap 'echo "ERROR: ${BASH_SOURCE[0]}:${LINENO} exited with code $?" >&2;' ERR

    function noClobber() {
      local src=$1
      local dst=$2
      if [[ -s "${src}" ]]
      then
        if [[ -s "${dst}" ]]
        then
          echo "ERROR: ${dst} already exists, refusing to overwrite"
          return 1
        else
          echo "INFO: renaming ${src}"
          mv "${src}" "${dst}"
        fi
      else
        echo "INFO: ${src} is empty or does not exist, skipping"
      fi
    }

    if kubectl -n miniziti get secret \
      ziti-router1-identity &>/dev/null
    then
      # prior versions of the chart stored certs in a Secret resource, so this copies those certs to the persistent
      # volume unless a file already exists in the persistent volume
      typeset -a KEYS=(
        $(
          kubectl -n miniziti get secret \
            ziti-router1-identity \
              --output go-template='{{range $k,$v := .data}}{{if $v}}{{printf "%s " $k}}{{end}}{{end}}'
        )
      )
      echo "DEBUG: found identity secret dict keys: ${KEYS[*]}"
      for KEY in ${KEYS[@]}; do
        if [[ ${KEY} =~ ^tls\.key$ ]]
        then
          kubectl -n miniziti get secret ziti-router1-identity \
          --output go-template='{{index .data "'${KEY}'" | base64decode }}' \
          > "/etc/ziti/config/ziti-router1.key"
        fi
      done

      declare -A ID_FILES=(
        [client.crt]=ziti-router1.cert
        [tls.crt]=ziti-router1.server.chain.cert
        [ca.crt]=ziti-router1.cas
      )

      for KEY in ${!ID_FILES[@]}; do
        noClobber "/etc/ziti/config/${KEY}" "/etc/ziti/config/${ID_FILES[${KEY}]}"
      done

      kubectl -n miniziti delete secret \
        ziti-router1-identity
    else
      echo "INFO: identity secret does not exist"
    fi
---
# Source: ziti-router/templates/pre-upgrade-serviceaccount.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ziti-router1-hook-role
  annotations:
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-weight": "-7"
    "helm.sh/hook-delete-policy": before-hook-creation, hook-succeeded
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "delete"]
---
# Source: ziti-router/templates/pre-upgrade-serviceaccount.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ziti-router1-hook-rolebinding
  annotations:
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-weight": "-6"
    "helm.sh/hook-delete-policy": before-hook-creation, hook-succeeded
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ziti-router1-hook-role
subjects:
  - kind: ServiceAccount
    name: ziti-router1-hook-serviceaccount
    namespace: miniziti
---
# Source: ziti-router/templates/pre-upgrade-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: ziti-router1-pre-upgrade-job
  labels:
    app.kubernetes.io/managed-by: "Helm"
    app.kubernetes.io/instance: "ziti-router1"
    app.kubernetes.io/version: 1.1.15
    helm.sh/chart: "ziti-router-1.1.3"
  annotations:
    # This is what defines this resource as a hook. Without this line, the
    # job is considered part of the release.
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-weight": "-5"
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  backoffLimit: 0
  completions: 1
  ttlSecondsAfterFinished: 600
  template:
    metadata:
      name: ziti-router1
      labels:
        app.kubernetes.io/managed-by: "Helm"
        app.kubernetes.io/instance: "ziti-router1"
        helm.sh/chart: "ziti-router-1.1.3"
        helm.sh/chart: ziti-router-1.1.3
        app.kubernetes.io/name: ziti-router
        app.kubernetes.io/instance: ziti-router1
        app.kubernetes.io/version: "1.1.15"
        app.kubernetes.io/managed-by: Helm
    spec:
      restartPolicy: Never
      serviceAccountName: ziti-router1-hook-serviceaccount
      containers:
        - name: pre-upgrade-job
          image: docker.io/openziti/ziti-router:1.1.15
          imagePullPolicy: Always
          volumeMounts:
            - mountPath: /usr/local/bin/migrate-identity.bash
              name: migrate-script
              subPath: migrate-identity.bash
            - mountPath: /etc/ziti/config
              name: config-data
              readOnly: false
          command: ["migrate-identity.bash"]
          # command: ["sh", "-c", "while true; do sleep 86400; done"]
      volumes:
        - name: migrate-script
          configMap:
            name: ziti-router1-pre-upgrade-hook
            items:
              - key: migrate-identity.bash
                path: migrate-identity.bash
                mode: 0555
        - name: config-data
          persistentVolumeClaim:
            claimName: ziti-router1

I finally managed to solve the original issue. It wasn't a problem with the JWT after all—it was an issue with the ctrl.endpoint. I had forgotten to prepend tls: to the endpoint! :slight_smile:

After successfully deploying the Edge Router, I encountered another small issue: my cluster operates in the 100.64.0.0 range, which conflicts with the default IPs used by the tunnel. To resolve this, I configured tunnel.dnsSvcIpRange to avoid the conflict.

However, now I’m facing a new issue. From within the Edge Router, I can successfully telnet to the target Kong instance, but when testing from my local machine using Ziti Desktop, the requests don’t seem to reach the Edge Router.

Here is my current ziti-router.yaml configuration:

v: 3
identity:
  cert: /etc/ziti/config/staging-ztna-er.cert
  server_cert: /etc/ziti/config/staging-ztna-er.server.chain.cert
  key: /etc/ziti/config/staging-ztna-er.key
  ca: /etc/ziti/config/staging-ztna-er.cas
ctrl:
  endpoint: tls:xxxxx.xxxxx.xxxxx.xxxx.xxxx:443
link:
  dialers:
    - binding: transport
listeners:
  - binding: edge
    address: tls:0.0.0.0:3022
    options:
      advertise: staging-ztna-er-edge.ztna-configconnector.svc:443
      connectTimeoutMs: 1000
      getSessionTimeout: 60
  - binding: tunnel
    options:
      dnsSvcIpRange: 100.80.0.0/12
      mode: tproxy
      resolver: udp://127.0.0.1:53
      lanIf: lo
edge:
  csr:
    sans:
      dns:
        - localhost
        - staging-ztna-er-edge.ztna-configconnector.svc.cluster.local
      ip:
        - 127.0.0.1
forwarder:
  latencyProbeInterval: 10
  linkDialQueueLength: 1000
  linkDialWorkerCount: 32
  rateLimitedQueueLength: 5000
  rateLimitedWorkerCount: 64
  xgressDialQueueLength: 1000
  xgressDialWorkerCount: 128

Current Observations

  • From the Edge Router, I can successfully reach the target Kong instance using telnet.
  • From my local machine using Ziti Desktop, requests don’t seem to arrive at the Edge Router.

Questions

  • Is the current tunnel mode (tproxy) correct for this use case, or should I use host or proxy mode instead?
  • Is there anything in my ziti-router.yaml configuration that could prevent Ziti Desktop requests from being processed by the Edge Router?

Oh, that's usually prepended on your behalf by the Helm template (link to line) when you supply an address like ctrl.endpoint={fqdn}:{port}. I'm guessing the problem arose from the way you're rendering the templates for additional processing before applying the manifest to the Kube API.
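For reference, the rendered ziti-router.yaml should end up with the scheme on the controller address, as in the chart output pasted above:

ctrl:
  # the chart template prepends the scheme; a hand-edited manifest must keep it
  endpoint: tls:{fqdn}:{port}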

Good catch.

The router's tunnel binding mode depends on the network relationship between the router and the target server.

  • host is a reverse proxy - this is the most common choice because routers are often placed in remote private networks where they can reach a target server you wish to publish as a Ziti service
  • tproxy provides a nameserver and transparent proxy for clients that will reach the target server via a Ziti service - this requires elevated privileges to listen on 53/udp and manipulate iptables rules
  • proxy provides a raw L4 proxy that maps a local TCP port 1-to-1 to a Ziti service
  • none disables the tunnel binding

Since the router can reach Kong without Ziti, I assume your goal is for the router to function as a reverse proxy, so the correct tunnel binding mode is host (i.e., to host a Ziti service).
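For comparison with your tproxy listener above, a host-mode tunnel binding in ziti-router.yaml is minimal, roughly:

listeners:
  - binding: tunnel
    options:
      mode: host   # reverse proxy: this router dials the target server for hosted services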

After configuring the router's tunnel binding for host mode, ensure that the router's tunnel identity has permission to host the service in a "Bind Service Policy."

For example, if the router's name is "router1" and the service's name is "service1", you would query the policy advisor like this:

ziti edge policy-advisor services --quiet service1 router1

The good result looks like this, notice "Bind: Y."

OKAY : router1 (1) -> service1 (2) Common Routers: (1/1) Dial: N Bind: Y 

With Bind permission, the router's tunnel identity will create a terminator for the service on its parent router, which you can then verify with the command below. If you attempt to dial a Ziti service with Ziti Desktop before the terminator is created, it will log "ERROR: ... no terminators".

ziti edge list terminators

In my current setup, I have a GKE cluster hosting my microservices, and Kong is deployed as the ingress to handle API requests. Additionally, I have installed an edge router within this GKE cluster.

Now, the goal is to allow external requests—originating either from my personal computer or another server outside the GKE cluster (or even outside the Google Cloud account)—to be intercepted by another edge router located in the external environment. This external edge router should then route the requests to the edge router inside the GKE cluster, which will finally forward the traffic to Kong.

Given this architecture, my understanding is that I need host mode for the internal edge router to act as the entry point in GKE. Is this understanding correct, and would you confirm that host mode is the appropriate setup for this use case?

Correct. The "internal" router placed near the target server will provide a hosting tunneler, and the router tunnel binding mode for that function is "host." A Ziti router is an excellent choice for hosting tunneled Ziti services in a cloud environment.

Here's a chart of Kubernetes tunneling alternatives.

I believe my main issue lies in the fact that my architecture is a mix. I’m using NetFoundry (which is based on Ziti :slight_smile: ) as my Fabric and manager for edge routers and services. Whenever an Edge Router is created outside of Kubernetes, everything works perfectly.

The problem arose when we attempted to make our ZTNA architecture Kubernetes-native by deploying ziti-router and trying to get everything working. The ziti-router I have deployed meets all the requirements except for the part where it forwards traffic to the target destination.

In my latest attempt, I created:

  1. An Edge Router deployed using the NetFoundry image outside the cluster.
  2. Two Edge Routers (ziti-routers) deployed inside the cluster:
  • One in host mode.
  • One in tproxy mode.

Finally, I tested from my Ziti Desktop client to reach the Kong instance within the cluster. Only the Edge Router outside the cluster (configured in tproxy mode) successfully forwarded the traffic.

The only difference I noticed between the tproxy mode Edge Router inside Kubernetes and the one outside (running on a VM) is that the one outside has multiple iptables entries of type NF-Intercept and others. Meanwhile, the one running inside the container has iptables installed but has no entries at all.

The ziti-router deployment with mode host is well suited to provide the server-facing reverse proxy authorized by a "Bind" service policy.

I'm unsure of the role of the ziti-router deployment with mode tproxy. Do you wish to enable a K8s workload/pod to initiate connections to a Ziti service?

This tproxy tunneling mode requires elevated privileges and additional considerations, like DNS resolver configuration, to enable it to function as a client-facing transparent proxy and nameserver.

The router Helm chart is not tailored for tproxy mode (e.g., no inputs for running as root), and it seems likely that the VM image is tailored for that mode. In case that is your goal, the VM may also have OS-level configuration to allow the router to provide infrastructure IP routing capability for Ziti services.

Still, you could adapt the rendering of the router chart manifest as a TPROXY sidecar if you're sure you want the router and tunneling capabilities in a pod's namespace. Here is an example using a bespoke container image that provides only the tunneling capability, not the router capability, as a pod sidecar. The same approach should work with the openziti/ziti-router container image as a sidecar.
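As a rough illustration only (adapt the actual values from that example), the pod-level pieces it relies on look something like this; the second nameserver is a placeholder for your cluster's DNS service IP:

  dnsPolicy: None
  dnsConfig:
    nameservers:
      - 127.0.0.1      # the tunneler's built-in resolver answers Ziti names first
      - 10.96.0.10     # placeholder: your cluster's DNS service IP, for everything else
  containers:
    - name: ziti-tproxy-sidecar
      image: docker.io/openziti/ziti-router:1.1.15
      securityContext:
        capabilities:
          add:
            - NET_ADMIN         # manipulate iptables rules for the transparent intercept
            - NET_BIND_SERVICE  # listen on 53/udp for the nameserver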

The Kubernetes tunneling flow chart I linked describes the alternatives that are known to work. The chart doesn't have any outcomes that involve running ziti-router inside a cluster in tproxy mode because there are other ways to achieve the same capabilities. For example, if you wish to authorize all pods on all worker nodes to initiate connections to a set of Ziti services, you can deploy the node proxy daemonset. If you wish to limit Ziti service access to specific pods, then the sidecar proxy is best for providing Ziti DNS and a transparent intercepting proxy for the pod.

Thank you for the detailed explanation and guidance.

Regarding the ziti-router deployment in tproxy mode:
Yes, my goal is to enable a Kubernetes workload/pod to initiate connections to a Ziti service. I understand now that this tunneling mode requires elevated privileges and additional configurations (e.g., DNS resolver setup) to function as a client-facing transparent proxy and nameserver.

From your response, I see two key takeaways:

  1. The Helm chart is not tailored for tproxy mode, as it lacks inputs for running with elevated privileges (e.g., as root) or other configurations typically required by the VM-based deployment.
  2. Using ziti-router in tproxy mode inside Kubernetes may not be ideal due to better alternatives like node proxy daemonsets or sidecar proxies.

Given these points:

  • I will investigate deploying the node proxy daemonset for scenarios where all worker nodes need access to Ziti services.
  • For cases where access to Ziti services needs to be limited to specific pods, I will explore using the sidecar proxy approach.

Additional Scenario: Reverse Flow

In our architecture, we also have a scenario where the flow of traffic is reversed — that is, from external edge routers (outside the cluster) to internal edge routers (inside Kubernetes).

This is different from the standard flow for most connections, which goes from the internal edge routers in Kubernetes to the external edge routers. This standard flow works well and involves some additional components — for example, in GCP, I use Compute Routes to forward traffic leaving the GKE cluster to the external edge routers running on a VM.

However, for the reverse flow (external to internal), we are facing challenges. I’d like to know if the sidecar proxy or node proxy approach can also be adapted to support this use case effectively, or if there are other recommendations for handling this specific reverse flow scenario.

Questions

  1. If adapting the router chart manifests to support tproxy mode, what specific configurations (e.g., root privileges, iptables setup) would you recommend adding to align it with the VM's capabilities?
  2. Are there any significant limitations or trade-offs with the sidecar proxy approach compared to using tproxy mode with ziti-router?
  3. For the node proxy daemonset, is there a recommended starting point or example configuration tailored for Ziti?
  4. How can we best address the reverse flow scenario (external to internal edge routers) using Ziti’s architecture?

I greatly appreciate your guidance so far and look forward to exploring these alternatives.


This use case can be described as "hosting a tunneled Ziti service in Kubernetes." That is, the target server is likely a ClusterIP service or cluster-internal Gateway, and your Ziti tunneling strategy is to place a pod running a Ziti tunneler somewhere where it can reach the target server to provide the server-facing reverse proxy.

In that tunneling strategy, I'd sort the tactics like so:

  1. Deploy ziti-router chart with tunneling mode host.
  2. Deploy the ziti-host chart (the tunneler only).

The router is slightly preferred for the server-facing reverse proxy case because it makes your Ziti services more robust. Routers form multiple transport links with other routers to create a resilient mesh, whereas the dedicated tunneler will connect with a single router at a time.

I'm not sufficiently familiar with the VM's capabilities to compare them, but the privileges and DNS config indicated in the sidecar example should work with the ziti-router container image.

That much will satisfy the prerequisites for tproxy mode, but you'll need to provide additional configuration to the router, e.g., to configure the control plane endpoint.

The router chart does this by rendering input values into a mounted ConfigMap containing the router's configuration YAML, plus env vars (link to template). I'm a little dubious about this path. If you're determined, then I'd recommend rendering the manifest with Helm and porting it into your deployment manifest.

The tunneling-only sidecar approach simplifies configuration and uses fewer resources, which is good for a sidecar model.

The sidecar (t)proxy example functions similarly to the Istio sidecar model, but as a "split tunnel" that intercepts only destinations matching an authorized Ziti service, and it operates at layer 4 (TCP/UDP), not HTTP/gRPC.

The tunneling-only sidecar example uses container image openziti/ziti-tunnel which runs ziti tunnel instead of ziti router. This sub-command invokes only the tunneling capabilities of the ziti binary, not the router capabilities.
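In other words, the two images boil down to different sub-commands of the same binary:

# tunneling-only (what the openziti/ziti-tunnel image runs); the --identity flag
# name is an assumption here -- check `ziti tunnel --help`
ziti tunnel tproxy --identity /path/to/client-identity.json

# full router (what the openziti/ziti-router image runs), as in the deployments above
ziti router run /etc/ziti/config/ziti-router.yaml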

The node proxy daemonset outcome of the flow chart links to this section of the same page: Tunneling Kubernetes Workloads | OpenZiti

That section links to the README/guide for the node proxy daemonset: Kubernetes Node Daemonset | OpenZiti

As noted in the prior post, I recommend deploying the ziti-router chart with tunnel mode host.