Troubleshooting OpenZiti Tunnel: Edge Router and Service Configuration Issues

am3y · January 7, 2025, 4:18am

Hi @TheLumberjack, for now I have decided to ignore router-services and have removed it.

These are the current routers:

$ ziti edge list edge-routers
╭───────────┬────────────────┬────────┬───────────────┬──────┬────────────────╮
│ ID        │ NAME           │ ONLINE │ ALLOW TRANSIT │ COST │ ATTRIBUTES     │
├───────────┼────────────────┼────────┼───────────────┼──────┼────────────────┤
│ SolPOIazd │ router-private │ true   │ true          │    0 │ router-private │
│ zFVPOI2ib │ router-public  │ true   │ true          │    0 │ router-public  │
╰───────────┴────────────────┴────────┴───────────────┴──────┴────────────────╯
results: 1-2 of 2

I have set these policies:

ziti edge create edge-router-policy router-private-router-policy \
--edge-router-roles "#router-private" \
--identity-roles "#EC2-private" \
--semantic "AllOf"

ziti edge create edge-router-policy router-public-router-policy \
--edge-router-roles "#router-public" \
--identity-roles "#EC2-public" \
--semantic "AllOf"

ziti edge create service-edge-router-policy all-services-on-all-routers \
--edge-router-roles '#all' \
--service-roles '#all'
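To see why traffic between the two EC2 hosts has to cross from one router to the other, here is a minimal Python sketch of how role attributes select routers under the two edge router policies above. It is a simplified model, not the controller's implementation: it handles only `#attribute` roles and `#all`, applies the policy's semantic to both role lists, and assumes that each EC2 identity carries an attribute with the same name as the identity role (`EC2-private`, `EC2-public`). The `#all` service-edge-router-policy is not modelled, because it allows the service on both routers anyway.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Simplified edge router policy (role selectors only)."""
    name: str
    edge_router_roles: tuple[str, ...]
    identity_roles: tuple[str, ...]
    semantic: str = "AllOf"  # "AllOf" or "AnyOf"


def matches(roles, attributes, semantic):
    """True if an entity carrying `attributes` is selected by `roles`."""
    if "#all" in roles:
        return True
    wanted = {r.lstrip("#") for r in roles}
    if semantic == "AllOf":
        return wanted <= attributes
    return bool(wanted & attributes)


def routers_for(identity_attrs, routers, policies):
    """Names of routers an identity may use under the given policies."""
    allowed = set()
    for p in policies:
        if not matches(p.identity_roles, identity_attrs, p.semantic):
            continue
        for name, attrs in routers.items():
            if matches(p.edge_router_roles, attrs, p.semantic):
                allowed.add(name)
    return allowed


# Router attributes as shown by `ziti edge list edge-routers`.
ROUTERS = {"router-private": {"router-private"}, "router-public": {"router-public"}}

# The two edge router policies from this post.
POLICIES = [
    Policy("router-private-router-policy", ("#router-private",), ("#EC2-private",)),
    Policy("router-public-router-policy", ("#router-public",), ("#EC2-public",)),
]

# Assumption: each EC2 identity carries an attribute named like its identity role.
for ident in ("EC2-private", "EC2-public"):
    print(ident, "->", sorted(routers_for({ident}, ROUTERS, POLICIES)))
```

In this model neither identity shares a router with the other, so a circuit from EC2-private to a service terminated behind router-public has to travel router-private -> router-public over a router link.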

No errors appear in the screen session on EC2-public, but in the screen session on EC2-private I get this error:

(17515)[        9.823]   ERROR ziti-sdk:connect.c:1071 connect_reply_cb() conn[0.0/Wy7DGYVj/Connecting](apache-service) failed to connect, reason=can't route from SolPOIazd -> zFVPOI2ib

TheLumberjack · January 7, 2025, 4:46am

Please list your links. I suspect your routers are not linking together. Check the advertised addresses, check that the routers have link listeners configured, and verify that each link listener address is reachable from the router that should dial it.

ziti fabric list links
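The "can't route from SolPOIazd -> zFVPOI2ib" error means the controller found no path between the two routers. As a simplified illustration (the controller's real path selection is cost-based, and this is not its algorithm), a breadth-first search over the router links shows why an empty link list makes every cross-router circuit fail. The router IDs come from the listing above; treating an established link as usable in both directions is an assumption of this sketch.

```python
from collections import deque


def can_route(src: str, dst: str, links: list[tuple[str, str]]) -> bool:
    """Return True if dst is reachable from src over the given router links."""
    if src == dst:
        return True
    graph: dict[str, set[str]] = {}
    for a, b in links:
        # Assumption: an established link carries traffic in both directions.
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == dst:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Router IDs from `ziti edge list edge-routers`.
PRIVATE, PUBLIC = "SolPOIazd", "zFVPOI2ib"

print(can_route(PRIVATE, PUBLIC, []))                   # no links: False
print(can_route(PRIVATE, PUBLIC, [(PUBLIC, PRIVATE)]))  # one link: True
```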

am3y · January 7, 2025, 5:05am

Oh, this is empty:

$ ziti fabric list links
╭────┬────────┬──────────┬─────────────┬─────────────┬─────────────┬───────┬────────┬───────────╮
│ ID │ DIALER │ ACCEPTOR │ STATIC COST │ SRC LATENCY │ DST LATENCY │ STATE │ STATUS │ FULL COST │
├────┼────────┼──────────┼─────────────┼─────────────┼─────────────┼───────┼────────┼───────────┤
╰────┴────────┴──────────┴─────────────┴─────────────┴─────────────┴───────┴────────┴───────────╯
results: none

Is there a command to create the links, or should they form automatically? For reference, below are the values.yml files I used for router-private and router-public.

router-private.yml

ctrl:
  endpoint: ziti-controller.example.com:443
  advertisedHost: ziti-router-private.example.com

# Edge configuration for external identities
edge:
  advertisedHost: ziti-router-private.example.com
  advertisedPort: 443
  service:
    type: LoadBalancer  
    annotations:
      external-dns.alpha.kubernetes.io/hostname: ziti-router-private.example.com
      service.beta.kubernetes.io/aws-load-balancer-internal: "true"
      service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
      service.beta.kubernetes.io/aws-load-balancer-security-groups: "sg-0e9e6f9fce67feba1"
      service.beta.kubernetes.io/aws-load-balancer-manage-backend-security-group-rules: "true"
  ingress:
    enabled: false

# Link listeners for router-to-router communication (internal)
linkListeners:
  transport:
    advertisedHost: ziti-router-transport-private.ziti-router.svc.cluster.local
    advertisedPort: 443
    service:
      enabled: true
      type: ClusterIP  # All routers are internal; no external exposure
    ingress:
      enabled: false  # Not needed as routers are internal

# Persistence for router data
persistence:
  enabled: true
  accessMode: ReadWriteOnce
  size: 1Gi
  storageClass: ebs-sc

router-public.yml

ctrl:
  endpoint: ziti-controller.example.com:443
  advertisedHost: ziti-router-public.example.com

# Edge configuration for external identities
edge:
  advertisedHost: ziti-router-public.example.com
  advertisedPort: 443
  service:
    type: LoadBalancer  
    annotations:
      external-dns.alpha.kubernetes.io/hostname: ziti-router-public.example.com
      service.beta.kubernetes.io/aws-load-balancer-internal: "false"
      service.beta.kubernetes.io/aws-load-balancer-security-groups: "sg-0e9e6f9fce67feba1"
      service.beta.kubernetes.io/aws-load-balancer-manage-backend-security-group-rules: "true"
  ingress:
    enabled: false


# Link listeners for router-to-router communication (internal)
linkListeners:
  transport:
    advertisedHost: ziti-router-transport-public.ziti-router.svc.cluster.local
    advertisedPort: 443
    service:
      enabled: true
      type: ClusterIP  # All routers are internal; no external exposure
    ingress:
      enabled: false  # Not needed as routers are internal

# Persistence for router data
persistence:
  enabled: true
  accessMode: ReadWriteOnce
  size: 1Gi
  storageClass: ebs-sc

am3y · January 7, 2025, 5:59am

@TheLumberjack While posting my last message, I checked the values.yml of both routers and found that I had entered the wrong advertisedHost for linkListeners, so I changed them:

ziti-router-transport-public.ziti-router.svc.cluster.local ==> ziti-router-public-release-transport.ziti-router.svc.cluster.local
ziti-router-transport-private.ziti-router.svc.cluster.local ==> ziti-router-private-release-transport.ziti-router.svc.cluster.local
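The fix hinges on the link listener's advertisedHost matching the DNS name of the transport Service that the Helm chart actually creates. Judging only from the corrected values above, that Service appears to be named `<release>-transport`, and in-cluster Service names follow the Kubernetes pattern `<service>.<namespace>.svc.cluster.local`. This minimal sketch builds the expected name and flags a mismatch; the release names and the `ziti-router` namespace are read off the corrected hostnames, the `-transport` suffix is an inference rather than something confirmed from the chart, and listing the real Services with `kubectl get svc -n ziti-router` remains the authoritative check.

```python
def expected_transport_host(release: str, namespace: str,
                            cluster_domain: str = "cluster.local") -> str:
    """Build the in-cluster DNS name of a router's transport Service.

    Assumes the chart names the Service "<release>-transport" (inferred from
    the working values in this thread) and the default cluster domain.
    """
    return f"{release}-transport.{namespace}.svc.{cluster_domain}"


def check_advertised_host(advertised: str, release: str, namespace: str) -> str:
    """Compare a configured advertisedHost against the expected Service name."""
    expected = expected_transport_host(release, namespace)
    if advertised == expected:
        return "ok"
    return f"mismatch: advertisedHost={advertised!r}, expected {expected!r}"


# The original (broken) and corrected values for router-public.
print(check_advertised_host(
    "ziti-router-transport-public.ziti-router.svc.cluster.local",
    "ziti-router-public-release", "ziti-router"))
print(check_advertised_host(
    "ziti-router-public-release-transport.ziti-router.svc.cluster.local",
    "ziti-router-public-release", "ziti-router"))
```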

I then reinstalled both routers with the updated values.yml and got this output from the command below:

$ ziti fabric list links
╭────────────────────────┬───────────────┬────────────────┬─────────────┬─────────────┬─────────────┬───────────┬────────┬───────────╮
│ ID                     │ DIALER        │ ACCEPTOR       │ STATIC COST │ SRC LATENCY │ DST LATENCY │ STATE     │ STATUS │ FULL COST │
├────────────────────────┼───────────────┼────────────────┼─────────────┼─────────────┼─────────────┼───────────┼────────┼───────────┤
│ 6sRZrjtjhtkaQF8VtkIgdl │ router-public │ router-private │           1 │       3.7ms │       3.8ms │ Connected │     up │         7 │
╰────────────────────────┴───────────────┴────────────────┴─────────────┴─────────────┴─────────────┴───────────┴────────┴───────────╯
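Once links form, checking them can be scripted. This hedged sketch parses the box-drawing table printed by `ziti fabric list links` and reports links whose STATE is `Connected` and STATUS is `up`. It relies only on the table layout seen in this thread, which may differ between CLI versions.

```python
def parse_links(table: str) -> list[dict[str, str]]:
    """Parse the box-drawing table from `ziti fabric list links` into row dicts."""
    rows = [line for line in table.splitlines() if line.startswith("│")]
    if not rows:
        return []

    def cells(line: str) -> list[str]:
        # Drop the outer borders, then split on the column separator.
        return [c.strip() for c in line.strip().strip("│").split("│")]

    header, *data = (cells(r) for r in rows)
    return [dict(zip(header, row)) for row in data]


def healthy_links(table: str) -> list[str]:
    """Return "DIALER -> ACCEPTOR" for every Connected/up link."""
    return [f"{r['DIALER']} -> {r['ACCEPTOR']}"
            for r in parse_links(table)
            if r.get("STATE") == "Connected" and r.get("STATUS") == "up"]


# Output from this thread (border lines shortened; they are ignored anyway).
SAMPLE = """\
╭────┬────╮
│ ID                     │ DIALER        │ ACCEPTOR       │ STATIC COST │ SRC LATENCY │ DST LATENCY │ STATE     │ STATUS │ FULL COST │
├────┼────┤
│ 6sRZrjtjhtkaQF8VtkIgdl │ router-public │ router-private │           1 │       3.7ms │       3.8ms │ Connected │     up │         7 │
╰────┴────╯
"""

print(healthy_links(SAMPLE))
```

An empty list from healthy_links corresponds to the "results: none" situation earlier in the thread.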

I also checked the pod logs of both routers.

The logs are quite large, so I have uploaded them:
public-router-pod.txt (720.7 KB)
private-router-pod.txt (702.6 KB)

Is there any way to fix those issues?

TheLumberjack · January 7, 2025, 12:50pm

When you see handshake failed messages, this generally indicates the PKI for some part of the overlay is incorrect. What are these IPs?

10.0.1.248
10.0.2.252
10.0.2.33
10.0.3.162
10.0.3.85

You will see logs like this when a connection is attempted but the certificate presented didn't match. My guess is these are old routers trying to connect to the new routers. It could also be ziti-edge-tunnel instances which are no longer valid. The errors shouldn't be preventing your testing.
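To answer "what are these IPs?" from the uploaded pod logs, one option is to collect every IPv4 address that appears on a line mentioning a failed handshake and count how often each occurs. The sketch assumes only that such lines contain the words "handshake" and "fail" plus the remote address as a dotted IPv4 string; the sample lines below are hypothetical, use documentation addresses (192.0.2.x), and are not real router log output. Lines that also print a local address will count it too, so the results need a manual sanity check. Each remaining IP can then be matched against pods or nodes, for example with `kubectl get pods -A -o wide`, to see whether it belongs to an old router or tunneler.

```python
import re
from collections import Counter

# Dotted IPv4 address; a trailing ":port" is not included in the match.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def handshake_failure_ips(lines) -> Counter:
    """Count IPv4 addresses on log lines that mention a failed handshake."""
    counts: Counter = Counter()
    for line in lines:
        lowered = line.lower()
        if "handshake" in lowered and "fail" in lowered:
            counts.update(IPV4.findall(line))
    return counts


# Hypothetical log lines for illustration only.
SAMPLE_LINES = [
    "handshake failed for remote 192.0.2.10:40112",
    "tls: handshake failure from 192.0.2.20:51820",
    "link established with 192.0.2.30",  # no handshake failure: ignored
    "handshake failed for remote 192.0.2.10:40113",
]

print(handshake_failure_ips(SAMPLE_LINES).most_common())
```

Against the real attachment, the same function accepts an open file, e.g. `handshake_failure_ips(open("private-router-pod.txt"))`.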

At this point, I expect you to not get the "can't route from" error. Are things working now, other than the errors?

am3y · January 7, 2025, 1:10pm

Thanks for your response, @TheLumberjack.

After I changed the linkListeners advertisedHost values to ziti-router-public-release-transport.ziti-router.svc.cluster.local and ziti-router-private-release-transport.ziti-router.svc.cluster.local, the ziti fabric list links command showed the router-public -> router-private link as Connected and up. After this change, the service became reachable, and the curl command worked.

Regarding the logs, I plan to create a fresh EKS cluster and set everything up from scratch. This will help me determine whether the error logs about the connection attempts are due to old identities, routers, or some other issue. If I can't resolve it, I'll create a new post for that topic, as it seems to be a separate issue.

TheLumberjack · January 7, 2025, 1:21pm

There we go, excellent! We did it! Thanks for letting me know.

am3y · January 7, 2025, 1:24pm

@TheLumberjack
@scareything
Thanks to both of you for spending your valuable time and helping me with this case.
