Please list your links; I suspect your routers are not linking together. Check the advertised addresses, confirm the routers have link listeners configured, and verify the link listener address is reachable from the router that should dial it.
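As a quick sketch of what to check from the CLI (standard ziti commands, nothing specific to your setup; output columns vary by version):

# routers registered with the controller, and whether they are online
$ ziti fabric list routers

# the links that have actually been established between them
$ ziti fabric list links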
$ ziti fabric list links
╭────┬────────┬──────────┬─────────────┬─────────────┬─────────────┬───────┬────────┬───────────╮
│ ID │ DIALER │ ACCEPTOR │ STATIC COST │ SRC LATENCY │ DST LATENCY │ STATE │ STATUS │ FULL COST │
├────┼────────┼──────────┼─────────────┼─────────────┼─────────────┼───────┼────────┼───────────┤
╰────┴────────┴──────────┴─────────────┴─────────────┴─────────────┴───────┴────────┴───────────╯
results: none
Is there a command to set up the links, or should that happen automatically? BTW, below are the values.yml files used for router-private and router-public for your reference.
router-private.yml
ctrl:
  endpoint: ziti-controller.example.com:443

advertisedHost: ziti-router-private.example.com

# Edge configuration for external identities
edge:
  advertisedHost: ziti-router-private.example.com
  advertisedPort: 443
  service:
    type: LoadBalancer
    annotations:
      external-dns.alpha.kubernetes.io/hostname: ziti-router-private.example.com
      service.beta.kubernetes.io/aws-load-balancer-internal: "true"
      service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
      service.beta.kubernetes.io/aws-load-balancer-security-groups: "sg-0e9e6f9fce67feba1"
      service.beta.kubernetes.io/aws-load-balancer-manage-backend-security-group-rules: "true"
  ingress:
    enabled: false

# Link listeners for router-to-router communication (internal)
linkListeners:
  transport:
    advertisedHost: ziti-router-transport-private.ziti-router.svc.cluster.local
    advertisedPort: 443
    service:
      enabled: true
      type: ClusterIP  # All routers are internal; no external exposure
    ingress:
      enabled: false  # Not needed as routers are internal

# Persistence for router data
persistence:
  enabled: true
  accessMode: ReadWriteOnce
  size: 1Gi
  storageClass: ebs-sc
router-public.yml
ctrl:
  endpoint: ziti-controller.example.com:443

advertisedHost: ziti-router-public.example.com

# Edge configuration for external identities
edge:
  advertisedHost: ziti-router-public.example.com
  advertisedPort: 443
  service:
    type: LoadBalancer
    annotations:
      external-dns.alpha.kubernetes.io/hostname: ziti-router-public.example.com
      service.beta.kubernetes.io/aws-load-balancer-internal: "false"
      service.beta.kubernetes.io/aws-load-balancer-security-groups: "sg-0e9e6f9fce67feba1"
      service.beta.kubernetes.io/aws-load-balancer-manage-backend-security-group-rules: "true"
  ingress:
    enabled: false

# Link listeners for router-to-router communication (internal)
linkListeners:
  transport:
    advertisedHost: ziti-router-transport-public.ziti-router.svc.cluster.local
    advertisedPort: 443
    service:
      enabled: true
      type: ClusterIP  # All routers are internal; no external exposure
    ingress:
      enabled: false  # Not needed as routers are internal

# Persistence for router data
persistence:
  enabled: true
  accessMode: ReadWriteOnce
  size: 1Gi
  storageClass: ebs-sc
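As a side note for checking a setup like this: the linkListeners advertisedHost values above are cluster-local Service names, so a quick sanity check (a sketch only; the Service and namespace names are inferred from those hostnames and may differ in your release) is to confirm they exist and resolve in-cluster:

# the ClusterIP services backing the link listeners should show up here
$ kubectl -n ziti-router get svc | grep transport

# and the advertised hostname should resolve from any pod in the cluster
$ kubectl -n ziti-router run dns-check --rm -it --restart=Never --image=busybox -- \
    nslookup ziti-router-transport-public.ziti-router.svc.cluster.local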
@TheLumberjack I went back over the values.yml of both routers that I posted in my last message and found that I had entered the wrong advertisedHost for the linkListeners. So I changed them to:
When you see handshake failed messages, this generally indicates the PKI for some part of the overlay is incorrect. What are these IPs?
10.0.1.248
10.0.2.252
10.0.2.33
10.0.3.162
10.0.3.85
You will see logs like this when a connection is attempted but the certificate presented doesn't match. My guess is these are old routers trying to connect to the new routers. It could also be ziti-edge-tunnel instances that are no longer valid. Either way, the errors shouldn't be preventing your testing.
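If you want to track those IPs back to something concrete, one rough approach (plain kubectl and ziti commands, not specific to this setup) is to see whether a current pod owns the IP and to compare against the routers and identities the controller still knows about:

# does any live pod in the cluster currently hold one of these IPs?
$ kubectl get pods -A -o wide | grep 10.0.2.33

# routers and identities the controller still knows about; stale ones can be removed
$ ziti edge list edge-routers
$ ziti edge list identities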
At this point, I expect you to not get the "can't route from" error. Are things working now, other than the errors?
I got the following output when running the ziti fabric list links command:
$ ziti fabric list links
╭────────────────────────┬───────────────┬────────────────┬─────────────┬─────────────┬─────────────┬───────────┬────────┬───────────╮
│ ID                     │ DIALER        │ ACCEPTOR       │ STATIC COST │ SRC LATENCY │ DST LATENCY │ STATE     │ STATUS │ FULL COST │
├────────────────────────┼───────────────┼────────────────┼─────────────┼─────────────┼─────────────┼───────────┼────────┼───────────┤
│ 6sRZrjtjhtkaQF8VtkIgdl │ router-public │ router-private │           1 │       3.7ms │       3.8ms │ Connected │     up │         7 │
╰────────────────────────┴───────────────┴────────────────┴─────────────┴─────────────┴─────────────┴───────────┴────────┴───────────╯
After this change, the service became reachable, and the curl command worked.
Regarding the logs, I plan to create a fresh EKS cluster and set everything up from scratch. This will help me determine whether the error logs about the connection attempts are due to old identities, routers, or some other issue. If I can't resolve it, I'll create a new post for that topic, as it seems to be a separate issue.