EKS Cluster Router: "failed to dial fabric"

Great! Please check this summary for accuracy.

There are two K8S clusters: K3D w/ a private router hosting a service, and EKS with a public router. The private router reliably forms a "connected" link to the public router. The private router's identity has bind permission for the service, a client identity has dial permission, the dial identity has permission to use the public router, and the service has permission to use both routers. The public router is the online, common edge router and traversal router for this service.

Everything looks like it's well configured, but when the client calls the service one of the routers emits a message like this.

{
	"_context": "ch{edge}->u{classic}->i{1V7J}",
	"chSeq": 2,
	"connId": 19,
	"edgeSeq": 0,
	"error": "can't route from OSkP5ZEL2O -> N0FndZEL2O",
	"file": "github.com/openziti/ziti/router/xgress_edge/listener.go:199",
	"func": "github.com/openziti/ziti/router/xgress_edge.(*edgeClientConn).processConnect",
	"level": "warning",
	"msg": "failed to dial fabric",
	"time": "2024-07-19T18:14:10.588Z",
	"token": "68e0b72b-ed86-48c7-a792-633a7eefd7ea",
	"type": "EdgeConnectType"
}
  1. Is the controller running in K3D or EKS?
  2. When this error is emitted, can you look up those IDs to figure out which routers they refer to?
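
One way to pull the two IDs out of the router log is sketched below. It assumes the log lines are the JSON format shown in this thread, where the `>` of `->` may be escaped as `\u003e`. The function name `route_failure_ids` is made up for illustration.

```shell
# Print "SOURCE DESTINATION" for each distinct "can't route" failure.
# Reads router log lines on stdin.
route_failure_ids() {
  sed -n \
    -e 's/\\u003e/>/g' \
    -e "s/.*can't route from \([^ ]*\) -> \([^\" ]*\).*/\1 \2/p" |
    sort -u
}

# Usage (the log file name is hypothetical):
#   route_failure_ids < router.log
# Then compare the printed IDs with the ID column of:
#   ziti edge list edge-routers
```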

router logs

{"_context":"ch{edge}-\u003eu{classic}-\u003ei{nZ91}","chSeq":2,"connId":1,"edgeSeq":0,"error":"can't route from TC00FDSge -\u003e N0FndZEL2O","file":"github.com/openziti/ziti/router/xgress_edge/listener.go:199","func":"github.com/openziti/ziti/router/xgress_edge.(*edgeClientConn).processConnect","level":"warning","msg":"failed to dial fabric","time":"2024-07-23T19:39:49.210Z","token":"2bf149a2-f20d-43ce-a918-e8655f190557","type":"EdgeConnectType"}
{"_context":"tls:0.0.0.0:3022","error":"EOF","file":"github.com/openziti/transport/v2@v2.0.133/tls/listener.go:257","func":"github.com/openziti/transport/v2/tls.(*sharedListener).processConn","level":"error","msg":"handshake failed","remote":"172.31.4.18:65050","time":"2024-07-23T19:39:55.968Z"}
{"_context":"tls:0.0.0.0:3022","error":"EOF","file":"github.com/openziti/transport/v2@v2.0.133/tls/listener.go:257","func":"github.com/openziti/transport/v2/tls.(*sharedListener).processConn","level":"error","msg":"handshake failed","remote":"172.31.4.18:57404","time":"2024-07-23T19:39:56.932Z"}
{"_context":"tls:0.0.0.0:10080","error":"EOF","file":"github.com/openziti/transport/v2@v2.0.133/tls/listener.go:257","func":"github.com/openziti/transport/v2/tls.(*sharedListener).processConn","level":"error","msg":"handshake failed","remote":"172.31.4.18:42787","time":"2024-07-23T19:39:57.188Z"}
{"_context":"tls:0.0.0.0:10080","error":"EOF","file":"github.com/openziti/transport/v2@v2.0.133/tls/listener.go:257","func":"github.com/openziti/transport/v2/tls.(*sharedListener).processConn","level":"error","msg":"handshake failed","remote":"172.31.4.18:56210","time":"2024-07-23T19:39:57.403Z"}
{"_context":"tls:0.0.0.0:3022","error":"EOF","file":"github.com/openziti/transport/v2@v2.0.133/tls/listener.go:257","func":"github.com/openziti/transport/v2/tls.(*sharedListener).processConn","level":"error","msg":"handshake failed","remote":"172.31.4.18:45136","time":"2024-07-23T19:40:05.968Z"}
{"_context":"tls:0.0.0.0:3022","error":"EOF","file":"github.com/openziti/transport/v2@v2.0.133/tls/listener.go:257","func":"github.com/openziti/transport/v2/tls.(*sharedListener).processConn","level":"error","msg":"handshake failed","remote":"172.31.4.18:37899","time":"2024-07-23T19:40:06.932Z"}

These are the errors I now see in the public router on EKS.

Here is how I deployed it:

linkListeners:
  transport:  # https://docs.openziti.io/docs/reference/configuration/router/#transport
    containerPort: 10080
    advertisedHost: example.com
    advertisedPort: 10080
    service:
      enabled: true
      type: LoadBalancer
      labels:
      annotations:
    ingress:
      enabled: true
      ingressClassName: nginx
      annotations:
        kubernetes.io/ingress.allow-http: "false"
        nginx.ingress.kubernetes.io/ssl-passthrough: "true"

# listen for edge clients
edge:
  enabled: true
  containerPort: 3022
  advertisedHost: example.com
  advertisedPort: 3022
  service:
    enabled: true
    # -- expose the service as a ClusterIP, NodePort, or LoadBalancer
    type: LoadBalancer
    # -- service labels
    labels:
    # -- service annotations
    annotations:
  ingress:
    enabled: true
    ingressClassName: nginx
    annotations:
      kubernetes.io/ingress.allow-http: "false"
      nginx.ingress.kubernetes.io/ssl-passthrough: "true"

tunnel:
  mode: host

helm upgrade  --install "ziti-router" openziti/ziti-router \
  --namespace "ziti" \
  --values=router-values.yaml \
  --set-file enrollmentJwt=./router1.jwt \
  --set ctrl.endpoint="ec2-ip.compute-1.amazonaws.com:6262"

Then I noted the load balancers' external hostnames, updated router-values.yaml with them, and ran the same command again.

Controller logs

"level":"error","msg":"connection handler error for [tls:10.42.0.1:48107] (unknown/unenrolled router, routerId: f8Uyt.ELeO)","time":"2024-07-23T19:49:00.346Z"}
{"file":"github.com/openziti/ziti/controller/handler_ctrl/connect.go:117","func":"github.com/openziti/ziti/controller/handler_ctrl.(*ConnectHandler).HandleConnection","level":"error","msg":"unknown/unenrolled router","routerId":"f8Uyt.ELeO","time":"2024-07-23T19:49:00.569Z"}
{"_context":"tls:0.0.0.0:6262","file":"github.com/openziti/channel/v2@v2.0.130/classic_listener.go:201","func":"github.com/openziti/channel/v2.(*classicListener).acceptConnection.func1","level":"error","msg":"connection handler error for [tls:10.42.0.1:22258] (unknown/unenrolled router, routerId: f8Uyt.ELeO)","time":"2024-07-23T19:49:00.569Z"}
{"file":"github.com/openziti/ziti/controller/handler_ctrl/connect.go:117","func":"github.com/openziti/ziti/controller/handler_ctrl.(*ConnectHandler).HandleConnection","level":"error","msg":"unknown/unenrolled router","routerId":"f8Uyt.ELeO","time":"2024-07-23T19:49:00.846Z"}
{"_context":"tls:0.0.0.0:6262","file":"github.com/openziti/channel/v2@v2.0.130/classic_listener.go:201","func":"github.com/openziti/channel/v2.(*classicListener).acceptConnection.func1","level":"error","msg":"connection handler error for [tls:10.42.0.1:54105] (unknown/unenrolled router, routerId: f8Uyt.ELeO)","time":"2024-07-23T19:49:00.846Z"}
{"file":"github.com/openziti/ziti/controller/handler_ctrl/connect.go:117","func":"github.com/openziti/ziti/controller/handler_ctrl.(*ConnectHandler).HandleConnection","level":"error","msg":"unknown/unenrolled router","routerId":"f8Uyt.ELeO","time":"2024-07-23T19:49:01.583Z"}
{"_context":"tls:0.0.0.0:6262","file":"github.com/openziti/channel/v2@v2.0.130/classic_listener.go:201","func":"github.com/openziti/channel/v2.(*classicListener).acceptConnection.func1","level":"error","msg":"connection handler error for [tls:10.42.0.1:52592] (unknown/unenrolled router, routerId: f8Uyt.ELeO)","time":"2024-07-23T19:49:01.583Z"}
{"file":"github.com/openziti/ziti/controller/handler_ctrl/connect.go:117","func":"github.com/openziti/ziti/controller/handler_ctrl.(*ConnectHandler).HandleConnection","level":"error","msg":"unknown/unenrolled router","routerId":"jwkfDvK3eO","time":"2024-07-23T19:49:01.855Z"}
{"_context":"tls:0.0.0.0:6262","file":"github.com/openziti/channel/v2@v2.0.130/classic_listener.go:201","func":"github.com/openziti/channel/v2.(*classicListener).acceptConnection.func1","level":"error","msg":"connection handler error for [tls:10.42.0.1:24034] (unknown/unenrolled router, routerId: jwkfDvK3eO)","time":"2024-07-23T19:49:01.856Z"}

router1 is the public router deployed in the EKS cluster, and router2 is the private router in K3D.

root@ip-172-31-26-52:/home/ubuntu# ziti edge list edge-routers
╭────────────┬─────────┬────────┬───────────────┬──────┬────────────╮
│ ID         │ NAME    │ ONLINE │ ALLOW TRANSIT │ COST │ ATTRIBUTES │
├────────────┼─────────┼────────┼───────────────┼──────┼────────────┤
│ N0FndZEL2O │ router2 │ true   │ true          │    0 │            │
│ TC00FDSge  │ router1 │ true   │ true          │    0 │            │
╰────────────┴─────────┴────────┴───────────────┴──────┴────────────╯
results: 1-2 of 2

The service type should be ClusterIP if you're using an Ingress Controller like ingress-nginx. In this case, the Ingress Controller obtains a LoadBalancer from the cloud provider (e.g., EKS), not the router's K8S service.

EDIT: The router's advertised host:port will also follow the way you choose to publish the service. If you choose LoadBalancer you must advertise the DNS name of the external address and port provisioned by the cloud provider. If you choose ClusterIP+Ingress Controller then the advertised address will be the DNS name and port 443/tcp (you must advertise port 443) of the Ingress Controller's LoadBalancer external address.
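
For the LoadBalancer case, a rough sketch of reading the provisioned hostname and feeding it back into the Helm release might look like this. The function names and the service names in the usage comment are hypothetical. The `--set` keys mirror the `advertisedHost` keys in the values file shared above.

```shell
# Print the external hostname the cloud provider assigned to a LoadBalancer service.
# $1 = namespace, $2 = service name
lb_hostname() {
  kubectl get service "$2" --namespace "$1" \
    --output jsonpath='{.status.loadBalancer.ingress[0].hostname}'
}

# Print helm --set flags that advertise the edge and link listener hostnames.
# $1 = edge listener hostname, $2 = link listener hostname
advertise_flags() {
  printf '%s %s\n' \
    "--set edge.advertisedHost=$1" \
    "--set linkListeners.transport.advertisedHost=$2"
}

# Usage (service names are assumptions; list the real ones with
# `kubectl get services --namespace ziti`):
#   edge_host=$(lb_hostname ziti ziti-router-edge)
#   link_host=$(lb_hostname ziti ziti-router-transport)
#   helm upgrade --install ziti-router openziti/ziti-router \
#     --namespace ziti --values=router-values.yaml \
#     $(advertise_flags "$edge_host" "$link_host")
```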

OK, I chose LoadBalancer without ingress and updated the advertised host again after getting the load balancer's URI.

router logs

{"_context":"tls:0.0.0.0:10080","error":"EOF","file":"github.com/openziti/transport/v2@v2.0.133/tls/listener.go:257","func":"github.com/openziti/transport/v2/tls.(*sharedListener).processConn","level":"error","msg":"handshake failed","remote":"172.31.4.18:56023","time":"2024-07-23T20:21:06.008Z"}
{"_channels":["establishPath"],"apiSessionId":"clyytdypj27sk0d5cwk1ssuq4","attempt":0,"attemptNumber":"1","binding":"tunnel","circuitId":"wCF.371AGS","context":"ch{ctrl}-\u003eu{reconnecting}-\u003ei{GAbl}","destination":"6kb3WtyDw1PtNTUHXtkieU","error":"error creating route for [c/wCF.371AGS]: dial tcp: lookup hello.hello-toy.svc on 10.100.0.10:53: no such host","file":"github.com/openziti/ziti/router/handler_ctrl/route.go:140","func":"github.com/openziti/ziti/router/handler_ctrl.(*routeHandler).fail","level":"error","msg":"failure while handling route update","serviceId":"68ei4yY8ZoE5VpbYlqLRdB","sessionId":"clyytf63d27u60d5cctj41pxj","time":"2024-07-23T20:21:10.246Z"}
{"_channels":["establishPath"],"apiSessionId":"clyytdypj27sk0d5cwk1ssuq4","attempt":1,"attemptNumber":"2","binding":"tunnel","circuitId":"wCF.371AGS","context":"ch{ctrl}-\u003eu{reconnecting}-\u003ei{GAbl}","destination":"6kb3WtyDw1PtNTUHXtkieU","error":"error creating route for [c/wCF.371AGS]: dial tcp: lookup hello.hello-toy.svc on 10.100.0.10:53: no such host","file":"github.com/openziti/ziti/router/handler_ctrl/route.go:140","func":"github.com/openziti/ziti/router/handler_ctrl.(*routeHandler).fail","level":"error","msg":"failure while handling route update","serviceId":"68ei4yY8ZoE5VpbYlqLRdB","sessionId":"clyytf63d27u60d5cctj41pxj","time":"2024-07-23T20:21:10.361Z"}
{"_context":"ch{edge}-\u003eu{classic}-\u003ei{YQKp}","chSeq":2,"connId":7,"edgeSeq":0,"error":"exceeded maximum [2] retries creating circuit [c/wCF.371AGS]: error creating route for [s/wCF.371AGS] on [r/vMYaNDSglQ] (error creating route for [c/wCF.371AGS]: dial tcp: lookup hello.hello-toy.svc on 10.100.0.10:53: no such host)","file":"github.com/openziti/ziti/router/xgress_edge/listener.go:199","func":"github.com/openziti/ziti/router/xgress_edge.(*edgeClientConn).processConnect","level":"warning","msg":"failed to dial fabric","time":"2024-07-23T20:21:10.474Z","token":"2bf149a2-f20d-43ce-a918-e8655f190557","type":"EdgeConnectType"}
{"_context":"tls:0.0.0.0:10080","error":"EOF","file":"github.com/openziti/transport/v2@v2.0.133/tls/listener.go:257","func":"github.com/openziti/transport/v2/tls.(*sharedListener).processConn","level":"error","msg":"handshake failed","remote":"172.31.4.18:30221","time":"2024-07-23T20:21:15.537Z"}
{"_context":"tls:0.0.0.0:3022","error":"EOF","file":"github.com/openziti/transport/v2@v2.0.133/tls/listener.go:257","func":"github.com/openziti/transport/v2/tls.(*sharedListener).processConn","level":"error","msg":"handshake failed","remote":"172.31.4.18:48059","time":"2024-07-23T20:21:15.642Z"}
{"_context":"tls:0.0.0.0:3022","error":"EOF","file":"github.com/openziti/transport/v2@v2.0.133/tls/listener.go:257","func":"github.com/openziti/transport/v2/tls.(*sharedListener).processConn","level":"error","msg":"handshake failed","remote":"172.31.4.18:45208","time":"2024-07-23T20:21:15.703Z"}
{"_context":"tls:0.0.0.0:10080","error":"EOF","file":"github.com/openziti/transport/v2@v2.0.133/tls/listener.go:257","func":"github.com/openziti/transport/v2/tls.(*sharedListener).processConn","level":"error","msg":"handshake failed","remote":"172.31.4.18:30674","time":"2024-07-23T20:21:16.006Z"}
{"_context":"tls:0.0.0.0:10080","error":"EOF","file":"github.com/openziti/transport/v2@v2.0.133/tls/listener.go:257","func":"github.com/openziti/transport/v2/tls.(*sharedListener).processConn","level":"error","msg":"handshake failed","remote":"172.31.4.18:27707","time":"2024-07-23T20:21:25.538Z"}
root@ip-172-31-26-52:/home/ubuntu# curl hello.ziti.internal
curl: (7) Failed to connect to hello.ziti.internal port 80 after 345 ms: Couldn't connect to server
root@ip-172-31-26-52:/home/ubuntu# curl hello.ziti.internal
curl: (7) Failed to connect to hello.ziti.internal port 80 after 346 ms: Couldn't connect to server
root@ip-172-31-26-52:/home/ubuntu# 

values.yaml

linkListeners:
  transport:  # https://docs.openziti.io/docs/reference/configuration/router/#transport
    containerPort: 10080
    advertisedHost: example.com
    advertisedPort: 10080
    service:
      enabled: true
      type: LoadBalancer
      labels:
      annotations:
    ingress:
      enabled: false
      ingressClassName: nginx
      annotations:
        kubernetes.io/ingress.allow-http: "false"
        nginx.ingress.kubernetes.io/ssl-passthrough: "true"

# listen for edge clients
edge:
  enabled: true
  containerPort: 3022
  advertisedHost: example.com
  advertisedPort: 3022
  service:
    enabled: true
    # -- expose the service as a ClusterIP, NodePort, or LoadBalancer
    type: LoadBalancer
    # -- service labels
    labels:
    # -- service annotations
    annotations:
  ingress:
    enabled: false
    ingressClassName: nginx
    annotations:
      kubernetes.io/ingress.allow-http: "false"
      nginx.ingress.kubernetes.io/ssl-passthrough: "true"

tunnel:
  mode: host

This is an entirely different issue, and these are logs from the router. The logs are quite helpful, though: they indicate that your offload location does not exist. That would explain why one cluster worked while the other doesn't. You just don't have the hello-toy example deployed.

Confirm that the router can reach port 80 of hello.ziti.internal, and I'd expect it to work.

router2 (private) can access hello.ziti.internal, but router1 cannot, and that should still work, as @qrkourier told me.

can you just verify using curl or wget from router2, that it can indeed access http://hello.ziti.internal:80?

and when you wrote "router logs" -- i expect that MUST be router 2's logs, right?

Thanks for sharing your router Helm release's values. They show that you chose ClusterIP and with ingress, not LoadBalancer without ingress. Can you confirm?

EDIT: Now I see the second YAML values snippet that shows the configuration you reported with LoadBalancer, without ingress. I think you were sharing before and after. :+1:

Yes, it's LoadBalancer without ingress.

Is this right, @sadath-12? Router1 is public and router2 is the private router that has exclusive permission to bind the hello service. This is important because if router1 also has bind permission, but can not reach the target URL, then you will get errors some of the time when the incorrect router1 is actively providing the hello service. Ensure only the private router2 in K3D has permission to bind the hello service.
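
A quick way to check which routers currently host the service is to filter the terminator table. This sketch assumes the box-drawing table layout that `ziti edge list terminators` prints in this thread (SERVICE in the second column, ROUTER in the third). The function name `routers_hosting` is made up.

```shell
# Print the router name of every terminator that belongs to the given service.
# Reads `ziti edge list terminators` table output on stdin.
# $1 = service name
routers_hosting() {
  awk -F'│' -v svc="$1" '
    NF > 4 {
      s = $3; r = $4
      gsub(/^ +| +$/, "", s)   # trim padding around the SERVICE cell
      gsub(/^ +| +$/, "", r)   # trim padding around the ROUTER cell
      if (s == svc) print r
    }'
}

# Usage:
#   ziti edge list terminators | routers_hosting hello-service
# If the public router appears in the output, it still holds a terminator
# for the service.
```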

Yes, in my case both have permission to bind the service. Let me change that and try.

OK, I changed the role attribute of router1 to hello-hosts1, which can't bind any service, but the issue is still the same.

root@ip-172-31-26-52:/home/ubuntu# ziti edge list terminators
╭────────────────────────┬─────────────────┬─────────┬─────────┬────────────────────────┬──────────┬──────┬────────────┬──────────────╮
│ ID                     │ SERVICE         │ ROUTER  │ BINDING │ ADDRESS                │ IDENTITY │ COST │ PRECEDENCE │ DYNAMIC COST │
├────────────────────────┼─────────────────┼─────────┼─────────┼────────────────────────┼──────────┼──────┼────────────┼──────────────┤
│ 1Hu8q1mMNXAVJX8TYYMIhc │ hello-service   │ router2 │ tunnel  │ 1Hu8q1mMNXAVJX8TYYMIhc │          │    0 │ default    │            0 │
│ 5vqKyM1t5pkkGnW7fytHV4 │ router2-service │ router2 │ tunnel  │ 5vqKyM1t5pkkGnW7fytHV4 │          │    0 │ default    │            0 │
│ 6kb3WtyDw1PtNTUHXtkieU │ hello-service   │ router1 │ tunnel  │ 6kb3WtyDw1PtNTUHXtkieU │          │    0 │ default    │            0 │
│ 7cghzm91345EwcxXv1VJDE │ router2-service │ router1 │ tunnel  │ 7cghzm91345EwcxXv1VJDE │          │    0 │ default    │            0 │
╰────────────────────────┴─────────────────┴─────────┴─────────┴────────────────────────┴──────────┴──────┴────────────┴──────────────╯
results: 1-4 of 4
root@ip-172-31-26-52:/home/ubuntu# ziti edge update identity "router1" \
    --role-attributes hello-hosts1
root@ip-172-31-26-52:/home/ubuntu# ziti edge list terminators^C
root@ip-172-31-26-52:/home/ubuntu# ziti edge list identities
╭────────────┬───────────────┬─────────┬───────────────┬─────────────╮
│ ID         │ NAME          │ TYPE    │ ATTRIBUTES    │ AUTH-POLICY │
├────────────┼───────────────┼─────────┼───────────────┼─────────────┤
│ 79APmvE3e  │ hello-client  │ Default │ hello-clients │ Default     │
│ N0FndZEL2O │ router2       │ Router  │ hello-hosts   │ Default     │
│ NG0IxMbwa  │ Default Admin │ Default │               │ Default     │
│ vMYaNDSglQ │ router1       │ Router  │ hello-hosts1  │ Default     │
│ xT5D3DSgeQ │ hello-client2 │ Default │ hello-clients │ Default     │
╰────────────┴───────────────┴─────────┴───────────────┴─────────────╯
results: 1-5 of 5
root@ip-172-31-26-52:/home/ubuntu# curl hello.ziti.internal
curl: (7) Failed to connect to hello.ziti.internal port 80 after 116 ms: Couldn't connect to server
root@ip-172-31-26-52:/home/ubuntu# ziti edge list terminators
╭────────────────────────┬─────────────────┬─────────┬─────────┬────────────────────────┬──────────┬──────┬────────────┬──────────────╮
│ ID                     │ SERVICE         │ ROUTER  │ BINDING │ ADDRESS                │ IDENTITY │ COST │ PRECEDENCE │ DYNAMIC COST │
├────────────────────────┼─────────────────┼─────────┼─────────┼────────────────────────┼──────────┼──────┼────────────┼──────────────┤
│ 1Hu8q1mMNXAVJX8TYYMIhc │ hello-service   │ router2 │ tunnel  │ 1Hu8q1mMNXAVJX8TYYMIhc │          │    0 │ default    │            0 │
│ 5vqKyM1t5pkkGnW7fytHV4 │ router2-service │ router2 │ tunnel  │ 5vqKyM1t5pkkGnW7fytHV4 │          │    0 │ default    │            0 │
╰────────────────────────┴─────────────────┴─────────┴─────────┴────────────────────────┴──────────┴──────┴────────────┴──────────────╯
results: 1-2 of 2

It makes me wonder if there was a delay for the terminator change to be noticed by the tunneler, or perhaps a bug that we can help to discover by restarting the tunneler or router or both. Can you try again before and after restarting tunneler and each router?

That works.

I restarted the tunneler and the public router, and it didn't work. Then I restarted the private router, and it worked.

Please tell me what the issue was and how to avoid it in the future.

Thank you for methodically testing and restarting each separately! That's helpful. It could be a bug and I will use your report to try to recreate the problem. The private router's terminator should have become effective when the invalid terminator was deleted.

OK, I created a new controller and tested the same scenario. After I deploy the public router in EKS, I can't send a request to the service until I restart the private router pod.

This time I deployed the public router properly, without attributes.

It sounds like you're creating a new EKS cluster with a new Ziti controller and new PKI each time. Is that accurate? If so, I assume you have already enrolled the private router with a token from the new controller before deleting the invalid terminator.

This time, I quickly created a new controller in K3D, installed the public router in EKS, and then installed the private router in K3D again (using updated tokens from the new controller). They work.

Then, if I delete the public router and recreate it, it does not work until I restart the private router.

So, to conclude: the private router can't discover new public routers by itself until it is restarted.
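
Until the cause is confirmed, the restart workaround described above could be scripted roughly as follows. This is only a sketch: it assumes the private router runs as a Kubernetes Deployment, and the namespace and deployment names in the usage comment are hypothetical.

```shell
# Restart a router deployment and wait for the new pod to become ready.
# Set KUBECTL=echo for a dry run that only prints the commands.
KUBECTL="${KUBECTL:-kubectl}"

# $1 = namespace, $2 = deployment name
restart_router() {
  "$KUBECTL" rollout restart "deployment/$2" --namespace "$1" &&
    "$KUBECTL" rollout status "deployment/$2" --namespace "$1"
}

# Usage, run against the K3D cluster after recreating the public router:
#   restart_router ziti ziti-router
# Afterwards, `ziti fabric list links` on a logged-in CLI should show
# whether a link between router1 and router2 has formed again.
```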


That is a valuable insight for me to test.

Thank you for clarifying your K3D cluster, too, is public because it is running the Ziti controller, and EKS is providing the public router.