EKS Cluster Router: "failed to dial fabric"

Would you mind posting the output of ziti fabric list links and ziti fabric list routers so we can compare the IDs? That error message, as @TheLumberjack said, is because it couldn't find a path, so I want to make sure that the IDs in the error message line up with the router IDs in the links.

thank you,
Paul

Thanks, Paul. Also, you have BOTH clusters deployed, and "router1" doesn't work? All the results you've shared so far appear to be for "router2".

Ignore router4 (which I tested and deleted)

root@ip-172-31-21-248:/home/ubuntu# ziti fabric list links
╭────────────────────────┬─────────┬──────────┬─────────────┬─────────────┬─────────────┬───────────┬────────┬───────────╮
│ ID                     │ DIALER  │ ACCEPTOR │ STATIC COST │ SRC LATENCY │ DST LATENCY │ STATE     │ STATUS │ FULL COST │
├────────────────────────┼─────────┼──────────┼─────────────┼─────────────┼─────────────┼───────────┼────────┼───────────┤
│ 3oJWCr7FA5sCDY0jVPqtCA │ router2 │ router1  │           1 │       1.6ms │       2.8ms │ Connected │     up │         4 │
│ 5NoOiqQc2kDtfO9kRJIYq0 │ router2 │ router4  │           1 │     113.2ms │     113.5ms │ Connected │     up │       227 │
root@ip-172-31-21-248:/home/ubuntu# ziti fabric list routers
╭────────────┬─────────┬────────┬──────┬──────────────┬──────────┬───────────────────────┬────────────────────────────────────────╮
│ ID         │ NAME    │ ONLINE │ COST │ NO TRAVERSAL │ DISABLED │ VERSION               │ LISTENERS                              │
├────────────┼─────────┼────────┼──────┼──────────────┼──────────┼───────────────────────┼────────────────────────────────────────┤
│ N0FndZEL2O │ router2 │ true   │    0 │ false        │ false    │ v1.1.3 on linux/amd64 │                                        │
│ OSkP5ZEL2O │ router1 │ true   │    0 │ false        │ false    │ v1.1.3 on linux/amd64 │ 1: tls:routerlistener.domain.co:443    │

this is the scenario

Thanks for all the results... Based on the output you've shown so far, I would expect things to be working. You have terminators, your identity has dial access, your router has bind access and you have links formed...

I suspect we'll need more logs from the controller and/or router to know why it's not routing properly. At this point, since we have Paul commenting, I'll bow out. He's extremely close to the routing and will have better questions to ask. I'll keep thinking about it, though, in case I come up with anything else to check.

You're not reusing the private router name and you're not copying the pki or something like that, right?

Appreciate the efforts, @TheLumberjack.
Nope, I deployed the private router only once, and I didn't customize the PKI.

It seems like the public router (router1) somehow does not connect to the controller.

You show the routers linked, which would have had to have come from the router communicating with the controller so that seems unlikely.

It seems more likely that the controller is somehow not able to calculate the path. Are there more controller logs you can share?
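As a quick sanity check on the "can't find a path" idea, here is a minimal Python sketch that tests whether two routers are reachable from each other using only the rows in the `ziti fabric list links` output above. It is not the controller's path calculation; it treats every Connected link as usable in both directions, which is an assumption, and the `reachable` helper is a hypothetical name.

```python
from collections import deque

# (dialer, acceptor, state) rows copied from the `ziti fabric list links` output above.
LINKS = [
    ("router2", "router1", "Connected"),
    ("router2", "router4", "Connected"),
]

def reachable(links, src, dst):
    """Breadth-first search over Connected links, assumed bidirectional."""
    graph = {}
    for dialer, acceptor, state in links:
        if state != "Connected":
            continue  # ignore links that are not established
        graph.setdefault(dialer, set()).add(acceptor)
        graph.setdefault(acceptor, set()).add(dialer)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False

print(reachable(LINKS, "router1", "router2"))
```

With the listed links, router1 and router2 are directly connected, which is consistent with the expectation that a path should exist and that the cause lies elsewhere.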

Router logs

root@ip-172-31-38-32:/home/ubuntu# k logs ziti-router-59987c8bd9-k5km5 -n ziti --tail=10
{"_context":"ch{edge}-\u003eu{classic}-\u003ei{1V7J}","chSeq":2,"connId":19,"edgeSeq":0,"error":"can't route from OSkP5ZEL2O -\u003e N0FndZEL2O","file":"github.com/openziti/ziti/router/xgress_edge/listener.go:199","func":"github.com/openziti/ziti/router/xgress_edge.(*edgeClientConn).processConnect","level":"warning","msg":"failed to dial fabric","time":"2024-07-19T18:14:10.588Z","token":"68e0b72b-ed86-48c7-a792-633a7eefd7ea","type":"EdgeConnectType"}
{"file":"github.com/openziti/ziti/router/link/link_state.go:97","func":"github.com/openziti/ziti/router/link.(*linkState).updateStatus","iteration":4,"key":"default-\u003etls:zkyVEZE3e-\u003edefault","level":"info","linkId":"42gd5cVkpaDA0VZNFYX8uY","msg":"status updated","newState":"destRemoved","oldState":"dialFailed","time":"2024-07-19T18:16:33.184Z"}
{"file":"github.com/openziti/ziti/router/link/link_state.go:97","func":"github.com/openziti/ziti/router/link.(*linkState).updateStatus","iteration":4,"key":"default-\u003etls:zkyVEZE3e-\u003edefault","level":"info","linkId":"42gd5cVkpaDA0VZNFYX8uY","msg":"status updated","newState":"dialing","oldState":"destRemoved","time":"2024-07-19T18:21:45.490Z"}
{"file":"github.com/openziti/ziti/router/link/link_registry.go:463","func":"github.com/openziti/ziti/router/link.(*linkRegistryImpl).evaluateLinkState","iteration":5,"key":"default-\u003etls:zkyVEZE3e-\u003edefault","level":"info","linkId":"42gd5cVkpaDA0VZNFYX8uY","msg":"queuing link to dial","time":"2024-07-19T18:21:45.490Z"}
{"file":"github.com/openziti/ziti/router/link/link_registry.go:475","func":"github.com/openziti/ziti/router/link.(*linkRegistryImpl).evaluateLinkState.func1","iteration":5,"key":"default-\u003etls:zkyVEZE3e-\u003edefault","level":"info","linkId":"42gd5cVkpaDA0VZNFYX8uY","msg":"dialing link","time":"2024-07-19T18:21:45.490Z"}
{"connId":"ce69aeb5-9192-4b5a-9c70-d1e53b25ece8","file":"github.com/openziti/ziti/router/xlink_transport/dialer.go:100","func":"github.com/openziti/ziti/router/xlink_transport.(*dialer).dialSplit","level":"info","linkId":"42gd5cVkpaDA0VZNFYX8uY","msg":"dialing link with split payload/ack channels","time":"2024-07-19T18:21:45.490Z"}
{"connId":"ce69aeb5-9192-4b5a-9c70-d1e53b25ece8","file":"github.com/openziti/ziti/router/xlink_transport/dialer.go:113","func":"github.com/openziti/ziti/router/xlink_transport.(*dialer).dialSplit","level":"info","linkId":"42gd5cVkpaDA0VZNFYX8uY","msg":"dialing payload channel","time":"2024-07-19T18:21:45.490Z"}
{"error":"error dialing outgoing link [l/42gd5cVkpaDA0VZNFYX8uY@5]: error dialing payload channel for [l/42gd5cVkpaDA0VZNFYX8uY]: read tcp 172.31.7.44:45648-\u003e172.31.21.248:10080: read: connection reset by peer","file":"github.com/openziti/ziti/router/link/link_registry.go:478","func":"github.com/openziti/ziti/router/link.(*linkRegistryImpl).evaluateLinkState.func1","iteration":5,"key":"default-\u003etls:zkyVEZE3e-\u003edefault","level":"error","linkId":"42gd5cVkpaDA0VZNFYX8uY","msg":"error dialing link","time":"2024-07-19T18:21:45.506Z"}
{"file":"github.com/openziti/ziti/router/link/link_state.go:97","func":"github.com/openziti/ziti/router/link.(*linkState).updateStatus","iteration":5,"key":"default-\u003etls:zkyVEZE3e-\u003edefault","level":"info","linkId":"42gd5cVkpaDA0VZNFYX8uY","msg":"status updated","newState":"dialFailed","oldState":"dialing","time":"2024-07-19T18:21:45.506Z"}
{"_context":"ch{edge}-\u003eu{classic}-\u003ei{5DmB}","chSeq":2,"connId":20,"edgeSeq":0,"error":"can't route from OSkP5ZEL2O -\u003e N0FndZEL2O","file":"github.com/openziti/ziti/router/xgress_edge/listener.go:199","func":"github.com/openziti/ziti/router/xgress_edge.(*edgeClientConn).processConnect","level":"warning","msg":"failed to dial fabric","time":"2024-07-19T18:50:01.443Z","token":"68e0b72b-ed86-48c7-a792-633a7eefd7ea","type":"EdgeConnectType"}
root@ip-172-31-38-32:/home/ubuntu# 
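To line up the IDs in the "failed to dial fabric" error with the router list, a small Python sketch like the following can pull the source and destination IDs out of the JSON log lines. The `ROUTERS` mapping is copied from the `ziti fabric list routers` output earlier in the thread; the `route_failures` helper and the regular expression are illustrative assumptions, and the sample line is a shortened copy of the router log entry above.

```python
import json
import re

# Router IDs and names copied from the `ziti fabric list routers` output in this thread.
ROUTERS = {"N0FndZEL2O": "router2", "OSkP5ZEL2O": "router1"}

# Matches the error text once JSON decoding has turned \u003e into ">".
ROUTE_ERR = re.compile(r"can't route from (\S+) -> (\S+)")

def route_failures(log_lines):
    """Yield (time, src_id, dst_id) for each routing error found in JSON log lines."""
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip shell prompts and kubectl notices
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        match = ROUTE_ERR.search(entry.get("error", ""))
        if match:
            yield entry.get("time"), match.group(1), match.group(2)

# Shortened copy of one router log line from the output above.
sample = [
    r'''{"error":"can't route from OSkP5ZEL2O -\u003e N0FndZEL2O","level":"warning","msg":"failed to dial fabric","time":"2024-07-19T18:14:10.588Z"}''',
]

for when, src, dst in route_failures(sample):
    print(when, ROUTERS.get(src, "unknown"), "->", ROUTERS.get(dst, "unknown"))
```

In these logs both IDs resolve to known routers (router1 to router2), so the error does not appear to come from a stale or mistyped router ID.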

Controller logs

(devbox) root@ip-172-31-25-202:/home/ubuntu# k logs ziti-controller-85cf98988-m9mfr -n ziti --tail=10
Defaulted container "ziti-controller" out of: ziti-controller, ziti-controller-init (init)
{"file":"github.com/openziti/ziti/controller/handler_ctrl/connect.go:117","func":"github.com/openziti/ziti/controller/handler_ctrl.(*ConnectHandler).HandleConnection","level":"error","msg":"unknown/unenrolled router","routerId":"f8Uyt.ELeO","time":"2024-07-19T18:51:32.597Z"}
{"_context":"tls:0.0.0.0:6262","file":"github.com/openziti/channel/v2@v2.0.130/classic_listener.go:201","func":"github.com/openziti/channel/v2.(*classicListener).acceptConnection.func1","level":"error","msg":"connection handler error for [tls:10.42.0.1:60877] (unknown/unenrolled router, routerId: f8Uyt.ELeO)","time":"2024-07-19T18:51:32.597Z"}
{"file":"github.com/openziti/ziti/controller/handler_ctrl/connect.go:117","func":"github.com/openziti/ziti/controller/handler_ctrl.(*ConnectHandler).HandleConnection","level":"error","msg":"unknown/unenrolled router","routerId":"jwkfDvK3eO","time":"2024-07-19T18:51:37.612Z"}
{"_context":"tls:0.0.0.0:6262","file":"github.com/openziti/channel/v2@v2.0.130/classic_listener.go:201","func":"github.com/openziti/channel/v2.(*classicListener).acceptConnection.func1","level":"error","msg":"connection handler error for [tls:10.42.0.1:33717] (unknown/unenrolled router, routerId: jwkfDvK3eO)","time":"2024-07-19T18:51:37.612Z"}
{"file":"github.com/openziti/ziti/controller/handler_ctrl/connect.go:117","func":"github.com/openziti/ziti/controller/handler_ctrl.(*ConnectHandler).HandleConnection","level":"error","msg":"unknown/unenrolled router","routerId":"f8Uyt.ELeO","time":"2024-07-19T18:51:37.631Z"}
{"_context":"tls:0.0.0.0:6262","file":"github.com/openziti/channel/v2@v2.0.130/classic_listener.go:201","func":"github.com/openziti/channel/v2.(*classicListener).acceptConnection.func1","level":"error","msg":"connection handler error for [tls:10.42.0.1:47031] (unknown/unenrolled router, routerId: f8Uyt.ELeO)","time":"2024-07-19T18:51:37.631Z"}
{"file":"github.com/openziti/ziti/controller/handler_ctrl/connect.go:117","func":"github.com/openziti/ziti/controller/handler_ctrl.(*ConnectHandler).HandleConnection","level":"error","msg":"unknown/unenrolled router","routerId":"jwkfDvK3eO","time":"2024-07-19T18:51:42.646Z"}
{"_context":"tls:0.0.0.0:6262","file":"github.com/openziti/channel/v2@v2.0.130/classic_listener.go:201","func":"github.com/openziti/channel/v2.(*classicListener).acceptConnection.func1","level":"error","msg":"connection handler error for [tls:10.42.0.1:37673] (unknown/unenrolled router, routerId: jwkfDvK3eO)","time":"2024-07-19T18:51:42.646Z"}
{"file":"github.com/openziti/ziti/controller/handler_ctrl/connect.go:117","func":"github.com/openziti/ziti/controller/handler_ctrl.(*ConnectHandler).HandleConnection","level":"error","msg":"unknown/unenrolled router","routerId":"f8Uyt.ELeO","time":"2024-07-19T18:51:42.666Z"}
{"_context":"tls:0.0.0.0:6262","file":"github.com/openziti/channel/v2@v2.0.130/classic_listener.go:201","func":"github.com/openziti/channel/v2.(*classicListener).acceptConnection.func1","level":"error","msg":"connection handler error for [tls:10.42.0.1:22951] (unknown/unenrolled router, routerId: f8Uyt.ELeO)","time":"2024-07-19T18:51:42.666Z"}
(devbox) root@ip-172-31-25-202:/home/ubuntu# 

The only difference between the working and non-working environments in our case is whether the public router is deployed in EKS or k3d. It works in k3d but not in EKS, so I assume something is blocking it.

I checked the security group, and I have allowed all traffic from everywhere.

Yea, that was my initial thought as well. I also thought perhaps some policies were not getting applied correctly based on what you were doing, but that doesn't appear to be the case.

I would like to see more controller logs, particularly from "just before you connect" to the service to after the connections are attempted.

Can you start a tail, attempt to access the service, then capture all those logs?

Yes, I posted the above logs just after I did a curl request to the Ziti domain.

Yes, but you only pulled 10 lines, and it's after the curl, potentially missing all the stuff at the beginning of the dialing.

That's why I asked you to start the tail, make the request, stop the tail, and send us those logs. Don't just tail -10.

I have encountered the same "failed to dial fabric" error when the service edge router policy (SERP) is too restrictive. This is especially likely to occur when using a router-tunneler for dialing or binding a Ziti service.

Do you currently have a default #all/#all SERP?

ziti edge list service-edge-router-policies

relevant GitHub issue: Update policy advisor to look for cases where tunneler identity has access to a service but service doesn't have access to router · Issue #1366 · openziti/ziti · GitHub

If policies were the issue, then how come the same router works in k3d? @qrkourier

Yes, I do have the default #all for them.

(devbox) root@ip-172-31-25-202:/home/ubuntu# ziti edge list service-edge-router-policies
╭───────────────────────┬─────────┬───────────────┬───────────────────╮
│ ID                    │ NAME    │ SERVICE ROLES │ EDGE ROUTER ROLES │
├───────────────────────┼─────────┼───────────────┼───────────────────┤
│ xkjXoIdhq6jdZ38e5nvce │ default │ #all          │ #all              │
╰───────────────────────┴─────────┴───────────────┴───────────────────╯
results: 1-1 of 1

Policy issues were what I was working you through. Things seemed ok to me.

Can you please start the tail, reproduce the problem, then collect the controller logs and send them? As of now, we have no way to reproduce this problem and not enough logs to have a clue as to what might be going on.

I'm hoping your tail from before reproducing the issue will help. If possible, enable --verbose logging

These controller logs @sadath-12 sent seem to indicate there's a problem with the enrollment of two different router IDs. Do you still have these routers?

ziti edge list edge-routers 'id="f8Uyt.ELeO"'
ziti edge list edge-routers 'id="jwkfDvK3eO"'

Maybe a router without an edge enrollment can form a link but can't interact with an edge service normally.
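To see which router IDs the controller is rejecting and how often, a short Python sketch can tally the "unknown/unenrolled router" entries. The sample lines are shortened copies of the controller log entries posted above, `KNOWN_ROUTERS` comes from the earlier `ziti fabric list routers` output, and the `unenrolled_router_ids` helper is a hypothetical name.

```python
import json
from collections import Counter

# IDs from the `ziti fabric list routers` output earlier in the thread.
KNOWN_ROUTERS = {"N0FndZEL2O", "OSkP5ZEL2O"}

def unenrolled_router_ids(log_lines):
    """Count "unknown/unenrolled router" errors per routerId in JSON log lines."""
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip prompts and kubectl notices
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if entry.get("msg") == "unknown/unenrolled router" and "routerId" in entry:
            counts[entry["routerId"]] += 1
    return counts

# Shortened copies of controller log lines from the output above.
sample = [
    '{"level":"error","msg":"unknown/unenrolled router","routerId":"f8Uyt.ELeO","time":"2024-07-19T18:51:32.597Z"}',
    '{"level":"error","msg":"unknown/unenrolled router","routerId":"jwkfDvK3eO","time":"2024-07-19T18:51:37.612Z"}',
    '{"level":"error","msg":"unknown/unenrolled router","routerId":"f8Uyt.ELeO","time":"2024-07-19T18:51:37.631Z"}',
]

counts = unenrolled_router_ids(sample)
for router_id, count in counts.items():
    status = "listed" if router_id in KNOWN_ROUTERS else "not in fabric router list"
    print(router_id, count, status)
```

Neither rejected ID matches router1 or router2 from the fabric listing, so these entries may come from other router instances that are still trying to connect rather than from the two routers under test.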

I deleted the routers. Can I reproduce the problem in a call, sharing my screen, please?

Let's finish documenting the issue.

You can enable DEBUG log level for the controller and router each by setting Helm value image.additionalArgs="--verbose".

Then we need to see the most relevant log messages during the time frame that the problem recurs, at the moment you attempt to use the Ziti service.

For example,

kubectl --namespace ziti logs --selector app.kubernetes.io/name=ziti-controller --container=ziti-controller --follow --tail=-1
kubectl --namespace ziti logs --selector app.kubernetes.io/name=ziti-router --follow --tail=-1
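Once the full logs are captured, it can help to trim them to the moments around the failed request. The following Python sketch keeps only JSON log entries whose "time" field falls within a window around a chosen timestamp. The `in_window` helper, the 30-second defaults, and the timestamp format are assumptions based on the log lines posted in this thread; the sample lines are shortened copies of router log entries from above.

```python
import json
from datetime import datetime, timedelta

# Timestamp layout seen in the logs above, e.g. 2024-07-19T18:50:01.443Z
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"

def in_window(log_lines, attempt_time,
              before=timedelta(seconds=30), after=timedelta(seconds=30)):
    """Yield parsed log entries logged within [attempt - before, attempt + after]."""
    center = datetime.strptime(attempt_time, TIME_FORMAT)
    for line in log_lines:
        try:
            entry = json.loads(line)
            stamp = datetime.strptime(entry["time"], TIME_FORMAT)
        except (ValueError, KeyError, TypeError):
            continue  # not JSON, no usable "time" field, or unexpected shape
        if center - before <= stamp <= center + after:
            yield entry

# Shortened copies of router log lines from earlier in the thread.
sample = [
    'Defaulted container "ziti-controller" out of: ziti-controller, ziti-controller-init (init)',
    '{"level":"error","msg":"error dialing link","time":"2024-07-19T18:21:45.506Z"}',
    '{"level":"warning","msg":"failed to dial fabric","time":"2024-07-19T18:50:01.443Z"}',
]

for entry in in_window(sample, "2024-07-19T18:50:01.443Z"):
    print(entry["time"], entry["level"], entry["msg"])
```

In practice the lines would come from a file holding the saved output of the kubectl logs commands, and the attempt time would be the moment the curl request was made.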

Sure, trying it out in a new EKS cluster.