Private router not connecting to fabric

My network consists of a private router, a public router hosted by NetFoundry, and various endpoints. My internet connection is satellite, with latency of about 700ms. Everything was working well until about a week ago, when I lost connectivity between endpoints, both when my laptop was on the local network and when it was on a remote network. The NetFoundry console shows all endpoints online. I’m seeing errors on my private router; it looks like it’s not connecting to the public router:

May 23 09:16:48 antlet19 ziti[11151]: {"_context":"tls:1cbcdfd4-c7ac-4be9-8f2e-1075845c774c.production.netfoundry.io:6262","error":"EOF","file":"github.com/openziti/channel/v2@v2.0.27/classic_dialer.go:69","func":"github.com/openziti/channel/v2.(*classicDialer).Create","level":"warning","msg":"error initiating channel with hello","time":"2023-05-23T09:16:48.008Z"}
May 23 09:16:48 antlet19 ziti[11151]: {"file":"github.com/openziti/channel/v2@v2.0.27/message.go:649","func":"github.com/openziti/channel/v2.getRetryVersionFor","level":"info","msg":"defaulting to version 2","time":"2023-05-23T09:16:48.008Z"}
May 23 09:16:48 antlet19 ziti[11151]: {"_context":"tls:1cbcdfd4-c7ac-4be9-8f2e-1075845c774c.production.netfoundry.io:6262","file":"github.com/openziti/channel/v2@v2.0.27/classic_dialer.go:73","func":"github.com/openziti/channel/v2.(*classicDialer).Create","level":"warning","msg":"Retrying dial with protocol version 2","time":"2023-05-23T09:16:48.008Z"}
May 23 09:16:49 antlet19 ziti[11151]: {"_channels":["link","linkDialer"],"address":"tls:1cbcdfd4-c7ac-4be9-8f2e-1075845c774c.production.netfoundry.io:6262","error":"error dialing outgoing link [l/3HDIaAoRqUbV9ZEQKOtDyn]: error dialing payload channel for [l/3HDIaAoRqUbV9ZEQKOtDyn]: EOF","file":"github.com/openziti/fabric@v0.22.24/router/handler_ctrl/dial.go:109","func":"github.com/openziti/fabric/router/handler_ctrl.(*dialHandler).handle","level":"error","linkId":"3HDIaAoRqUbV9ZEQKOtDyn","linkProtocol":"tls","msg":"link dialing failed","routerId":"gBxvTOQ9cd","routerVersion":"v0.27.5","time":"2023-05-23T09:16:49.716Z"}
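
For anyone wanting to condense router logs like the ones above before posting them, here is a minimal Python sketch. It assumes each line is a journald prefix followed by a single JSON object, as in the excerpt, and it groups warnings and errors by the `_context` or `address` field. The function names `parse_ziti_journal_line` and `summarize` are hypothetical helpers, not part of any ziti tooling.

```python
import json
from collections import Counter


def parse_ziti_journal_line(line):
    """Return the JSON payload of a journald-style log line as a dict, or None."""
    start = line.find("{")  # the JSON object follows the syslog prefix
    if start == -1:
        return None
    try:
        entry = json.loads(line[start:])
    except json.JSONDecodeError:
        return None
    return entry if isinstance(entry, dict) else None


def summarize(lines):
    """Count warning/error entries per (level, target, message)."""
    counts = Counter()
    for line in lines:
        entry = parse_ziti_journal_line(line)
        if entry is None or entry.get("level") not in ("warning", "error"):
            continue
        # "_context" and "address" are the fields that carry the dial target
        # in the excerpt above; other log lines may use different fields.
        target = entry.get("_context") or entry.get("address") or "unknown"
        counts[(entry["level"], target, entry.get("msg", ""))] += 1
    return counts
```

Feeding the router's journal output into `summarize` (for example by iterating over `sys.stdin`) gives one count per distinct failure, which makes it easier to see that every failing dial in the excerpt targets the same `tls:...:6262` address.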

I’m also seeing errors in the laptop log, but these may be the result of the private router being offline:

May 23 07:16:51 ThinkPad-SL510 ziti-edge-tunnel[1611]: (1611)[   124761.389]   ERROR ziti-sdk:ziti_ctrl.c:155 ctrl_resp_cb() ctrl[1cbcdfd4-c7ac-4be9-8f2e-1075845c774c.production.netfoundry.io] request failed: -103(software caused connection abort)
May 23 07:16:51 ThinkPad-SL510 ziti-edge-tunnel[1611]: (1611)[   124761.389]   ERROR ziti-sdk:ziti_ctrl.c:155 ctrl_resp_cb() ctrl[1cbcdfd4-c7ac-4be9-8f2e-1075845c774c.production.netfoundry.io] request failed: -103(software caused connection abort)
May 23 07:16:51 ThinkPad-SL510 ziti-edge-tunnel[1611]: (1611)[   124761.389]   ERROR ziti-sdk:ziti.c:1052 update_services() ztx[0] failed to get service updates err[CONTROLLER_UNAVAILABLE/software caused connection abort] from ctrl[https://1cbcdfd4-c7ac-4be9-8f2e-1075845c774c.production.netfoundry.io:443]
May 23 07:16:51 ThinkPad-SL510 ziti-edge-tunnel[1611]: (1611)[   124761.389]    WARN tunnel-cbs:ziti_tunnel_ctrl.c:739 on_ziti_event() ziti_ctx controller connections failed: Ziti Controller is not available
May 23 07:16:51 ThinkPad-SL510 ziti-edge-tunnel[1611]: (1611)[   124761.389]   ERROR ziti-edge-tunnel:ziti-edge-tunnel.c:1199 on_event() ztx[/opt/openziti/etc/identities/laptop.json] failed to connect to controller due to Ziti Controller is not available
May 23 08:05:03 ThinkPad-SL510 ziti-edge-tunnel[1611]: (1611)[   127653.568]   ERROR ziti-sdk:ziti_ctrl.c:155 ctrl_resp_cb() ctrl[1cbcdfd4-c7ac-4be9-8f2e-1075845c774c.production.netfoundry.io] request failed: -104(connection reset by peer)
May 23 08:05:03 ThinkPad-SL510 ziti-edge-tunnel[1611]: (1611)[   127653.568]   ERROR ziti-sdk:ziti_ctrl.c:155 ctrl_resp_cb() ctrl[1cbcdfd4-c7ac-4be9-8f2e-1075845c774c.production.netfoundry.io] request failed: -104(connection reset by peer)
May 23 08:05:03 ThinkPad-SL510 ziti-edge-tunnel[1611]: (1611)[   127653.568]   ERROR ziti-sdk:ziti.c:1052 update_services() ztx[0] failed to get service updates err[CONTROLLER_UNAVAILABLE/connection reset by peer] from ctrl[https://1cbcdfd4-c7ac-4be9-8f2e-1075845c774c.production.netfoundry.io:443]
May 23 08:05:03 ThinkPad-SL510 ziti-edge-tunnel[1611]: (1611)[   127653.568]    WARN tunnel-cbs:ziti_tunnel_ctrl.c:739 on_ziti_event() ziti_ctx controller connections failed: Ziti Controller is not available
May 23 08:05:03 ThinkPad-SL510 ziti-edge-tunnel[1611]: (1611)[   127653.568]   ERROR ziti-edge-tunnel:ziti-edge-tunnel.c:1199 on_event() ztx[/opt/openziti/etc/identities/laptop.json] failed to connect to controller due to Ziti Controller is not available
May 23 08:05:07 ThinkPad-SL510 ziti-edge-tunnel[1611]: (1611)[   127658.111]   ERROR ziti-sdk:channel.c:489 dispatch_message() ch[1] received message without conn_id or for unknown connection ct[ED71] conn_id[312]
May 23 08:22:09 ThinkPad-SL510 ziti-edge-tunnel[1611]: (1611)[   128679.124]    WARN ziti-sdk:connect.c:351 connect_timeout() conn[0.313/Connecting] failed to establish connection in 10000ms on ch[1]
May 23 08:22:09 ThinkPad-SL510 ziti-edge-tunnel[1611]: (1611)[   128679.124]   ERROR tunnel-cbs:ziti_tunnel_cbs.c:103 on_ziti_connect() ziti dial failed: Operation did not complete in time
May 23 08:22:12 ThinkPad-SL510 ziti-edge-tunnel[1611]: (1611)[   128683.076]   ERROR ziti-sdk:channel.c:467 dispatch_message() ch[1] could not find waiter for reply_to = 146
May 23 08:22:12 ThinkPad-SL510 ziti-edge-tunnel[1611]: (1611)[   128683.076]   ERROR ziti-sdk:channel.c:489 dispatch_message() ch[1] received message without conn_id or for unknown connection ct[ED70] conn_id[313]
May 23 08:22:12 ThinkPad-SL510 ziti-edge-tunnel[1611]: (1611)[   128683.076]   ERROR ziti-sdk:channel.c:489 dispatch_message() ch[1] received message without conn_id or for unknown connection ct[ED72] conn_id[313]
May 23 08:33:18 ThinkPad-SL510 ziti-edge-tunnel[1611]: (1611)[   129348.831]   ERROR ziti-sdk:channel.c:489 dispatch_message() ch[1] received message without conn_id or for unknown connection ct[ED71] conn_id[318]

Suggestions on further troubleshooting would be welcome.

Hmmmm. If you can’t connect to the controller, that definitely causes problems for the overlay. The controller is required in order to authenticate, generate a path/circuit, etc., so it’s vital that it can be reached. I am able to connect to your controller at this time. Are you able to run this from the router?

openssl s_client -quiet -connect 1cbcdfd4-c7ac-4be9-8f2e-1075845c774c.production.netfoundry.io:443

You should see:

depth=2 CN = e055e14e-7a37-4e24-8d62-260a6be2566f, O = NetFoundry, L = Charlotte, ST = NC, C = US
verify error:num=19:self-signed certificate in certificate chain
verify return:1
depth=2 CN = e055e14e-7a37-4e24-8d62-260a6be2566f, O = NetFoundry, L = Charlotte, ST = NC, C = US
verify return:1
depth=1 C = US, ST = NC, L = Charlotte, O = NetFoundry, CN = Ziti Controller Intermediate CA, emailAddress = support@netfoundry.io
verify return:1
depth=0 C = US, ST = NC, L = Charlotte, O = NetFoundry, OU = AdvDev, CN = 141.148.6.137
verify return:1
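
One thing worth noting: the `openssl` check above targets the controller on port 443, while the failing dials in the router log go to port 6262 on the same hostname. A rough way to probe both ports from the router is sketched below in Python. `check_tls` is a hypothetical helper, and the sketch assumes that a plain TLS handshake without a client certificate is informative. A listener that expects a client certificate may reject the handshake, so a failure here is a hint rather than proof.

```python
import socket
import ssl


def check_tls(host, port, timeout=10.0):
    """Attempt a TLS handshake and return (ok, detail).

    Certificate verification is disabled because the chain shown above ends
    in a private root CA that the system trust store does not know about.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be disabled before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=host) as tls:
                return True, tls.version()
    except OSError as exc:           # covers refusals, timeouts, resets, SSL errors
        return False, f"{type(exc).__name__}: {exc}"
```

Calling `check_tls(host, 443)` and `check_tls(host, 6262)` with the controller hostname from the logs, then comparing the two results, would show whether the problem is specific to the router link port. The 10 second default timeout is an arbitrary choice meant to leave headroom for a high-latency satellite link.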

I’ll see if I can get onto the controller to look at the logs, and I’ll try to engage the cloudziti team too, to let them know this is happening and see if anyone else has had this issue.

From the router it looks like what you expected.

andrew@antlet19:~$ openssl s_client -quiet -connect 1cbcdfd4-c7ac-4be9-8f2e-1075845c774c.production.netfoundry.io:443
depth=2 CN = e055e14e-7a37-4e24-8d62-260a6be2566f, O = NetFoundry, L = Charlotte, ST = NC, C = US
verify error:num=19:self signed certificate in certificate chain
verify return:1
depth=2 CN = e055e14e-7a37-4e24-8d62-260a6be2566f, O = NetFoundry, L = Charlotte, ST = NC, C = US
verify return:1
depth=1 C = US, ST = NC, L = Charlotte, O = NetFoundry, CN = Ziti Controller Intermediate CA, emailAddress = support@netfoundry.io
verify return:1
depth=0 C = US, ST = NC, L = Charlotte, O = NetFoundry, OU = AdvDev, CN = 141.148.6.137
verify return:1

Same with the laptop:

andrew@ThinkPad-SL510:~$ openssl s_client -quiet -connect 1cbcdfd4-c7ac-4be9-8f2e-1075845c774c.production.netfoundry.io:443
depth=2 CN = e055e14e-7a37-4e24-8d62-260a6be2566f, O = NetFoundry, L = Charlotte, ST = NC, C = US
verify error:num=19:self-signed certificate in certificate chain
verify return:1
depth=2 CN = e055e14e-7a37-4e24-8d62-260a6be2566f, O = NetFoundry, L = Charlotte, ST = NC, C = US
verify return:1
depth=1 C = US, ST = NC, L = Charlotte, O = NetFoundry, CN = Ziti Controller Intermediate CA, emailAddress = support@netfoundry.io
verify return:1
depth=0 C = US, ST = NC, L = Charlotte, O = NetFoundry, OU = AdvDev, CN = 141.148.6.137
verify return:1

As a NetFoundry user, you can open a ticket via support@netfoundry.io, and this will allow us to exchange full logs, etc. There may be additional information we can find in the full logs, as well as correlate timestamps, etc. with our internal logs.