Ziti-router fatal error on startup

After successfully building the openziti/ziti project on this armv6 device, a Raspi Zero v1.3, I was also able to run ziti-router and successfully enrolled and verified that the expected certificates were generated.

However, when I attempted ziti-router run I always get this error:

{
  "error": "error connecting ctrl (context deadline exceeded)",
  "file": "/home/kb/ziti/ziti-router/subcmd/run.go:119",
  "func": "github.com/openziti/ziti/ziti-router/subcmd.run",
  "level": "fatal",
  "msg": "error starting",
  "time": "2022-05-31T04:29:35.818Z"
}

Hypothesis: RESTY, the Go library, is configured with a max timeout for REST requests, and this device requires more time.

Here is the full startup log.

kb@kpizero1:~/ziti-router $ sudo ./ziti-router run ./config.yml --verbose |& tee /tmp/ziti-router-raspi-zero-fatal-startup.debug
{"file":"/home/kb/ziti/ziti/cmd/ziti/util/updates.go:48","func":"github.com/openziti/ziti/ziti/cmd/ziti/util.LogReleaseVersionCheck","level":"debug","msg":"this build of ziti-router is unreleased v0.0.0","time":"2022-05-31T04:29:34.210Z"}
{"arch":"arm","build-date":"2020-01-01 01:01:01","configFile":"./config.yml","file":"/home/kb/ziti/ziti-router/subcmd/run.go:75","func":"github.com/openziti/ziti/ziti-router/subcmd.run","go-version":"go1.18.2","level":"info","msg":"starting ziti-router","os":"linux","revision":"local","routerId":"MoXZ8iOp3Y","time":"2022-05-31T04:29:34.278Z","version":"v0.0.0"}
{"file":"/home/kb/go/pkg/mod/github.com/openziti/edge@v0.21.222/router/internal/edgerouter/config.go:139","func":"github.com/openziti/edge/router/internal/edgerouter.(*Config).LoadConfigFromMap","level":"warning","msg":"Invalid heartbeat interval [0] (min: 60, max: 10), setting to default [60]","time":"2022-05-31T04:29:34.288Z"}
{"file":"/home/kb/go/pkg/mod/github.com/openziti/fabric@v0.17.137/router/forwarder/faulter.go:64","func":"github.com/openziti/fabric/router/forwarder.(*Faulter).run","level":"info","msg":"started","time":"2022-05-31T04:29:34.296Z"}
{"file":"/home/kb/go/pkg/mod/github.com/openziti/fabric@v0.17.137/router/router.go:266","func":"github.com/openziti/fabric/router.(*Router).showOptions","level":"info","msg":"ctrl = {\"OutQueueSize\":4,\"MaxQueuedConnects\":1,\"MaxOutstandingConnects\":16,\"ConnectTimeout\":1000000000,\"DelayRxStart\":false,\"WriteTimeout\":0}","time":"2022-05-31T04:29:34.324Z"}
{"file":"/home/kb/go/pkg/mod/github.com/openziti/fabric@v0.17.137/router/router.go:272","func":"github.com/openziti/fabric/router.(*Router).showOptions","level":"info","msg":"metrics = {\"ReportInterval\":60000000000,\"MessageQueueSize\":10}","time":"2022-05-31T04:29:34.361Z"}
{"file":"/home/kb/go/pkg/mod/github.com/openziti/edge@v0.21.222/router/internal/apiproxy/proxy.go:45","func":"github.com/openziti/edge/router/internal/apiproxy.Start","level":"debug","msg":"API Proxy disabled","time":"2022-05-31T04:29:34.321Z"}
{"file":"/home/kb/go/pkg/mod/github.com/openziti/fabric@v0.17.137/router/router.go:471","func":"github.com/openziti/fabric/router.(*Router).initializeHealthChecks","level":"info","msg":"starting health check with ctrl ping initially after 15s, then every 30s, timing out after 15s","time":"2022-05-31T04:29:34.390Z"}
{"file":"/home/kb/go/pkg/mod/github.com/openziti/fabric@v0.17.137/router/forwarder/scanner.go:58","func":"github.com/openziti/fabric/router/forwarder.(*Scanner).run","level":"info","msg":"started","time":"2022-05-31T04:29:34.296Z"}
{"file":"/home/kb/go/pkg/mod/github.com/openziti/fabric@v0.17.137/xweb/registry.go:45","func":"github.com/openziti/fabric/xweb.WebHandlerFactoryRegistryImpl.Add","level":"debug","msg":"adding xweb factory with binding: health-checks","time":"2022-05-31T04:29:34.410Z"}
{"file":"/home/kb/go/pkg/mod/github.com/openziti/fabric@v0.17.137/router/router.go:350","func":"github.com/openziti/fabric/router.(*Router).startXlinkDialers","level":"info","msg":"started Xlink dialer with binding [transport]","time":"2022-05-31T04:29:34.442Z"}
{"file":"/home/kb/go/pkg/mod/github.com/openziti/edge@v0.21.222/router/xgress_edge/factory.go:168","func":"github.com/openziti/edge/router/xgress_edge.(*Factory).CreateListener","level":"debug","msg":"xgress edge listener options: mtu=65536\nrandomDrops=false\ndrop1InN=100\ntxQueueSize=1\ntxPortalStartSize=16384\ntxPortalMaxSize=4194304\ntxPortalMinSize=16384\ntxPortalIncreaseThresh=224\ntxPortalIncreaseScale=1\ntxPortalRetxThresh=64\ntxPortalRetxScale=0.75\ntxPortalDupAckThresh=64\ntxPortalDupAckScale=0.9\nrxBufferSize=4194304\nretxStartMs=200\nretxScale=1.5\nretxAddMs=0\nmaxCloseWait=30s\ngetCircuitTimeout=30s\nlookupApiSessionTimeout=5s\nlookupSessionTimeout=5s\nchannel.outQueueSize=4\nchannel.connectTimeout=1s\nchannel.maxOutstandingConnects=16\nchannel.maxQueuedConnects=1\n","time":"2022-05-31T04:29:34.451Z"}
{"address":{},"file":"/home/kb/go/pkg/mod/github.com/openziti/edge@v0.21.222/router/xgress_edge/listener.go:69","func":"github.com/openziti/edge/router/xgress_edge.(*listener).Listen","level":"info","msg":"starting channel listener","time":"2022-05-31T04:29:34.465Z"}
{"file":"/home/kb/go/pkg/mod/github.com/openziti/fabric@v0.17.137/metrics/pool_metrics.go:50","func":"github.com/openziti/fabric/metrics.GoroutinesPoolMetricsConfigF.func1.1","idleTime":10000000000,"level":"info","maxQueueSize":1,"maxWorkers":16,"minWorkers":1,"msg":"starting goroutine pool","poolType":"pool.listener.xgress_edge","time":"2022-05-31T04:29:34.494Z"}
{"file":"/home/kb/go/pkg/mod/github.com/openziti/edge@v0.21.222/router/xgress_edge/accept.go:108","func":"github.com/openziti/edge/router/xgress_edge.(*Acceptor).Run","level":"info","msg":"starting","time":"2022-05-31T04:29:34.516Z"}
{"file":"/home/kb/go/pkg/mod/github.com/openziti/fabric@v0.17.137/router/router.go:399","func":"github.com/openziti/fabric/router.(*Router).startXgressListeners","level":"info","msg":"created xgress listener [edge] at [tls:0.0.0.0:443]","time":"2022-05-31T04:29:34.513Z"}
{"file":"/home/kb/go/pkg/mod/github.com/openziti/fabric@v0.17.137/router/router.go:399","func":"github.com/openziti/fabric/router.(*Router).startXgressListeners","level":"info","msg":"created xgress listener [tunnel] at []","time":"2022-05-31T04:29:34.523Z"}
{"_context":"tls:50a2078f-50d2-4ae0-ab87-0a7c2359ec96.production.netfoundry.io:80","file":"/home/kb/go/pkg/mod/github.com/openziti/channel@v0.18.37/reconnecting_dialer.go:68","func":"github.com/openziti/channel.(*reconnectingDialer).Create","level":"debug","msg":"started","time":"2022-05-31T04:29:34.536Z"}
{"_context":"tls:50a2078f-50d2-4ae0-ab87-0a7c2359ec96.production.netfoundry.io:80","file":"/home/kb/go/pkg/mod/github.com/openziti/channel@v0.18.37/reconnecting_dialer.go:77","func":"github.com/openziti/channel.(*reconnectingDialer).Create","level":"debug","msg":"exited","time":"2022-05-31T04:29:35.805Z"}
{"error":"error connecting ctrl (context deadline exceeded)","file":"/home/kb/ziti/ziti-router/subcmd/run.go:119","func":"github.com/openziti/ziti/ziti-router/subcmd.run","level":"fatal","msg":"error starting","time":"2022-05-31T04:29:35.818Z"}

Here is the config.yml

v: 3
identity:
  cert: certs/client.cert.pem
  server_cert: certs/server.cert.pem
  key: certs/client.key.pem
  ca: certs/intermediate-chain.pem
ctrl:
  endpoint: tls:***redacted***.production.netfoundry.io:80
link:
  dialers:
  - binding: transport
healthChecks:
  ctrlPingCheck:
    interval: 30s
    timeout: 15s
    initialDelay: 15s
edge:
  csr:
    country: US
    province: NC
    locality: Charlotte
    organization: NetFoundry
    organizationalUnit: Ziti
    sans:
      ip:
      - 127.0.0.1
      - 192.168.0.128
listeners:
- binding: edge
  address: tls:0.0.0.0:443
  options:
    advertise: 192.168.0.128:443
- binding: tunnel
  options:
    lanIf: wlan0
    mode: host
web: []

That error message is due to the router not being able to connect to the controller for the control plane. That, by definition, is not REST (or resty related), but whatever transport you are using + the control protocol. The most common transport is TLS.

From your logs, it looks like the ctrl channel is on tls:50a2078f-50d2-4ae0-ab87-0a7c2359ec96.production.netfoundry.io:80 and that is not either resolving or reachable.

kb@kpizero1:~/ziti-router $ openssl s_client -connect 50a2078f-50d2-4ae0-ab87-0a7c2359ec96.production.netfoundry.io:80 | openssl x509 -noout -subject
depth=2 C = US, ST = NC, L = Charlotte, O = NetFoundry, CN = NetFoundry, emailAddress = support@netfoundry.io
verify error:num=19:self signed certificate in certificate chain
verify return:1
depth=2 C = US, ST = NC, L = Charlotte, O = NetFoundry, CN = NetFoundry, emailAddress = support@netfoundry.io
verify return:1
depth=1 C = US, ST = NC, O = NetFoundry, CN = Ziti Controller Intermediate CA, emailAddress = support@netfoundry.io
verify return:1
depth=0 C = US, ST = NC, L = Charlotte, O = NetFoundry, OU = AdvDev, CN = 129.213.156.210
verify return:1
subject=C = US, ST = NC, L = Charlotte, O = NetFoundry, OU = AdvDev, CN = 129.213.156.210
3064668224:error:14094412:SSL routines:ssl3_read_bytes:sslv3 alert bad certificate:../ssl/record/rec_layer_s3.c:1543:SSL alert number 42

I can still reproduce the same "context deadline exceeded" error despite the TLS server being responsive.

Odd, that is a very fast timeout (what go reports as a context deadline). Looking at your logs I see the channel starting to connect at 2022-05-31T04:29:35.805Z then deadlining (timing out) at 2022-05-31T04:29:35.818Z.

13ms timeout seems like a misleading error message

If I had to bet, it would be a configuration issue either with the host or the config file. I don’t know what else to suggest off the top of my head.

These are the relevant lines, it's timing out after about a second. The default connect timeout is 1s, so that makes sense.

You can change the connect timeout by doing something like:

ctrl:
    endpoint: <your controller address>
    options:
        connectTimeoutMs: 10000 # 10 seconds

Thanks to you both for the insights. With that longer deadline I get a different error after approx 4s from starting ziti-router run.

These last few lines of the output seem to be most relevant.

{"_context":"tls:50a2078f-50d2-4ae0-ab87-0a7c2359ec96.production.netfoundry.io:80","file":"/home/kb/go/pkg/mod/github.com/openziti/channel@v0.18.37/reconnecting_dialer.go:68","func":"github.com/openziti/channel.(*reconnectingDialer).Create","level":"debug","msg":"started","time":"2022-05-31T21:53:55.013Z"}
{"file":"/home/kb/go/pkg/mod/github.com/openziti/edge@v0.21.222/router/xgress_edge/accept.go:108","func":"github.com/openziti/edge/router/xgress_edge.(*Acceptor).Run","level":"info","msg":"starting","time":"2022-05-31T21:53:54.995Z"}
{"file":"/home/kb/go/pkg/mod/github.com/openziti/transport/v2@v2.0.2/tls/dialer.go:61","func":"github.com/openziti/transport/v2/tls.DialWithLocalBinding","level":"debug","msg":"server provided [3] certificates","time":"2022-05-31T21:53:56.299Z"}
{"_context":"u{reconnecting}-\u003ei{}","file":"/home/kb/go/pkg/mod/github.com/openziti/channel@v0.18.37/reconnecting_dialer.go:133","func":"github.com/openziti/channel.(*reconnectingDialer).sendHello","level":"debug","msg":"started","time":"2022-05-31T21:53:56.305Z"}
{"_context":"u{reconnecting}-\u003ei{}","file":"/home/kb/go/pkg/mod/github.com/openziti/channel@v0.18.37/reconnecting_dialer.go:148","func":"github.com/openziti/channel.(*reconnectingDialer).sendHello","level":"debug","msg":"exited","time":"2022-05-31T21:53:56.311Z"}
{"_context":"tls:50a2078f-50d2-4ae0-ab87-0a7c2359ec96.production.netfoundry.io:80","file":"/home/kb/go/pkg/mod/github.com/openziti/channel@v0.18.37/reconnecting_dialer.go:88","func":"github.com/openziti/channel.(*reconnectingDialer).Create","level":"debug","msg":"exited","time":"2022-05-31T21:53:56.328Z"}
{"error":"error connecting ctrl (EOF)","file":"/home/kb/ziti/ziti-router/subcmd/run.go:119","func":"github.com/openziti/ziti/ziti-router/subcmd.run","level":"fatal","msg":"error starting","time":"2022-05-31T21:53:56.339Z"}

Here’s the updated, full config.yml

v: 3
identity:
  cert: certs/client.cert.pem
  server_cert: certs/server.cert.pem
  key: certs/client.key.pem
  ca: certs/intermediate-chain.pem
ctrl:
  endpoint: tls:50a2078f-50d2-4ae0-ab87-0a7c2359ec96.production.netfoundry.io:80
  options:
    connectTimeoutMs: 10000 # 10 seconds
link:
  dialers:
  - binding: transport
healthChecks:
  ctrlPingCheck:
    interval: 30s
    timeout: 15s
    initialDelay: 15s
edge:
  csr:
    country: US
    province: NC
    locality: Charlotte
    organization: NetFoundry
    organizationalUnit: Ziti
    sans:
      ip:
      - 127.0.0.1
      - 192.168.0.128
listeners:
- binding: edge
  address: tls:0.0.0.0:443
  options:
    advertise: 192.168.0.128:443
- binding: tunnel
  options:
    lanIf: wlan0
    mode: host
web: []

Here’s the full startup output

kb@kpizero1:~/ziti-router $ sudo ./ziti-router run ./config.yml --verbose |& tee /tmp/ziti-router-raspi-zero-fatal-startup.debug
{"file":"/home/kb/ziti/ziti/cmd/ziti/util/updates.go:48","func":"github.com/openziti/ziti/ziti/cmd/ziti/util.LogReleaseVersionCheck","level":"debug","msg":"this build of ziti-router is unreleased v0.0.0","time":"2022-05-31T21:53:54.737Z"}
{"arch":"arm","build-date":"2020-01-01 01:01:01","configFile":"./config.yml","file":"/home/kb/ziti/ziti-router/subcmd/run.go:75","func":"github.com/openziti/ziti/ziti-router/subcmd.run","go-version":"go1.18.2","level":"info","msg":"starting ziti-router","os":"linux","revision":"local","routerId":"MoXZ8iOp3Y","time":"2022-05-31T21:53:54.804Z","version":"v0.0.0"}
{"file":"/home/kb/go/pkg/mod/github.com/openziti/fabric@v0.17.137/router/forwarder/faulter.go:64","func":"github.com/openziti/fabric/router/forwarder.(*Faulter).run","level":"info","msg":"started","time":"2022-05-31T21:53:54.812Z"}
{"file":"/home/kb/go/pkg/mod/github.com/openziti/edge@v0.21.222/router/internal/edgerouter/config.go:139","func":"github.com/openziti/edge/router/internal/edgerouter.(*Config).LoadConfigFromMap","level":"warning","msg":"Invalid heartbeat interval [0] (min: 60, max: 10), setting to default [60]","time":"2022-05-31T21:53:54.823Z"}
{"file":"/home/kb/go/pkg/mod/github.com/openziti/fabric@v0.17.137/router/router.go:266","func":"github.com/openziti/fabric/router.(*Router).showOptions","level":"info","msg":"ctrl = {\"OutQueueSize\":4,\"MaxQueuedConnects\":1,\"MaxOutstandingConnects\":16,\"ConnectTimeout\":60000000000,\"DelayRxStart\":false,\"WriteTimeout\":0}","time":"2022-05-31T21:53:54.837Z"}
{"file":"/home/kb/go/pkg/mod/github.com/openziti/fabric@v0.17.137/router/router.go:272","func":"github.com/openziti/fabric/router.(*Router).showOptions","level":"info","msg":"metrics = {\"ReportInterval\":60000000000,\"MessageQueueSize\":10}","time":"2022-05-31T21:53:54.851Z"}
{"file":"/home/kb/go/pkg/mod/github.com/openziti/edge@v0.21.222/router/internal/apiproxy/proxy.go:45","func":"github.com/openziti/edge/router/internal/apiproxy.Start","level":"debug","msg":"API Proxy disabled","time":"2022-05-31T21:53:54.856Z"}
{"file":"/home/kb/go/pkg/mod/github.com/openziti/fabric@v0.17.137/router/router.go:471","func":"github.com/openziti/fabric/router.(*Router).initializeHealthChecks","level":"info","msg":"starting health check with ctrl ping initially after 15s, then every 30s, timing out after 15s","time":"2022-05-31T21:53:54.874Z"}
{"file":"/home/kb/go/pkg/mod/github.com/openziti/fabric@v0.17.137/router/forwarder/scanner.go:58","func":"github.com/openziti/fabric/router/forwarder.(*Scanner).run","level":"info","msg":"started","time":"2022-05-31T21:53:54.848Z"}
{"file":"/home/kb/go/pkg/mod/github.com/openziti/fabric@v0.17.137/xweb/registry.go:45","func":"github.com/openziti/fabric/xweb.WebHandlerFactoryRegistryImpl.Add","level":"debug","msg":"adding xweb factory with binding: health-checks","time":"2022-05-31T21:53:54.891Z"}
{"file":"/home/kb/go/pkg/mod/github.com/openziti/fabric@v0.17.137/router/router.go:350","func":"github.com/openziti/fabric/router.(*Router).startXlinkDialers","level":"info","msg":"started Xlink dialer with binding [transport]","time":"2022-05-31T21:53:54.918Z"}
{"file":"/home/kb/go/pkg/mod/github.com/openziti/edge@v0.21.222/router/xgress_edge/factory.go:168","func":"github.com/openziti/edge/router/xgress_edge.(*Factory).CreateListener","level":"debug","msg":"xgress edge listener options: mtu=65536\nrandomDrops=false\ndrop1InN=100\ntxQueueSize=1\ntxPortalStartSize=16384\ntxPortalMaxSize=4194304\ntxPortalMinSize=16384\ntxPortalIncreaseThresh=224\ntxPortalIncreaseScale=1\ntxPortalRetxThresh=64\ntxPortalRetxScale=0.75\ntxPortalDupAckThresh=64\ntxPortalDupAckScale=0.9\nrxBufferSize=4194304\nretxStartMs=200\nretxScale=1.5\nretxAddMs=0\nmaxCloseWait=30s\ngetCircuitTimeout=30s\nlookupApiSessionTimeout=5s\nlookupSessionTimeout=5s\nchannel.outQueueSize=4\nchannel.connectTimeout=1s\nchannel.maxOutstandingConnects=16\nchannel.maxQueuedConnects=1\n","time":"2022-05-31T21:53:54.936Z"}
{"address":{},"file":"/home/kb/go/pkg/mod/github.com/openziti/edge@v0.21.222/router/xgress_edge/listener.go:69","func":"github.com/openziti/edge/router/xgress_edge.(*listener).Listen","level":"info","msg":"starting channel listener","time":"2022-05-31T21:53:54.955Z"}
{"file":"/home/kb/go/pkg/mod/github.com/openziti/fabric@v0.17.137/metrics/pool_metrics.go:50","func":"github.com/openziti/fabric/metrics.GoroutinesPoolMetricsConfigF.func1.1","idleTime":10000000000,"level":"info","maxQueueSize":1,"maxWorkers":16,"minWorkers":1,"msg":"starting goroutine pool","poolType":"pool.listener.xgress_edge","time":"2022-05-31T21:53:54.966Z"}
{"file":"/home/kb/go/pkg/mod/github.com/openziti/fabric@v0.17.137/router/router.go:399","func":"github.com/openziti/fabric/router.(*Router).startXgressListeners","level":"info","msg":"created xgress listener [edge] at [tls:0.0.0.0:443]","time":"2022-05-31T21:53:54.987Z"}
{"file":"/home/kb/go/pkg/mod/github.com/openziti/fabric@v0.17.137/router/router.go:399","func":"github.com/openziti/fabric/router.(*Router).startXgressListeners","level":"info","msg":"created xgress listener [tunnel] at []","time":"2022-05-31T21:53:55.000Z"}
{"_context":"tls:50a2078f-50d2-4ae0-ab87-0a7c2359ec96.production.netfoundry.io:80","file":"/home/kb/go/pkg/mod/github.com/openziti/channel@v0.18.37/reconnecting_dialer.go:68","func":"github.com/openziti/channel.(*reconnectingDialer).Create","level":"debug","msg":"started","time":"2022-05-31T21:53:55.013Z"}
{"file":"/home/kb/go/pkg/mod/github.com/openziti/edge@v0.21.222/router/xgress_edge/accept.go:108","func":"github.com/openziti/edge/router/xgress_edge.(*Acceptor).Run","level":"info","msg":"starting","time":"2022-05-31T21:53:54.995Z"}
{"file":"/home/kb/go/pkg/mod/github.com/openziti/transport/v2@v2.0.2/tls/dialer.go:61","func":"github.com/openziti/transport/v2/tls.DialWithLocalBinding","level":"debug","msg":"server provided [3] certificates","time":"2022-05-31T21:53:56.299Z"}
{"_context":"u{reconnecting}-\u003ei{}","file":"/home/kb/go/pkg/mod/github.com/openziti/channel@v0.18.37/reconnecting_dialer.go:133","func":"github.com/openziti/channel.(*reconnectingDialer).sendHello","level":"debug","msg":"started","time":"2022-05-31T21:53:56.305Z"}
{"_context":"u{reconnecting}-\u003ei{}","file":"/home/kb/go/pkg/mod/github.com/openziti/channel@v0.18.37/reconnecting_dialer.go:148","func":"github.com/openziti/channel.(*reconnectingDialer).sendHello","level":"debug","msg":"exited","time":"2022-05-31T21:53:56.311Z"}
{"_context":"tls:50a2078f-50d2-4ae0-ab87-0a7c2359ec96.production.netfoundry.io:80","file":"/home/kb/go/pkg/mod/github.com/openziti/channel@v0.18.37/reconnecting_dialer.go:88","func":"github.com/openziti/channel.(*reconnectingDialer).Create","level":"debug","msg":"exited","time":"2022-05-31T21:53:56.328Z"}
{"error":"error connecting ctrl (EOF)","file":"/home/kb/ziti/ziti-router/subcmd/run.go:119","func":"github.com/openziti/ziti/ziti-router/subcmd.run","level":"fatal","msg":"error starting","time":"2022-05-31T21:53:56.339Z"}
kb@kpizero1:~/ziti-router $ tree
.
├── certs
│   ├── client.cert.pem
│   ├── client.key.pem
│   ├── intermediate-chain.pem
│   └── server.cert.pem
├── config.yml
├── raspi-zero-router4.yml.j2
└── ziti-router

“error connecting ctrl (EOF)” in my experience usually means the certs are not valid. did you perhaps delete the router and recreate it without reenrolling it?

Not, in this case, no. Same certs, same controller as prior attempts. The only difference was that I took @plorenz 's advice to allow 10s instead of 1s for control channel TLS. I get the same result allowing 2s, and so I surmise this (puny) device needs slightly more than 1s for that step.

I’d check in the controller log to see what it’s reporting. It should tell you if it recognizes the router.

Here’s the controller logs following what appears to be yet another failure mode for ziti-router run for routerId: MPyreGPG3Y connecting from IPv4 69.234.69.53.

kb@kpizero1:~/ziti-router $ sudo ./ziti-router run ./config.yml --verbose 
[   1.376]   DEBUG ziti/ziti/cmd/ziti/util.LogReleaseVersionCheck: this build of ziti-router is unreleased v0.0.0
[   1.435]    INFO ziti/ziti-router/subcmd.run: {routerId=[MoXZ8iOp3Y] os=[linux] arch=[arm] build-date=[2020-01-01 01:01:01] version=[v0.0.0] configFile=[./config.yml] revision=[local] go-version=[go1.18.2]} starting ziti-router
[   1.446] WARNING edge/router/internal/edgerouter.(*Config).LoadConfigFromMap: Invalid heartbeat interval [0] (min: 60, max: 10), setting to default [60]
[   1.466]    INFO fabric/router/forwarder.(*Faulter).run: started
[   1.473]    INFO fabric/router/forwarder.(*Scanner).run: started
[   1.480]   DEBUG edge/router/internal/apiproxy.Start: API Proxy disabled
[   1.493]    INFO fabric/router.(*Router).showOptions: ctrl = {"OutQueueSize":4,"MaxQueuedConnects":1,"MaxOutstandingConnects":16,"ConnectTimeout":4000000000,"DelayRxStart":false,"WriteTimeout":0}
[   1.499]    INFO fabric/router.(*Router).showOptions: metrics = {"ReportInterval":60000000000,"MessageQueueSize":10}
[   1.505]    INFO fabric/router.(*Router).initializeHealthChecks: starting health check with ctrl ping initially after 15s, then every 30s, timing out after 15s
[   1.516]   DEBUG fabric/xweb.WebHandlerFactoryRegistryImpl.Add: adding xweb factory with binding: health-checks
[   1.530]    INFO fabric/router.(*Router).startXlinkDialers: started Xlink dialer with binding [transport]
[   1.541]   DEBUG edge/router/xgress_edge.(*Factory).CreateListener: xgress edge listener options: mtu=65536
randomDrops=false
drop1InN=100
txQueueSize=1
txPortalStartSize=16384
txPortalMaxSize=4194304
txPortalMinSize=16384
txPortalIncreaseThresh=224
txPortalIncreaseScale=1
txPortalRetxThresh=64
txPortalRetxScale=0.75
txPortalDupAckThresh=64
txPortalDupAckScale=0.9
rxBufferSize=4194304
retxStartMs=200
retxScale=1.5
retxAddMs=0
maxCloseWait=30s
getCircuitTimeout=30s
lookupApiSessionTimeout=5s
lookupSessionTimeout=5s
channel.outQueueSize=4
channel.connectTimeout=1s
channel.maxOutstandingConnects=16
channel.maxQueuedConnects=1

[   1.572]    INFO edge/router/xgress_edge.(*listener).Listen: {address=[tls:0.0.0.0:443]} starting channel listener
[   1.582]    INFO fabric/metrics.GoroutinesPoolMetricsConfigF.func1.1: {poolType=[pool.listener.xgress_edge] minWorkers=[1] maxWorkers=[16] idleTime=[10s] maxQueueSize=[1]} starting goroutine pool
[   1.608]    INFO fabric/router.(*Router).startXgressListeners: created xgress listener [edge] at [tls:0.0.0.0:443]
[   1.620]    INFO fabric/router.(*Router).startXgressListeners: created xgress listener [tunnel] at []
[   1.630]   DEBUG channel.(*reconnectingDialer).Create [tls:50a2078f-50d2-4ae0-ab87-0a7c2359ec96.production.netfoundry.io:80]: started
[   1.642]    INFO edge/router/xgress_edge.(*Acceptor).Run: starting
[   3.005]   DEBUG transport/v2/tls.DialWithLocalBinding: server provided [3] certificates
[   3.017]   DEBUG channel.(*reconnectingDialer).sendHello [u{reconnecting}->i{}]: started
[   3.036]   DEBUG channel.(*reconnectingDialer).sendHello [u{reconnecting}->i{}]: exited
[   3.046]   DEBUG channel.(*reconnectingDialer).Create [tls:50a2078f-50d2-4ae0-ab87-0a7c2359ec96.production.netfoundry.io:80]: exited
[   3.057]   FATAL ziti/ziti-router/subcmd.run: {error=[error connecting ctrl (write tcp 192.168.0.128:56454->129.213.156.210:80: write: broken pipe)]} error starting

Pretty-printed a few interesting lines from the controller log:

{
  "context": "tls:0.0.0.0:80",
  "file": "github.com/openziti/channel@v0.18.14/classic_listener.go:157",
  "func": "github.com/openziti/channel.(*classicListener).listener",
  "level": "error",
  "msg": "error receiving hello from [tls:69.234.69.53:56444] (receive error (read tcp 10.19.56.100:80->69.234.69.53:56444: i/o timeout))",
  "time": "2022-06-01T14:22:36.221Z"
}
{
  "level": "info",
  "msg": "http: TLS handshake error from 69.234.69.53:57512: read tcp 10.19.56.100:443->69.234.69.53:57512: i/o timeout",
  "time": "2022-06-01T14:22:40.168Z"
}
{
  "file": "github.com/openziti/fabric@v0.17.62/controller/handler_ctrl/circuit_confirmation.go:42",
  "func": "github.com/openziti/fabric/controller/handler_ctrl.(*circuitConfirmationHandler).HandleReceive",
  "level": "info",
  "msg": "received circuit confirmation request",
  "routerId": "MPyreGPG3Y",
  "time": "2022-06-01T14:23:13.515Z"
}
{
  "context": "tls:0.0.0.0:80",
  "file": "github.com/openziti/channel@v0.18.14/classic_listener.go:157",
  "func": "github.com/openziti/channel.(*classicListener).listener",
  "level": "error",
  "msg": "error receiving hello from [tls:69.234.69.53:56446] (receive error (read tcp 10.19.56.100:80->69.234.69.53:56446: i/o timeout))",
  "time": "2022-06-01T14:23:14.334Z"
}

Controller log lines with timestampts and some irrelevant lines removed because the routerId was for a transit-router, not the edge-router in focus.

Jun 01 14:22:36 nc-kencloud1-kencloud-1652209281 ziti-controller[12874]: {"context":"tls:0.0.0.0:80","file":"github.com/openziti/channel@v0.18.14/classic_listener.go:157","func":"github.com/openziti/channel.(*classicListener).listener","level":"error","msg":"error receiving hello from [tls:69.234.69.53:56444] (receive error (read tcp 10.19.56.100:80-\u003e69.234.69.53:56444: i/o timeout))","time":"2022-06-01T14:22:36.221Z"}
Jun 01 14:22:40 nc-kencloud1-kencloud-1652209281 ziti-controller[12874]: {"level":"info","msg":"http: TLS handshake error from 69.234.69.53:57512: read tcp 10.19.56.100:443-\u003e69.234.69.53:57512: i/o timeout","time":"2022-06-01T14:22:40.168Z"}
Jun 01 14:23:13 nc-kencloud1-kencloud-1652209281 ziti-controller[12874]: {"file":"github.com/openziti/fabric@v0.17.62/controller/handler_ctrl/circuit_confirmation.go:42","func":"github.com/openziti/fabric/controller/handler_ctrl.(*circuitConfirmationHandler).HandleReceive","level":"info","msg":"received circuit confirmation request","routerId":"MPyreGPG3Y","time":"2022-06-01T14:23:13.515Z"}
Jun 01 14:23:14 nc-kencloud1-kencloud-1652209281 ziti-controller[12874]: {"context":"tls:0.0.0.0:80","file":"github.com/openziti/channel@v0.18.14/classic_listener.go:157","func":"github.com/openziti/channel.(*classicListener).listener","level":"error","msg":"error receiving hello from [tls:69.234.69.53:56446] (receive error (read tcp 10.19.56.100:80-\u003e69.234.69.53:56446: i/o timeout))","time":"2022-06-01T14:23:14.334Z"}

I verified that controller IP is still available on 80/tcp for TLS negotiation.

kb@kpizero1:~/ziti-router $ openssl s_client -connect 129.213.156.210:80 2>/dev/null | openssl x509 -noout -subject
subject=C = US, ST = NC, L = Charlotte, O = NetFoundry, OU = AdvDev, CN = 129.213.156.210

@plorenz @andrew.martinez Hypothesis: the controller too has sane timeouts by default, and this device is too slow.

If so, could I raise those limits to create headroom for a slower device?

Perhaps this xgress edge listener property that appears in the verbose run output could be configured somehow in the config.yml, channel.connectTimeout=1s?

This is a repeat of the last few lines of the startup output from ziti-router run. I’m duplicating this here to bring attention to the noticeable pause that occurs at this INFO message. It seems like this represents the step that’s failing. The following lines print rapidly after the pause at the first INFO message.

[   1.591]    INFO edge/router/xgress_edge.(*Acceptor).Run: starting
[   2.939]   DEBUG transport/v2/tls.DialWithLocalBinding: server provided [3] certificates
[   2.946]   DEBUG channel.(*reconnectingDialer).sendHello [u{reconnecting}->i{}]: started
[   2.951]   DEBUG channel.(*reconnectingDialer).sendHello [u{reconnecting}->i{}]: exited
[   2.958]   DEBUG channel.(*reconnectingDialer).Create [tls:50a2078f-50d2-4ae0-ab87-0a7c2359ec96.production.netfoundry.io:80]: exited
[   2.962]   FATAL ziti/ziti-router/subcmd.run: {error=[error connecting ctrl (EOF)]} error starting