Controller upgrade issue

I upgraded controller from v0.26.5 to v0.26.8 ( did export ZITI_VERSION=v0.26.8 and ran getZiti function ) and something went wrong. So I went back to v0.26.5 and I still got the same stack trace. Am I missing anything?

Here is the stack trace

$ "${ZITI_BIN_DIR-}/ziti-controller" run "${ZITI_HOME_OS_SPECIFIC}/${ZITI_EDGE_CONTROLLER_RAWNAME}.yaml"

*********************************************************************************

An update with v0.26.8 is available for ziti-controller v0.26.5 from
https://github.com/openziti/ziti/releases/latest/

*********************************************************************************
[   0.521]    INFO ziti/ziti-controller/subcmd.run: {arch=[amd64] build-date=[2022-08-17T14:01:21Z] revision=[e5d4b19e3568] version=[v0.26.5] go-version=[go1.18.5] os=[linux] nodeId=[athenz.experiments.openziti-controller]} starting ziti-controller
[   0.521]    INFO fabric/events.(*Dispatcher).eventLoop: event dispatcher: started
[   0.521]    INFO storage/boltz.(*migrationManager).Migrate.func1: fabric datastore is up to date at version 5
[   0.524]    INFO fabric/controller/network.(*Network).showOptions: network = {
  "CycleSeconds": 60,
  "Smart": {
    "RerouteFraction": 0.02,
    "RerouteCap": 4
  },
  "RouteTimeout": 10000000000,
  "CreateCircuitRetries": 2,
  "CtrlChanLatencyInterval": 10000000000,
  "PendingLinkTimeout": 10000000000,
  "MinRouterCost": 10,
  "RouterConnectChurnLimit": 60000000000,
  "InitialLinkLatency": 65000000000,
  "MetricsReportInterval": 60000000000
}
[   0.524]    INFO fabric/controller.(*Controller).showOptions: ctrl = {
  "OutQueueSize": 4,
  "MaxQueuedConnects": 1,
  "MaxOutstandingConnects": 16,
  "ConnectTimeout": 1000000000,
  "DelayRxStart": false,
  "WriteTimeout": 0,
  "NewListener": null
}
[   1.162]    INFO edge/controller/server.NewController: edge controller instance id: cl8mk84kr00006pjvktqp1aov
[   1.162]    INFO edge/controller/server.(*Controller).Initialize: initializing edge
[   1.166]    INFO storage/boltz.(*migrationManager).Migrate.func1: edge datastore is up to date at version 29
[   1.169]    INFO edge/controller/internal/policy.NewSessionEnforcer: {sessionTimeout=[30m0s] frequency=[5s]} session enforcer configured
[   1.169]    INFO edge/controller/server.(*Controller).Run: starting edge
[   1.173]    INFO fabric/metrics.GoroutinesPoolMetricsConfigF.func1.1: {maxWorkers=[16] idleTime=[10s] maxQueueSize=[1] poolType=[pool.listener.ctrl] minWorkers=[1]} starting goroutine pool
[   1.173]    INFO fabric/controller/handler_ctrl.(*CtrlAccepter).Run: started
[   1.174]    INFO fabric/controller/network.(*Network).Run: started
[   1.299]    INFO xweb/v2.(*Server).Start: starting ApiConfig to listen and serve tls on 0.0.0.0:8262 for server client-management with APIs: [edge-management edge-client fabric]
[   2.843]    INFO fabric/controller/handler_ctrl.(*bindHandler).BindChannel: {routerId=[HZSpDtw90] routerVersion=[v0.26.5]} router supports heartbeats
[   2.843]    INFO fabric/controller/handler_ctrl.(*CtrlAccepter).Bind: {routerId=[HZSpDtw90]} accepted new router connection [r/HZSpDtw90]
[   2.844] WARNING channel.(*channelImpl).rxer [ch{HZSpDtw90}->u{classic}->i{y5xY}]: dropped message. type [-33], sequence [-1], replyFor [-1]
[   2.852]    INFO edge/controller/env.(*Broker).RouterConnected.func1: {routerId=[HZSpDtw90] routerFingerprint=[0xc001d04480] routerName=[er1]} broker detected edge router with id HZSpDtw90 connecting
[   2.852]    INFO edge/controller/sync_strats.(*InstantStrategy).RouterConnected: {routerName=[er1] routerFingerprint=[0xc001d04480] sync_strategy=[instant] syncStatus=[SYNC_QUEUED] routerId=[HZSpDtw90]} edge router connected, adding to sync routerConnectedQueue
[   2.852]    INFO edge/controller/sync_strats.(*InstantStrategy).hello: {routerChannelIsOpen=[true] routerId=[HZSpDtw90] routerName=[er1] routerFingerprint=[0xc001d04480] routerTxId=[-ew0A777Y] strategy=[instant]} edge router sync starting
[   2.852]    INFO edge/controller/sync_strats.(*InstantStrategy).hello: {routerFingerprint=[0xc001d04480] routerTxId=[-ew0A777Y] strategy=[instant] routerChannelIsOpen=[true] routerId=[HZSpDtw90] routerName=[er1] syncStatus=[SYNC_HELLO]} sending edge router hello
[   2.853]    INFO edge/controller/sync_strats.(*InstantStrategy).ReceiveClientHello: {listeners=[[address:{value:"tls:0.0.0.0:8888"  protocol:"tls"  hostname:"0.0.0.0"  port:8888}  advertise:{value:"er1.example.com:8888"  protocol:"tls"  hostname:"er1.example.com"  port:8888}]] routerId=[HZSpDtw90] os=[linux] protocolPorts=[[8888]] buildDate=[2022-08-17T14:01:21Z] arch=[amd64] revision=[e5d4b19e3568] version=[v0.26.5] routerChannelIsOpen=[true] routerTxId=[-ew0A777Y] protocols=[[tls]] routerFingerprint=[0xc001d04480] strategy=[instant] routerName=[er1] data=[map[]]} edge router sent hello with version [v0.26.5] to controller with version [v0.26.5]
[   2.853]    INFO edge/controller/sync_strats.(*InstantStrategy).synchronize: {routerTxId=[-ew0A777Y] routerId=[HZSpDtw90] routerName=[er1] routerFingerprint=[0xc001d04480] routerChannelIsOpen=[true] strategy=[instant]} started synchronizing edge router
[   2.854]   ERROR fabric/controller/handler_ctrl.(*routerLinkHandler).HandleLinks [ch{HZSpDtw90}->u{classic}->i{y5xY}]: {routerId=[HZSpDtw90] linkId=[18QrDqqxAqYFkdPY6mJC2p] destRouterId=[lAJSZUj5SG] error=[destination router not connected]} unexpected error adding router reported link
[   2.854]   ERROR fabric/controller/handler_ctrl.(*routerLinkHandler).HandleLinks [ch{HZSpDtw90}->u{classic}->i{y5xY}]: {error=[destination router not connected] routerId=[HZSpDtw90] linkId=[2yXnn9clLvRIwoizTDYtNr] destRouterId=[Kp5Sm465SG]} unexpected error adding router reported link
[   2.870]    INFO fabric/controller/handler_ctrl.(*bindHandler).BindChannel: {routerId=[Kp5Sm465SG] routerVersion=[v0.26.5]} router supports heartbeats
[   2.870]    INFO fabric/controller/handler_ctrl.(*CtrlAccepter).Bind: {routerId=[Kp5Sm465SG]} accepted new router connection [r/Kp5Sm465SG]
[   2.871] WARNING channel.(*channelImpl).rxer [ch{Kp5Sm465SG}->u{classic}->i{vnpy}]: dropped message. type [-33], sequence [-1], replyFor [-1]
[   2.874]    INFO edge/controller/env.(*Broker).RouterConnected.func1: {routerFingerprint=[0xc00328d780] routerId=[Kp5Sm465SG] routerName=[er2]} broker detected edge router with id Kp5Sm465SG connecting
[   2.874]    INFO edge/controller/sync_strats.(*InstantStrategy).RouterConnected: {sync_strategy=[instant] syncStatus=[SYNC_QUEUED] routerId=[Kp5Sm465SG] routerName=[er2] routerFingerprint=[0xc00328d780]} edge router connected, adding to sync routerConnectedQueue
[   2.874]    INFO edge/controller/sync_strats.(*InstantStrategy).hello: {routerName=[er2] routerFingerprint=[0xc00328d780] routerTxId=[EeR00777p] routerChannelIsOpen=[true] strategy=[instant] routerId=[Kp5Sm465SG]} edge router sync starting
[   2.874]    INFO edge/controller/sync_strats.(*InstantStrategy).hello: {routerTxId=[EeR00777p] routerChannelIsOpen=[true] strategy=[instant] routerId=[Kp5Sm465SG] routerName=[er2] routerFingerprint=[0xc00328d780] syncStatus=[SYNC_HELLO]} sending edge router hello
[   2.883]    INFO edge/controller/sync_strats.(*InstantStrategy).ReceiveClientHello: {routerName=[er2] version=[v0.26.5] buildDate=[2022-08-17T14:01:21Z] arch=[amd64] revision=[e5d4b19e3568] routerTxId=[EeR00777p] protocols=[[tls]] routerChannelIsOpen=[true] data=[map[]] protocolPorts=[[8888]] listeners=[[address:{value:"tls:0.0.0.0:8888"  protocol:"tls"  hostname:"0.0.0.0"  port:8888}  advertise:{value:"er2.example.com:8888"  protocol:"tls"  hostname:"er2.example.com"  port:8888}]] strategy=[instant] routerId=[Kp5Sm465SG] routerFingerprint=[0xc00328d780] os=[linux]} edge router sent hello with version [v0.26.5] to controller with version [v0.26.5]
[   2.883]    INFO edge/controller/sync_strats.(*InstantStrategy).synchronize: {routerFingerprint=[0xc00328d780] routerName=[er2] strategy=[instant] routerTxId=[EeR00777p] routerId=[Kp5Sm465SG] routerChannelIsOpen=[true]} started synchronizing edge router
[   2.898]    INFO fabric/controller/handler_ctrl.(*routerLinkHandler).HandleLinks [ch{Kp5Sm465SG}->u{classic}->i{vnpy}]: {routerId=[Kp5Sm465SG] linkId=[2yXnn9clLvRIwoizTDYtNr] destRouterId=[HZSpDtw90]} router reported link added
[   2.898]   ERROR fabric/controller/handler_ctrl.(*routerLinkHandler).HandleLinks [ch{Kp5Sm465SG}->u{classic}->i{vnpy}]: {routerId=[Kp5Sm465SG] linkId=[swEnD8pHrXclpNdHNw5ZX] destRouterId=[lAJSZUj5SG] error=[destination router not connected]} unexpected error adding router reported link
[   2.920]    INFO edge/controller/sync_strats.(*InstantStrategy).synchronize.func1: {routerFingerprint=[0xc00328d780] routerTxId=[EeR00777p] routerId=[Kp5Sm465SG] routerName=[er2] routerChannelIsOpen=[true] strategy=[instant]} exiting synchronization, final status: SYNC_DONE
[   2.928]    INFO edge/controller/sync_strats.(*InstantStrategy).synchronize.func1: {routerId=[HZSpDtw90] routerTxId=[-ew0A777Y] strategy=[instant] routerName=[er1] routerFingerprint=[0xc001d04480] routerChannelIsOpen=[true]} exiting synchronization, final status: SYNC_DONE
[   2.940]   ERROR fabric/controller/api.(*timeoutHandler).ServeHTTP.func1.1: panic caught by timeout next: runtime error: invalid memory address or nil pointer dereference
goroutine 113 [running]:
github.com/openziti/foundation/v2/debugz.generateStack(0x2000, 0xed?)
	github.com/openziti/foundation/v2@v2.0.2/debugz/stack.go:38 +0x4a
github.com/openziti/foundation/v2/debugz.GenerateLocalStack(...)
	github.com/openziti/foundation/v2@v2.0.2/debugz/stack.go:33
github.com/openziti/fabric/controller/api.(*timeoutHandler).ServeHTTP.func1.1()
	github.com/openziti/fabric@v0.19.34/controller/api/timeouts.go:91 +0xfa
panic({0x2100160, 0x35af3d0})
	runtime/panic.go:838 +0x207
github.com/openziti/x509-claims/x509claims.(*ProviderBasic).Claims(...)
	github.com/openziti/x509-claims@v1.0.2/x509claims/cliam.go:71
github.com/openziti/edge/controller/model.(*Ca).GetExternalId(0xc0030d21a0, 0xc002aa3488?)
	github.com/openziti/edge@v0.22.54/controller/model/ca_model.go:246 +0x448
github.com/openziti/edge/controller/model.(*AuthModuleCert).Process(0xc001c68700, {0x281fdd8, 0xc00236cd00})
	github.com/openziti/edge@v0.22.54/controller/model/authenticator_mod_cert.go:152 +0x592
github.com/openziti/edge/controller/model.(*AuthenticatorManager).Authorize(0xc001afb800?, {0x281fdd8, 0xc00236cd00})
	github.com/openziti/edge@v0.22.54/controller/model/authenticator_manager.go:81 +0x82
github.com/openziti/edge/controller/internal/routes.(*AuthRouter).authHandler(0xc0004ae570, 0xc001c286c0, 0xc002b92540, 0x1bc0c74?, {0xc002968f69, 0x4}, 0xc002eea910)
	github.com/openziti/edge@v0.22.54/controller/internal/routes/authenticate_router.go:82 +0x17c
github.com/openziti/edge/controller/internal/routes.(*AuthRouter).Register.func1.1(0x98?, 0x280fd40?)
	github.com/openziti/edge@v0.22.54/controller/internal/routes/authenticate_router.go:58 +0x36
github.com/openziti/edge/controller/env.(*AppEnv).IsAllowed.func1({0x281d040, 0xc0020ccfc0}, {0x280fd40, 0x26395f8})
	github.com/openziti/edge@v0.22.54/controller/env/appenv.go:596 +0x38d
github.com/go-openapi/runtime/middleware.ResponderFunc.WriteResponse(0x20e5120?, {0x281d040?, 0xc0020ccfc0?}, {0x280fd40?, 0x26395f8?})
	github.com/go-openapi/runtime@v0.24.1/middleware/context.go:69 +0x3d
github.com/go-openapi/runtime/middleware.(*Context).Respond(0xc0017a93b0, {0x281d040?, 0xc0020ccfc0}, 0xc001afb800, {0xc0017356b0?, 0x1, 0x1}, 0xc001afb700, {0x211da60, 0xc0020cd140})
	github.com/go-openapi/runtime@v0.24.1/middleware/context.go:510 +0x59a
github.com/openziti/edge/rest_client_api_server/operations/authentication.(*Authenticate).ServeHTTP(0xc001381590, {0x281d040, 0xc0020ccfc0}, 0xc001afb800)
	github.com/openziti/edge@v0.22.54/rest_client_api_server/operations/authentication/authenticate.go:81 +0x1bf
github.com/go-openapi/runtime/middleware.NewOperationExecutor.func1({0x281d040, 0xc0020ccfc0}, 0xc001afb800)
	github.com/go-openapi/runtime@v0.24.1/middleware/operation.go:28 +0x59
net/http.HandlerFunc.ServeHTTP(0xc002d8fa48?, {0x281d040?, 0xc0020ccfc0?}, 0x0?)
	net/http/server.go:2084 +0x2f
github.com/go-openapi/runtime/middleware.NewRouter.func1({0x281d040, 0xc0020ccfc0}, 0xc001afb600)
	github.com/go-openapi/runtime@v0.24.1/middleware/router.go:78 +0x257
net/http.HandlerFunc.ServeHTTP(0xc002e8c468?, {0x281d040?, 0xc0020ccfc0?}, 0xc002d8fbc8?)
	net/http/server.go:2084 +0x2f
github.com/go-openapi/runtime/middleware.Redoc.func1({0x281d040, 0xc0020ccfc0}, 0xa50000c001c75f90?)
	github.com/go-openapi/runtime@v0.24.1/middleware/redoc.go:72 +0x242
net/http.HandlerFunc.ServeHTTP(0xc002d8fc90?, {0x281d040?, 0xc0020ccfc0?}, 0x201bc00?)
	net/http/server.go:2084 +0x2f
github.com/go-openapi/runtime/middleware.Spec.func1({0x281d040, 0xc0020ccfc0}, 0xc000a00c60?)
	github.com/go-openapi/runtime@v0.24.1/middleware/spec.go:46 +0x18c
net/http.HandlerFunc.ServeHTTP(0xc002b92540?, {0x281d040?, 0xc0020ccfc0?}, 0xc001afb600?)
	net/http/server.go:2084 +0x2f
github.com/openziti/edge/controller/server.ClientApiHandler.newHandler.func1({0x281d040, 0xc0020ccfc0}, 0xc001afb600)
	github.com/openziti/edge@v0.22.54/controller/server/client-api.go:193 +0x330
net/http.HandlerFunc.ServeHTTP(0xc000a00b70?, {0x281d040?, 0xc0020ccfc0?}, 0xb?)
	net/http/server.go:2084 +0x2f
github.com/gorilla/handlers.(*cors).ServeHTTP(0xc001db8bd0, {0x281d040, 0xc0020ccfc0}, 0xc001afb600)
	github.com/gorilla/handlers@v1.5.1/cors.go:54 +0x370
github.com/openziti/fabric/controller/api.(*timeoutHandler).ServeHTTP.func1()
	github.com/openziti/fabric@v0.19.34/controller/api/timeouts.go:95 +0x7c
created by github.com/openziti/fabric/controller/api.(*timeoutHandler).ServeHTTP
	github.com/openziti/fabric@v0.19.34/controller/api/timeouts.go:88 +0x430

[   3.093]    INFO fabric/controller/network.(*Network).Run: {routerId=[HZSpDtw90]} changed router
[   3.121]    INFO fabric/controller/network.(*Network).Run: {routerId=[Kp5Sm465SG]} changed router
[   3.782] WARNING fabric/controller/handler_ctrl.(*CtrlAccepter).Bind: {routerId=[lAJSZUj5SG]} no advertised listeners
[   3.782]    INFO fabric/controller/handler_ctrl.(*bindHandler).BindChannel: {routerVersion=[v0.26.5] routerId=[lAJSZUj5SG]} router supports heartbeats
[   3.782]    INFO fabric/controller/handler_ctrl.(*CtrlAccepter).Bind: {routerId=[lAJSZUj5SG]} accepted new router connection [r/lAJSZUj5SG]
[   3.783]    INFO edge/controller/env.(*Broker).RouterConnected.func1: {routerId=[lAJSZUj5SG] routerName=[corp-router1] routerFingerprint=[0xc0029f0d50]} broker detected edge router with id lAJSZUj5SG connecting
[   3.783]    INFO edge/controller/sync_strats.(*InstantStrategy).RouterConnected: {syncStatus=[SYNC_QUEUED] routerId=[lAJSZUj5SG] routerName=[corp-router1] routerFingerprint=[0xc0029f0d50] sync_strategy=[instant]} edge router connected, adding to sync routerConnectedQueue
[   3.783]    INFO edge/controller/sync_strats.(*InstantStrategy).hello: {routerChannelIsOpen=[true] routerTxId=[6EW0A77bY] routerId=[lAJSZUj5SG] routerName=[corp-router1] strategy=[instant] routerFingerprint=[0xc0029f0d50]} edge router sync starting
[   3.783]    INFO edge/controller/sync_strats.(*InstantStrategy).hello: {routerId=[lAJSZUj5SG] routerName=[corp-router1] syncStatus=[SYNC_HELLO] strategy=[instant] routerFingerprint=[0xc0029f0d50] routerChannelIsOpen=[true] routerTxId=[6EW0A77bY]} sending edge router hello
[   3.793] WARNING channel.(*channelImpl).rxer [ch{lAJSZUj5SG}->u{classic}->i{vExM}]: dropped message. type [-33], sequence [-1], replyFor [-1]
[   3.804]    INFO fabric/controller/handler_ctrl.(*routerLinkHandler).HandleLinks [ch{lAJSZUj5SG}->u{classic}->i{vExM}]: {destRouterId=[HZSpDtw90] routerId=[lAJSZUj5SG] linkId=[18QrDqqxAqYFkdPY6mJC2p]} router reported link added
[   3.804]    INFO fabric/controller/handler_ctrl.(*routerLinkHandler).HandleLinks [ch{lAJSZUj5SG}->u{classic}->i{vExM}]: {linkId=[swEnD8pHrXclpNdHNw5ZX] destRouterId=[Kp5Sm465SG] routerId=[lAJSZUj5SG]} router reported link added
[   3.804]    INFO edge/controller/sync_strats.(*InstantStrategy).ReceiveClientHello: {routerTxId=[6EW0A77bY] routerChannelIsOpen=[true] buildDate=[2022-08-17T14:01:21Z] version=[v0.26.5] routerName=[corp-router1] protocolPorts=[[443]] strategy=[instant] routerFingerprint=[0xc0029f0d50] os=[linux] arch=[amd64] protocols=[[tls]] listeners=[[address:{value:"tls:0.0.0.0:443"  protocol:"tls"  hostname:"0.0.0.0"  port:443}  advertise:{value:"er1:443"  protocol:"tls"  hostname:"er1"  port:443}]] revision=[e5d4b19e3568] routerId=[lAJSZUj5SG] data=[map[]]} edge router sent hello with version [v0.26.5] to controller with version [v0.26.5]
[   3.804]    INFO edge/controller/sync_strats.(*InstantStrategy).synchronize: {routerTxId=[6EW0A77bY] routerId=[lAJSZUj5SG] routerName=[corp-router1] routerFingerprint=[0xc0029f0d50] routerChannelIsOpen=[true] strategy=[instant]} started synchronizing edge router
[   3.820]    INFO edge/controller/sync_strats.(*InstantStrategy).synchronize.func1: {routerFingerprint=[0xc0029f0d50] routerId=[lAJSZUj5SG] routerChannelIsOpen=[true] strategy=[instant] routerName=[corp-router1] routerTxId=[6EW0A77bY]} exiting synchronization, final status: SYNC_DONE
[   4.033]    INFO fabric/controller/network.(*Network).Run: {routerId=[lAJSZUj5SG]} changed router
[   5.266]   ERROR fabric/controller/api.(*timeoutHandler).ServeHTTP.func1.1: panic caught by timeout next: runtime error: invalid memory address or nil pointer dereference

Up on closer look, upgrade might be a red herring and I am wondering if something went wrong with CA ( considering the error wrt externalId ) and went unnoticed until I tried the upgrade and watched the logs.

P.S couldn’t help but notice the typo in the name cliam.go :slight_smile:

Before upgrading, you probably should backup the database. If there’s anything that gets migrated, schema wise, you’ll want that undone when reverting back. That might be an issue?

I see the panic in your logs, but nothing clicks with me immediately.

In the past there have indeed been bugs around certificates expiring and a restart causing an issue but I haven’t seen one for a while. Could be related.

I’ll ping a couple others and see if this is a known issue or not.

I created this issue for you in the ziti repo for OpenZiti. I’ll be taking a look at it this morning.

PRs for the issue are moving up to the ziti repo now with fixes and new tests.

Thanks for the report!

Great. As always, appreciate the quick response!