"error":"no api session found for token..." on routers for Android app

While trying to migrate to HA, I messed up my previously working configuration.

Short story: my second node, on a different host, was misconfigured (same advertiseAddress as the first node), so I disabled it and restored a backup of the first node (replacing the full .ziti folder).

Since then, my Android app has been unable to connect.
I deleted the app storage and created a new identity. I tried to enrol it and everything seems fine on my phone: I'm authenticated and I can see services.

On the controller, the new identity shows as offline and still has the generic desktop icon.

On every router I'm getting:

 {"error":"no api session found for token [eyJhbGciOiJSUzI1NiIsImtpZCI6IjYzZmZkZGQ3MzBiY2VhMzM1NzQ1MzEyMGY5YTk0YTllZTczMThiYjMiLCJ0eXAiOiJKV1QifQ.eyJhdWQiOlsib3BlbnppdGkiXSwiZXhwIjoxNzM5NDQ3OTg0LCJpYXQiOjE3Mzk0NDYxODQsImlzcyI6Imh0dHBzOi8veml0aS5jaWN1Y2kuaXQ6NDQzL29pZGMiLCJqdGkiOiJjYTFiN2Y2My1jMTVhLTRhODctODFmNy0yMzc5YzhlYjUxZWMiLCJuYmYiOjE3Mzk0NDYxODQsInN1YiI6InJpNlVqSFhkcSIsInpfYWlkIjoib3BlbnppdGkiLCJ6X2FzaWQiOiI5NDI0YmM5Zi0xMTA2LTRjNjctYjk3MS01MDIxNzkxNDI5MzUiLCJ6X2NmcyI6WyI2NWM5NzRjNDQ3NjQwYWQyNjU1YTEwODdjZDk4NDc5ODRkOTllMDU2IiwiMzE0MWI5Njg0NDQyNmVmYjg0YjQ1NDdmMDUzNDQzZWFlMWUzY2VjMyIsIjgzNmFlMzlmOTdlOTYyOGQ4NzcxNDkzMTViNWEyYThkNDdiNTc1YjgiXSwiel9lbnYiOnt9LCJ6X2ljZSI6dHJ1ZSwiel9yYSI6IjkzLjM4LjI0OS42Mzo2MTQ0NSIsInpfc2RrIjp7fSwiel90IjoiYSJ9.Xv66i0OXOMTmrAjGlO5e29SF-M9bM4doY7B0QPXdnlFrr9N98FdEeQCJGSlEVCKSbbwQlhq150bQYrN8sl9mMjvDR7UKgP07oU1TaM3uISOfblAGtIssBw8jF3S4k6iBbpZMJ2A0nhM-LUmMPuLOpRWdIHDUc6HAlS_FHHYjFK77y10CrE1ZzArurb4JOqGXrwXCXxAYZkYPObE9xgbjMYvfRjBiqkNxD9aMQuRLFcKe63ycsmXotFokn2BSDmThUYZDwdxs69BKDEPad0YPhBJv54I7XLH1Fv8XsOP2NIoYuMWhHbMceZ76KZklOgUaXzL-QuJ89NSPIr22jzbNwiqBPdicUlBn7FKawTV89ESKRJoWypnhhjJh2T9pq9keEPaZW7Hoz2ZyPWoGsO6bCiwf7fV4Bf5mD6mVQrUR00cA6Exiv-Ch-nt2_sLW3FvN-YFxSDLnVSBlPKNCc-fk1piccTgRjuRBMYaM9R69p3ewgxaBBZOX5-Ua-QOln7kawUfqLaauOcpLqyIKBxePGS8-iH2-noviBoUKc7BJtWwo41LkAU0K_BYygVbuhlIaEKTShGrBnrZCdSDFFWhGlzSi83O3EHVrQUyG1J5CdvsE0sr-_pUzrDIdkIIisCIKAHr_ic8LoNQoDWSMBtrt8RUSxJ0nNOBHxuaeawjkCck], fingerprint: [65c974c447640ad2655a1087cd9847984d99e056], subjects [[CN=ri6UjHXdq,O=OpenZiti CN=cqchomeoverlay-signing-intermediate,OU=ADV-DEV,O=NetFoundry,L=Charlotte,C=US CN=cqchomeoverlay-signing-intermediate_grandparent_intermediate,OU=ADV-DEV,O=NetFoundry,L=Charlotte,C=US]]","file":"github.com/openziti/channel/v3@v3.0.5/impl.go:124","func":"github.com/openziti/channel/v3.AcceptNextChannel.func1","level":"error","msg":"failure accepting channel edge with underlay u{classic}-\u003ei{OlxV}","time":"2025-02-13T11:29:54.749Z"}
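The token in that router error is a JWT (three base64url segments separated by dots), so its claims can be read locally without any Ziti tooling. Below is a minimal sketch, assuming a POSIX shell with `base64 -d` (GNU coreutils); `jwt_payload` is a hypothetical helper name, and it only decodes the payload without verifying the signature. For the token above, the payload includes an `iss` ending in `/oidc` and a `z_asid` claim, which appears to carry the API session ID the router could not find.

```shell
# Print the decoded claims (the middle segment) of a JWT passed as $1.
# Sketch only: the signature is NOT verified.
jwt_payload() {
  # take the payload segment and map base64url characters back to standard base64
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url omits '=' padding; restore it so base64 -d accepts the input
  case $(( ${#seg} % 4 )) in
    2) seg="$seg==" ;;
    3) seg="$seg=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}
```

Pass it the full token copied from the log line, e.g. `jwt_payload 'eyJhbGci…'` with the complete string in place of the ellipsis.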

On the controller, during enrolment, I get:

Feb 13 11:42:20 cqchomeoverlay ziti[1088321]: {"file":"github.com/openziti/ziti/controller/raft/fsm.go:152","func":"github.com/openziti/ziti/controller/raft.(*BoltDbFsm).Apply","index":374,"level":"info","msg":"apply log with type *model.ReplaceEnrollmentWithAuthenticatorCmd","time":"2025-02-13T11:42:20.848Z"}

Tunnel identities on Linux clients are working just fine.
I cannot check my Windows client right now.

I also reinstalled one of my routers from scratch, with no difference.

Oh no!!! While I did warn you that you were blazing some new ground, I'm sorry you're hitting this snag! :frowning: Thank you for helping us test clustered controllers!

Is there any chance you added the Android app after upgrading the first controller? When you restored that database, the Android app would be 'forgotten' by the controller. That could explain the first issue, maybe.

Would you be willing to go to the menu and send the feedback logs to clint at openziti.org? I'll have a look at them. @ekoby - can you have a look at this and see what you think?

I tried enrolling a new test identity on my phone just after migrating to a single-node cluster and I don't remember any issue. I didn't try tunnelling anything through the test identity, though.

After the failed join of the second node, my Android app started crash-looping, even after the backup restore, so I had to clear the app data and re-enrol.

I can still restore a backup prior to HA.

Restoring a pre-HA backup solved the issue :sweat_smile:: a newly enrolled Android identity is working fine.
I need to restore a couple of configurations by hand and re-enrol 2 routers.

Backing up the system is easy and restoring it is quite foolproof :clap:

It appears that you're using fairly old versions, which could explain the failure to join another controller:

  • ctrl[1.2.2]
  • gziti-router version[v1.1.15]
  • dockerdmz-local-router version[v1.1.15]
  • cqchomeoverlay-edge-router version[v1.2.2]

I think it is worth updating all components to the latest release before migrating to HA.
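Before attempting the HA migration again, a quick way to spot components below a target release is to compare the reported versions with `sort -V`. This is a minimal sketch, assuming GNU `sort` (for `-V`); `version_ge` is a hypothetical helper name, not part of the `ziti` CLI.

```shell
# Succeed if version $1 >= version $2; a leading "v" is ignored.
version_ge() {
  a=${1#v}
  b=${2#v}
  # sort -V orders version strings numerically per component;
  # if $b sorts first (or the two are equal), then $a >= $b
  [ "$(printf '%s\n%s\n' "$a" "$b" | sort -V | head -n 1)" = "$b" ]
}
```

For example, `for v in v1.2.2 v1.1.15 v1.1.15 v1.2.2; do version_ge "$v" v1.3.3 || echo "$v is older than v1.3.3"; done` flags all four components listed above when v1.3.3 is the target.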

I will try to proceed step by step, with intermediate backups:
- upgrade components
- migrate node 1 to HA
- configure and join node 2
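Since the restore earlier in this thread amounted to replacing the whole `.ziti` folder, an intermediate backup before each step can be as simple as a timestamped tarball of that folder. A minimal sketch: `backup_ziti_dir` is a hypothetical helper, the folder location is assumed to be `~/.ziti` as in this thread, and it assumes the controller or router using the folder has been stopped first so the database files are not being written mid-archive.

```shell
# Archive a Ziti state directory into DEST/ziti-backup-YYYYmmdd-HHMMSS.tar.gz
# and print the archive path. Hypothetical helper; stop the services first.
backup_ziti_dir() {
  src=${1:?usage: backup_ziti_dir <dir> [dest-dir]}
  dest=${2:-.}
  out="$dest/ziti-backup-$(date +%Y%m%d-%H%M%S).tar.gz"
  # -C stores paths relative to the parent, e.g. ".ziti/..." instead of "/home/user/.ziti/..."
  tar -czf "$out" -C "$(dirname "$src")" "$(basename "$src")"
  printf '%s\n' "$out"
}
```

For example, `backup_ziti_dir ~/.ziti /var/backups` before each step; restoring then means stopping the services and extracting the archive back into the home directory with `tar -xzf <archive> -C "$HOME"`.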

But not before next week.

My routers are installed from the package repo on Debian. Is it possible to replace the executable with the 1.3.3 release?

Yes. All of the components upgrade by simply replacing the binary and restarting them.

1.3.3 has been promoted to latest, so it should appear in the repos shortly, if it hasn't already.

That is the correct expectation, but there was, unfortunately, a glitch that prevented promoting 1.3.3 when it was marked a stable release.

The problem was fixed, and the next Ziti release marked stable will be promoted to the Linux package repos.

The workaround, if 1.3.3 is needed sooner than the next stable release promotion, is to install the Linux binary and bounce the systemd service.

Here's a temporary upgrade script that selects the correct binary for your CPU architecture.

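(set -euxo pipefail;

# work in a throwaway directory
cd "$(mktemp -d)";

ZITI_VERSION=v1.3.3

# map the kernel's machine name to the Go architecture name used in release asset names
case $(uname -m) in
  x86_64)          GOXARCH=amd64 ;;
  aarch64|arm64)   GOXARCH=arm64 ;;
  arm*)            GOXARCH=arm   ;;
  *)               echo "ERROR: unknown arch '$(uname -m)'" >&2
                   exit 1        ;;
esac;

# download the release tarball and unpack the ziti binary into the current directory
curl -sSfL \
  "https://github.com/openziti/ziti/releases/download/${ZITI_VERSION}/ziti-linux-${GOXARCH}-${ZITI_VERSION#v}.tar.gz" \
  | tar -xz -f -;

# install next to (not over) the package-managed /usr/bin/ziti
sudo install -o root -g root ./ziti /usr/local/bin/;
ziti --version;
)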

This assumes that /usr/local/bin/ has higher precedence in your binary search path than /usr/bin/ (the package-managed executable dir), so it may be necessary to adjust your PATH or delete /usr/bin/ziti for 1.3.3 to be invoked first.
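To confirm which `ziti` an unqualified command will run once both copies exist, the `PATH` lookup can be reproduced by listing every executable `ziti` in search order; the first line is the one the shell picks (after a `hash -r` to clear any cached location). This is a minimal POSIX-shell sketch and `ziti_candidates` is a hypothetical name. It only covers shells: a systemd unit runs whatever path its `ExecStart=` names, which `systemctl cat <unit>` will show.

```shell
# List every executable named $1 (default: ziti) on $PATH, in lookup order.
# The first line printed is what an unqualified command resolves to.
ziti_candidates() {
  name=${1:-ziti}
  saved_ifs=$IFS
  IFS=:
  # unquoted $PATH is split on ':' because of the IFS change above
  for dir in $PATH; do
    # an empty PATH entry means the current directory
    [ -n "$dir" ] || dir=.
    if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
    fi
  done
  IFS=$saved_ifs
}
```

If `ziti_candidates | head -n 1` prints /usr/bin/ziti, the package-managed binary still wins and the PATH needs adjusting as described above.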

I restarted from scratch:

  • I updated the quickstart and router binaries to 1.3.3
  • I enabled the raft stanza in the config file
  • I recreated the server certificates with a SPIFFE ID
  • I restarted the controller and routers

The single-node cluster is up:
╭─────┬───────────────────────────┬───────┬────────┬─────────┬───────────╮
│ ID  │ ADDRESS                   │ VOTER │ LEADER │ VERSION │ CONNECTED │
├─────┼───────────────────────────┼───────┼────────┼─────────┼───────────┤
│ oci │ tls:ziti.mydomain.my:8440 │ true  │ true   │ v1.3.3  │ true      │
╰─────┴───────────────────────────┴───────┴────────┴─────────┴───────────╯
╭───────────┬──────────────────────────────────────┬─────────┬────────────────────────┬─────────────────────┬─────────────────────╮
│ ID        │ CLIENT                               │ SERVICE │ TERMINATOR             │ CREATEDAT           │ PATH                │
├───────────┼──────────────────────────────────────┼─────────┼────────────────────────┼─────────────────────┼─────────────────────┤
│ gMXsWAAC- │ 33b1fecf-4eb2-4de9-b95f-e37d2d225291 │ lte     │ 6ByI9DCUGISsur6HN9lm0m │ 2025-02-17 11:28:06 │ r/cqc-public-router │
╰───────────┴──────────────────────────────────────┴─────────┴────────────────────────┴─────────────────────┴─────────────────────╯
results: 1-1 of 1
╭────────────┬───────────────────┬────────┬──────┬──────────────┬──────────┬───────────────────────┬─────────────────────────────╮
│ ID         │ NAME              │ ONLINE │ COST │ NO TRAVERSAL │ DISABLED │ VERSION               │ LISTENERS                   │
├────────────┼───────────────────┼────────┼──────┼──────────────┼──────────┼───────────────────────┼─────────────────────────────┤
│ CgSdT-w9Z  │ oci-public-router │ true   │    0 │ false        │ false    │ v1.3.3 on linux/amd64 │ 1: tls:ziti.mydomain.my:80  │
│ P5dAgt1Si9 │ gcp-public-router │ true   │    0 │ false        │ false    │ v1.3.3 on linux/amd64 │ 1: tls:gziti.mydomain.my:80 │
│ i6AWK01Ti  │ cqc-public-router │ true   │    0 │ false        │ false    │ v1.3.3 on linux/amd64 │ 1: tls:home.mydomain.my:80  │
╰────────────┴───────────────────┴────────┴──────┴──────────────┴──────────┴───────────────────────┴─────────────────────────────╯
results: 1-3 of 3
╭────────────────────────┬───────────────────┬───────────────────┬─────────────┬─────────────┬─────────────┬───────────┬────────┬───────────╮
│ ID                     │ DIALER            │ ACCEPTOR          │ STATIC COST │ SRC LATENCY │ DST LATENCY │ STATE     │ STATUS │ FULL COST │
├────────────────────────┼───────────────────┼───────────────────┼─────────────┼─────────────┼─────────────┼───────────┼────────┼───────────┤
│ 1QtQ74InpbgwB4iGfWBVmu │ cqc-public-router │ gcp-public-router │           1 │     141.3ms │     140.3ms │ Connected │     up │       282 │
│ 55cHYnQIUCPpnDJ7nAknQF │ oci-public-router │ cqc-public-router │           1 │      28.9ms │      24.4ms │ Connected │     up │        53 │
│ 6fuohcmY0YjVKnCkZjkro4 │ gcp-public-router │ oci-public-router │           1 │     123.7ms │     122.6ms │ Connected │     up │       246 │
╰────────────────────────┴───────────────────┴───────────────────┴─────────────┴─────────────┴─────────────┴───────────┴────────┴───────────╯
results: 1-3 of 3
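For the SPIFFE ID step in the list above, one way to double-check a regenerated server certificate is to pull the `URI:spiffe://` entries out of its Subject Alternative Names. A minimal sketch: `spiffe_ids` is a hypothetical helper that filters the text form printed by `openssl x509 -noout -text`; it only shows what is in the certificate and does not check that the ID matches what the controller expects.

```shell
# Read `openssl x509 -noout -text` output on stdin and print each SPIFFE ID
# found among the Subject Alternative Names, one per line.
spiffe_ids() {
  # openssl prints the SAN list on one comma-separated line,
  # e.g. "DNS:ziti.example.com, URI:spiffe://example.com/controller/oci"
  tr ',' '\n' | sed -n 's/^[[:space:]]*URI:\(spiffe:\/\/.*\)$/\1/p'
}
```

Usage: `openssl x509 -in <server-cert.pem> -noout -text | spiffe_ids`, with the path to the controller's server certificate; empty output means the certificate carries no SPIFFE ID.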

Services are still reachable with my existing Android identity.
I created a new Android identity and deployed it in a new Android profile.
Enrolment went fine and I can reach services.

In ZAC something is still missing: the new identity shows as offline and still has the generic desktop icon.

No errors logged on routers or controller this time.

Thank you for the detailed steps you took.

Overall, it sounds like things are working from the overlay network perspective, but it seems like ZAC might have a couple of things to sort out. This isn't abnormal. The overlay network itself is the basis of the other componentry and often implements features that are updated in ZAC at a later time. I'll make sure the team is aware of the ZAC-related issue you're observing and I'm sure we'll figure out what's wrong shortly and have a fix for it in the coming weeks.

Thanks again! Cheers

I believe the first dot/circle indicates the existence of an API session. In HA mode, API sessions are no longer persisted, durable objects; instead, they are bearer tokens, so there is no way for ZAC to know about them. What matters for OpenZiti endpoints (for service access) is their connections to edge routers: that's the second dot.

Thanks, nice to know.
Now I better understand how to diagnose it.

The Android client is now authenticated, but I had to force-restart it.

The network seems to be working overall, but some tunnelers are still giving problems.

The nice ZAC visualizer now shows all links as broken, but I now think this is due to the HA nature.