Hi Team,
Apologies in advance if this has already been addressed; I couldn't find a related topic.
I was on controller 1.5.4, where HA with 4 controllers (3 voters, 1 follower) worked well.
In my HA test, the edge client was able to switch to another controller as I shut down the connected controllers one by one, until only the last surviving controller remained.
This worked for all overlay services without issues.
However, on 1.5.10 the behavior changed completely.
The client refused to switch controllers. The discovered controller endpoints were always missing some of the other cluster controllers: sometimes 2 edge APIs were available, sometimes 3, whereas 1.5.4 had always returned all controllers.
Is this a known issue that I missed? Or does the newer controller depend on a newer edge client version or tunnel-service binary?
Any insights would be very helpful.
Regards
Edmund
Hi Edmund, this is surprising. There are no code changes in the openziti/ziti project itself from 1.5.4 to 1.5.10, only library updates. They're mostly third-party library updates, with a couple of high-priority, low-risk bug fixes in OpenZiti libraries.
Would you be willing to try other 1.5.x versions to see at which version specifically it broke for you?
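If it helps make the comparison between versions concrete, here is a minimal sketch (not an OpenZiti tool) that counts how often each expected controller is absent from what the client discovered. The addresses and poll data are made up, and how you collect the discovered list on each poll is left to you.

```python
from collections import Counter


def missing_controllers(expected, observations):
    """Count how often each expected controller was absent from a discovery result.

    expected: iterable of controller addresses that should always be advertised.
    observations: list of lists, one per poll, of addresses the client actually saw.
    Returns a dict mapping address -> number of polls in which it was missing.
    """
    expected = set(expected)
    missing = Counter()
    for seen in observations:
        # Anything expected but not seen in this poll counts as missing once.
        for addr in expected - set(seen):
            missing[addr] += 1
    return dict(missing)


# Hypothetical data: four controllers, three polls taken on one version.
expected = ["ctrl1:1280", "ctrl2:1280", "ctrl3:1280", "ctrl4:1280"]
polls = [
    ["ctrl1:1280", "ctrl2:1280"],
    ["ctrl1:1280", "ctrl2:1280", "ctrl3:1280"],
    ["ctrl1:1280", "ctrl2:1280", "ctrl3:1280", "ctrl4:1280"],
]
print(missing_controllers(expected, polls))
```

Running the same polls against each 1.5.x version should show where controllers start dropping out of the discovered set.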
Paul
I have reverted to 1.5.4 and it is working now, so I will need to find another opportunity to confirm the issue.
I am not sure if this is a real bug or if I am just being too paranoid. From my observation, the upgrade from 1.5.4 to 1.5.10 works, but some strange behavior occurred afterwards; it could be an issue with my setup.
I will do more tests when I can and report back. Since this was not a known issue, I assume it is either localized to my setup or something not yet known.
Thinking about it some more, I wonder if your cluster got into a bad state. It used to be easy to end up with multiple nodes, each bootstrapped into its own single-node cluster, but sending confusing raft information back and forth. We've since tried to simplify bootstrapping and to make sure that separate clusters can't be joined.
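One rough way to spot that kind of split is to ask every node which members it thinks are in the cluster and check whether the answers agree. The sketch below is only illustrative: the node names are hypothetical, and gathering each node's member list is assumed to happen elsewhere.

```python
def cluster_views(reported_members):
    """Group nodes by the set of cluster members each one reports.

    reported_members: dict mapping node id -> iterable of member ids that node lists.
    Returns a dict mapping frozenset(members) -> sorted list of nodes holding that view.
    """
    views = {}
    for node, members in reported_members.items():
        views.setdefault(frozenset(members), []).append(node)
    return {view: sorted(nodes) for view, nodes in views.items()}


def looks_split(reported_members):
    """Return True when the nodes do not all agree on one membership set."""
    return len(cluster_views(reported_members)) > 1


# Hypothetical healthy cluster: every node lists the same four members.
healthy = {n: ["c1", "c2", "c3", "c4"] for n in ["c1", "c2", "c3", "c4"]}

# Hypothetical bad state: c4 was bootstrapped into its own single-node cluster.
split = {
    "c1": ["c1", "c2", "c3"],
    "c2": ["c1", "c2", "c3"],
    "c3": ["c1", "c2", "c3"],
    "c4": ["c4"],
}

print(looks_split(healthy), looks_split(split))
```

Disagreeing views right after a membership change can be transient, so this is only a hint to look closer if the disagreement persists.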
Other things it might be:
In general the logic which tracked the set of controllers with additional metadata outside of raft was tricky to get right, so it's possible you hit an issue there.
If you have the time and are interested, I'd be curious how the latest 1.8 prerelease works for you. We're getting close to moving HA out of beta, so if there are any lingering issues, it would be nice to resolve them before we do that.
Cheers,
Paul