Ziti controller HA setup behind a HAProxy load-balancer

Hello there!

I tried to test HAProxy as a load balancer for controllers running in HA mode and can't make it work: round-robin distributes requests evenly, and the authentication data obtained from ziti edge login on one controller isn't accepted by the other controllers, only by the one that handled the login request.

Is it even possible to run a stateless load-balancer in front of ziti controllers?

My main objective is automatic failover of requests to the edge API if one of the controllers (possibly the leader) fails. Can that be achieved by some other means?

My setup:
haproxy.conf

defaults
  timeout connect 5000
  timeout client 50000
  timeout server 50000

frontend main
  mode tcp
  bind *:443
  use_backend ctrl

backend ctrl
  mode tcp
  balance roundrobin
  server ctrl1 127.0.0.1:1281 check
  server ctrl2 127.0.0.1:1282 check
  server ctrl3 127.0.0.1:1283 check
➜ ziti agent cluster list -i ctrl1
╭───────┬────────────────────┬───────┬────────┬─────────┬───────────╮
│ ID    │ ADDRESS            │ VOTER │ LEADER │ VERSION │ CONNECTED │
├───────┼────────────────────┼───────┼────────┼─────────┼───────────┤
│ ctrl1 │ tls:localhost:6201 │ true  │ true   │ v1.1.7  │ true      │
│ ctrl2 │ tls:localhost:6202 │ false │ false  │ v1.1.7  │ true      │
│ ctrl3 │ tls:localhost:6203 │ false │ false  │ v1.1.7  │ true      │
╰───────┴────────────────────┴───────┴────────┴─────────┴───────────╯
➜ ziti -v
v1.1.7

Relevant logs showing that round-robin eventually hits the controller that processed ziti edge login:

➜ ziti edge list identities
error: error listing https://localhost:443/edge/management/v1/identities in Ziti Edge Controller. Status code: 401 Unauthorized, Server returned: {
    "error": {
        "code": "UNAUTHORIZED",
        "message": "The request could not be completed. The session is not authorized or the credentials are invalid",
        "requestId": "sdsGqP2bd"
    },
    "meta": {
        "apiEnrollmentVersion": "0.0.1",
        "apiVersion": "0.0.1"
    }
}
➜ ziti edge list identities
error: error listing https://localhost:443/edge/management/v1/identities in Ziti Edge Controller. Status code: 401 Unauthorized, Server returned: {
    "error": {
        "code": "UNAUTHORIZED",
        "message": "The request could not be completed. The session is not authorized or the credentials are invalid",
        "requestId": "VlsJxAoJY"
    },
    "meta": {
        "apiEnrollmentVersion": "0.0.1",
        "apiVersion": "0.0.1"
    }
}
➜ ziti edge list identities
╭───────────┬───────┬─────────┬────────────┬─────────────╮
│ ID        │ NAME  │ TYPE    │ ATTRIBUTES │ AUTH-POLICY │
├───────────┼───────┼─────────┼────────────┼─────────────┤
│ Hq8usIlwt │ test  │ Default │            │ Default     │
│ oLVMs-lwU │ admin │ Default │            │ Default     │
╰───────────┴───────┴─────────┴────────────┴─────────────╯
results: 1-2 of 2
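The 401, 401, success pattern above is what round-robin over three backends would produce if a session token is only honoured by the controller that issued it. Here is a toy simulation of that behaviour; the Controller class and its methods are hypothetical stand-ins for illustration, not Ziti APIs:

```python
from itertools import cycle


class Controller:
    """Toy controller that only accepts session tokens it issued itself."""

    def __init__(self, name):
        self.name = name
        self.sessions = set()

    def login(self):
        # Issue a token that is stored locally and never shared with peers.
        token = f"{self.name}-session-{len(self.sessions)}"
        self.sessions.add(token)
        return token

    def list_identities(self, token):
        # 200 if this controller knows the token, 401 otherwise.
        return 200 if token in self.sessions else 401


controllers = [Controller(n) for n in ("ctrl1", "ctrl2", "ctrl3")]
backend = cycle(controllers)  # stateless round-robin, like "balance roundrobin"

token = next(backend).login()  # the login request lands on ctrl1
statuses = [next(backend).list_identities(token) for _ in range(3)]
print(statuses)  # [401, 401, 200]
```

The login consumes one turn of the rotation, so the next two requests land on the other two controllers and only the third one comes back to the controller that holds the session.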

Alright, after thoroughly researching the Raft active-standby replication strategy, and specifically HashiCorp Vault's implementation, it starts to make sense why this wouldn't work and why it shouldn't be applied to Ziti either. Here is a good mailing list thread on why not [1].
So at this point I think HAProxy could be used in two modes:

  1. Sticky sessions with balance source and hash-type consistent, which would make HAProxy pin a session to a particular backend by the client's IP
  2. Just put the current leader as the upstream node in case of DR

The first one barely makes sense, as internally a standby ziti controller node would redirect any request to the active node and call it a day (as Vault does in HA mode). The only thing I can see being done here is something like Vault's /sys/health health check, which would let the LB route traffic to the leader by probing an endpoint on each node that reports whether that node is the leader.

The second one is a little trickier because there are multiple ways to redirect clients to a new node: DNS, an LB, or keepalived with floating IPs.

[1] https://groups.google.com/g/vault-tool/c/Ep6hBDqoBAY

Hi @nenkoru

Are you using OIDC auth when authenticating with HA? That should give you back a JWT, which can be used against any controller. That contrasts with non-OIDC auth, which will give you a session token that is specific to that controller.

To enable OIDC auth, I believe you need to add the edge-oidc binding in the controller config, as below:

web:
  - name: all-apis-localhost
    bindPoints:
      - interface: 127.0.0.1:1280
        address: 127.0.0.1:1280
    options:
      minTLSVersion: TLS1.2
      maxTLSVersion: TLS1.3
    apis:
      - binding: health-checks
      - binding: fabric
      - binding: edge-management
      - binding: edge-client
      - binding: edge-oidc

I'll see if I can get one of my teammates to jump in with some doc or pointers on how to get started with OIDC auth.

Paul


Hi @plorenz!

I am thinking about that ‘sys/health’ idea that Vault implements. This endpoint [1] returns 200 only if the node is ‘initialized and active’, and 429 if it is ‘unsealed and standby’.
Would it be interesting enough to include something like this in the ziti controller as another xweb component?
I could try implementing a PR for this.
This would allow automatic failover to whichever node becomes the leader after the previous leader has failed.
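To illustrate how a load balancer could use such an endpoint, here is a minimal sketch of the decision logic. It assumes a hypothetical controller health endpoint that mirrors the two Vault status codes mentioned above (200 for the active node, 429 for a standby); no such Ziti endpoint is implied to exist, and the function names are made up for the example:

```python
def classify(status_code):
    """Map a health probe status code to a node role (assumed Vault-like semantics)."""
    if status_code == 200:
        return "active"
    if status_code == 429:
        return "standby"
    # Anything else (timeouts, 5xx, ...) is treated as not routable.
    return "unavailable"


def route_targets(probe_results):
    """Return the nodes that should receive traffic: only those reporting active."""
    return [name for name, code in probe_results.items() if classify(code) == "active"]


# Before failover: ctrl1 is the leader.
print(route_targets({"ctrl1": 200, "ctrl2": 429, "ctrl3": 429}))  # ['ctrl1']
# After ctrl1 fails and ctrl2 is elected leader.
print(route_targets({"ctrl1": 503, "ctrl2": 200, "ctrl3": 429}))  # ['ctrl2']
```

With this shape, failover needs no client-side change: the LB simply stops sending traffic to any node whose probe no longer returns 200.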

[1] /sys/health - HTTP API | Vault | HashiCorp Developer

OIDC authentication is part of the upcoming HA release and is currently undocumented. It is still subject to change, though changes are unlikely.

It supports using any standard OIDC library. The specifics on how to configure that library are as follows:

  1. Ensure Auth Code w/ PKCE is used
  2. The client id should be openziti
  3. The standard .well-known/openid-configuration URL can be used to configure the client either at the root level or at the /edge/client/v1/ or /edge/management/v1/ level.
  4. The authentication endpoints are in the format of POST /oidc/login/<method>?authRequestID=<OIDC auth request id>
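For anyone not using an OIDC library, the PKCE part of the points above can be sketched with the standard library alone. This follows the S256 method from RFC 7636; the client id openziti comes from the list above, while the redirect_uri, state, and scope values are assumptions a real client would have to supply for its own setup:

```python
import base64
import hashlib
import secrets


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as RFC 7636 requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def new_code_verifier() -> str:
    """Generate a random 43-character code verifier."""
    return _b64url(secrets.token_bytes(32))


def code_challenge(verifier: str) -> str:
    """Derive the S256 code challenge from a code verifier."""
    return _b64url(hashlib.sha256(verifier.encode("ascii")).digest())


def authorization_params(verifier: str, redirect_uri: str, state: str) -> dict:
    """Query parameters for the authorization request (Auth Code w/ PKCE)."""
    return {
        "response_type": "code",
        "client_id": "openziti",  # from the list above
        "code_challenge": code_challenge(verifier),
        "code_challenge_method": "S256",
        "redirect_uri": redirect_uri,  # assumption: depends on the client
        "state": state,
        "scope": "openid",  # assumption: minimal OIDC scope
    }
```

The verifier is kept by the client and sent later with the token request, so only the party that started the flow can redeem the authorization code.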

The OIDC auth request id is the result of the first step in the OIDC process, which sets up an auth context that lives for a maximum of 30m. Before it expires, the auth request must finish primary authentication (UPDB, Cert, Ext JWT, etc.) and any secondary authentication (TOTP, Secondary Ext JWT).

The methods are:

  • cert - a client certificate and private key must be used during the POST to perform certificate authentication
  • password - UPDB (username/password) authentication. The POST must contain username and password values, either in a JSON object or as form-encoded content.
  • ext-jwt - Requires an externally created JWT to be submitted in the authentication header.
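As a rough sketch of how a client might assemble the login request for one of these methods, the helper below builds the URL in the POST /oidc/login/<method>?authRequestID=... format described earlier. The function names, the example base URL, and the exact shape of the JSON body are assumptions for illustration only:

```python
from urllib.parse import urlencode

# Methods listed above.
LOGIN_METHODS = {"cert", "password", "ext-jwt"}


def login_url(controller_base: str, method: str, auth_request_id: str) -> str:
    """Build the primary-authentication URL for a given method."""
    if method not in LOGIN_METHODS:
        raise ValueError(f"unsupported login method: {method}")
    query = urlencode({"authRequestID": auth_request_id})
    return f"{controller_base.rstrip('/')}/oidc/login/{method}?{query}"


def password_body(username: str, password: str) -> dict:
    """JSON body for the password method (assumed field names: username, password)."""
    return {"username": username, "password": password}


# Example with a made-up controller address and auth request id.
print(login_url("https://localhost:1280", "password", "abc123"))
# https://localhost:1280/oidc/login/password?authRequestID=abc123
```

For the cert method the same URL would be used, with the client certificate presented on the TLS connection instead of a body.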

For TOTP (aka MFA) submission:

  1. POST to /totp with the current TOTP in a code value (either JSON or form encoded)