If I understand correctly, all data from private subnet to internet must egress the NAT to IGW. That is, from the IGW's perspective, the packets from the private subnet originate from the public IP of the NAT in the public subnet.
These are the IP route hops I have in mind: private --> nat --> igw. The overlay hops for the direct link idea would relate to those IP routes like so: app --> zet --> (private --> nat -->) er2 <-- (<-- igw) <-- er1 --> db
It looks like the NAT GW is not required for private-to-public IP routes inside the VPC, so now I agree you can avoid the NAT GW cost for private subnet traffic representing the Ziti data plane, i.e., between the tunneler and router, which is the vast majority of the bytes. A small amount of tunneler-to-controller control plane traffic must flow through the NAT GW from private subnet because the destination public IP is not in the VPC.
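For concreteness, here is a minimal Terraform sketch of the routing in question (the VPC, NAT GW, and CIDR names are illustrative, not from this thread). The key point is that AWS adds an implicit "local" route for the VPC CIDR to every route table, so private-to-private traffic like tunneler-to-router never touches the NAT GW; only traffic matching the 0.0.0.0/0 default route does.

resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id # hypothetical VPC resource

  # Traffic destined outside the VPC egresses via the NAT GW (billed per GB),
  # then out the IGW.
  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main.id # hypothetical NAT GW resource
  }

  # No entry is needed for intra-VPC traffic: the implicit local route for
  # the VPC CIDR (e.g. 10.0.0.0/16) handles it and cannot be removed.
}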
Earlier in the thread we discussed edge router policies. This is relevant to the cost saving goal. Tunneler2 will use router2 if it is allowed to use router2 and router2 is the first to respond. If router2 is super busy, then router3 may respond first, and the comparatively expensive data will flow through the NAT GW to router3.
You could have multiple routers in network2 providing some fault tolerance and write your router policy to grant permission only for those routers in network2, disallowing tunneler2 from using router3. This would guarantee the edge data flows only through the IGW, not the NAT GW.
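A hedged sketch of that policy, using the ziti CLI wrapped in a Terraform null_resource to match the tooling in this thread; the role attributes #network2-routers and #network2-clients are illustrative, and it assumes ziti edge login has already been run against the controller.

resource "null_resource" "network2_erp" {
  # Grant the network2 identities access only to the routers tagged for
  # network2, so tunneler2 can never dial router3 through the NAT GW.
  provisioner "local-exec" {
    command = <<-EOT
      ziti edge create edge-router-policy network2-only \
        --edge-router-roles '#network2-routers' \
        --identity-roles '#network2-clients'
    EOT
  }
}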
all data from private subnet to internet must egress the NAT to IGW
That's how I understand the NAT and IGW relationship as well.
NAT GW is not required for private-to-public IP routes inside the VPC
Yes, as I understand it, traffic within a VPC doesn't require NAT, since VPC-internal IPs can be used for communication between resources.
you can avoid the NAT GW cost for private subnet traffic representing the Ziti data plane, i.e., between the tunneler and router, which is the vast majority of the bytes. A small amount of tunneler-to-controller control plane traffic must flow through the NAT GW from private subnet because the destination public IP is not in the VPC.
I'm assuming this means tunneler2 would forward traffic to router2 using the VPC-private IP of the instance it's running on, because if it used the public IP of router2, the traffic would go through the NAT.
I'm assuming this means that when router2 is created and registers with the controller, it tells the controller its VPC-internal IP, so that the controller can tell tunneler2 which internal IP to make requests to? Otherwise, how would tunneler2 know to use the private IP of router2?
Yes, router2 is running as a pod.
Deployed using the Helm chart, with advertisedHost set to the ClusterIP service domain name.
Thanks for the help. This has made me confident enough to test it out now.
resource "helm_release" "router" {
name = var.routerName
repository = "https://openziti.github.io/helm-charts/"
chart = "ziti-router"
version = "1.0.4"
...
set {
name = "advertisedHost"
value = format("%s.ziti.svc.cluster.local", var.routerName)
}
}
And I'm guessing that if it were an EC2 instance within the same VPC, the advertisedHost should be the internal VPC IP address of the EC2 instance, so that the tunneler makes its request to an address within the VPC (will test).
Made sure to allow egress from the private nodes (that use NAT) to the controller only on port 1280, as this is the clientApi.advertisedPort, which I believe is used by tunnelers to talk to the controller.
A router with the same IP as the controller exists, but with linkListeners.transport.advertisedPort=10080 and edge.advertisedPort=3022.
Is it safe to say:
- allowing egress on port 1280 is only used by clients (tunnelers) to talk with the controller.
- tunnelers talk to routers on edge.advertisedPort.
- since edge.advertisedPort in this case is 3022 and is not whitelisted in the egress rules, there is no way the tunneler is sending intercepted data to this external router.
- ziti controller
  - clientApi.advertisedPort: 1280
  - ctrlPlane.advertisedPort: 6262
- router in controller cluster:
  - edge.advertisedPort: 3022
  - linkListeners.transport.advertisedPort: 10080
- cluster:
  - router in public node
  - trino in private node
  - tunneler in same private node as trino
- Firewall rules (see the sketch after this list):
  - rule to allow ziti tunnelers to reach the ziti controller: allow egress to the IP of the ziti controller on port 1280
  - deny all other egress
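For concreteness, a minimal sketch of those rules as an AWS security group in Terraform (the controller IP and resource names are placeholders; an equivalent firewall rule applies on other clouds). Security groups implicitly deny any egress not listed, which covers the "deny all other egress" rule.

resource "aws_security_group" "private_nodes" {
  name   = "ziti-private-nodes" # hypothetical name
  vpc_id = aws_vpc.main.id      # hypothetical VPC resource

  # Tunnelers on the private nodes may reach only the controller's
  # clientApi.advertisedPort; everything else is implicitly denied.
  egress {
    description = "ziti tunnelers to ziti controller client API"
    from_port   = 1280
    to_port     = 1280
    protocol    = "tcp"
    cidr_blocks = ["203.0.113.10/32"] # placeholder controller public IP
  }

  # Note: if the deny-all also applies to intra-VPC traffic, the tunneler's
  # path to the internal router's edge port needs its own allow rule.
}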
The tunnelers in the private subnet will function normally and spew errors if they're allowed by ERP to use routers' edge listeners they cannot reach due to the egress firewall.
While Ziti's internal default port for the ctrlPlane remains 6262, there is a newer feature leveraged by the latest deployments: single-port operation. This means the deployments, including the controller's Helm chart, default to using the same TCP port for the client API and control plane TLS servers. There's no conflict because the ClientHello requests an ALPN protocol.
TL;DR You can use the same port for everything you publish with the same ingress or TCP LB, or you can use separate ports if you prefer. The same is true for routers' ports.
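As an illustration, a hedged Terraform sketch of single-port operation using the value names already quoted in this thread (the release name and the choice to reuse 1280 are assumptions, not from this thread):

resource "helm_release" "controller" {
  name       = "ziti-controller" # hypothetical release name
  repository = "https://openziti.github.io/helm-charts/"
  chart      = "ziti-controller"

  # Advertise the client API and ctrl plane on the same TCP port; the ALPN
  # protocol in each TLS ClientHello tells the controller which server the
  # connection is for, so the two listeners can share one port.
  set {
    name  = "clientApi.advertisedPort"
    value = "1280"
  }
  set {
    name  = "ctrlPlane.advertisedPort"
    value = "1280"
  }
}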
tunnelers in the private subnet will function normally and spew errors if they're allowed by ERP to use routers' edge listeners they cannot reach due to the egress firewall.
That makes sense. Looking at the logs, it seems the tunneler in the private node can connect to the "internal-router" inside the same network, but not to the "external-router" in the ziti controller network, which is what is expected.
Is it fair to take this as confirmation that the data mesh traffic for the private tunneler is only going through the "internal-router" in the public subnet and not the other "external-router" outside the network, effectively proving the data mesh traffic isn't going through the NAT?
INFO ziti-edge-tunnel:tun.c:196 tun_commit_routes() starting 3 route updates
INFO ziti-sdk:channel.c:669 hello_reply_cb() ch[1] connected. EdgeRouter version: v1.1.3|82c4a7125227|2024-05-30T16:36:13Z|linux|amd64
INFO tunnel-cbs:ziti_tunnel_ctrl.c:843 on_ziti_event() ztx[data-app] router internal-router connected
INFO ziti-edge-tunnel:tun.c:118 route_updates_done() route updates[3]: 0/OK
INFO ziti-sdk:posture.c:206 ziti_send_posture_data() ztx[0] first run or potential controller restart detected
ERROR ziti-sdk:channel.c:709 ch_connect_timeout() ch[0] connect timeout
INFO ziti-sdk:channel.c:775 reconnect_channel() ch[0] reconnecting in 2933ms (attempt = 1)
ERROR ziti-sdk:channel.c:903 on_channel_connect_internal() ch[0] failed to connect to ER[external-router] [-125/operation canceled]
ERROR ziti-sdk:channel.c:709 ch_connect_timeout() ch[0] connect timeout
Did you mean this?
Yes. Sorry, I did mean that.
TL;DR You can use the same port for everything you publish with the same ingress or TCP LB, or you can use separate ports if you prefer. The same is true for routers' ports.
I may have been confusing before. I mentioned ports because I was worried that allowing egress on 1280 could somehow allow data traffic from the private node to the ER in the ziti controller network. But I think that worry is resolved given the logs above.
But this is also interesting. For now I will probably settle for different ports.