If I understand correctly, all data from private subnet to internet must egress the NAT to IGW. That is, from the IGW's perspective, the packets from the private subnet originate from the public IP of the NAT in the public subnet.
These are the IP route hops I have in mind: private --> nat --> igw. The overlay hops for the direct link idea would relate to those IP routes like so: app --> zet --> (private --> nat -->) er2 <-- (<-- igw) <-- er1 --> db
It looks like the NAT GW is not required for private-to-public IP routes inside the VPC, so now I agree you can avoid the NAT GW cost for private subnet traffic representing the Ziti data plane, i.e., between the tunneler and router, which is the vast majority of the bytes. A small amount of tunneler-to-controller control plane traffic must flow through the NAT GW from private subnet because the destination public IP is not in the VPC.
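For concreteness, here is a minimal Terraform sketch of the routing in question (the VPC, NAT GW, and CIDR names are illustrative, not from this thread). The key point is that AWS adds an implicit "local" route for the VPC CIDR to every route table, so private-to-private traffic like tunneler-to-router never touches the NAT GW; only traffic matching the 0.0.0.0/0 default route does.

resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id # hypothetical VPC resource

  # Traffic destined outside the VPC egresses via the NAT GW (billed per GB),
  # then out the IGW.
  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main.id # hypothetical NAT GW resource
  }

  # No entry is needed for intra-VPC traffic: the implicit local route for
  # the VPC CIDR (e.g. 10.0.0.0/16) handles it and cannot be removed.
}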
Earlier in the thread we discussed edge router policies. This is relevant to the cost saving goal. Tunneler2 will use router2 if it is allowed to use router2 and router2 is the first to respond. If router2 is super busy, then router3 may respond first, and the comparatively expensive data will flow through the NAT GW to router3.
You could have multiple routers in network2 providing some fault tolerance and write your router policy to grant permission only for those routers in network2, disallowing tunneler2 from using router3. This would guarantee the edge data flows only through the IGW, not the NAT GW.
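A hedged sketch of that policy, using the ziti CLI wrapped in a Terraform null_resource to match the tooling in this thread; the role attributes #network2-routers and #network2-clients are illustrative, and it assumes ziti edge login has already been run against the controller.

resource "null_resource" "network2_erp" {
  # Grant the network2 identities access only to the routers tagged for
  # network2, so tunneler2 can never dial router3 through the NAT GW.
  provisioner "local-exec" {
    command = <<-EOT
      ziti edge create edge-router-policy network2-only \
        --edge-router-roles '#network2-routers' \
        --identity-roles '#network2-clients'
    EOT
  }
}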
all data from private subnet to internet must egress the NAT to IGW
That's how I understand the NAT and IGW relationship as well.
NAT GW is not required for private-to-public IP routes inside the VPC
Yes, as I understand it, traffic within a VPC doesn't require NAT, since VPC-internal IPs can be used for communication between resources.
you can avoid the NAT GW cost for private subnet traffic representing the Ziti data plane, i.e., between the tunneler and router, which is the vast majority of the bytes. A small amount of tunneler-to-controller control plane traffic must flow through the NAT GW from private subnet because the destination public IP is not in the VPC.
I'm assuming this means tunneler2 would forward traffic to router2 using the VPC-private IP of the instance it's running on, because if it used the public IP of router2, the traffic would go through the NAT.
I'm assuming this means that when router2 is created and registers with the controller, it tells the controller its VPC-internal IP, so that the controller can tell tunneler2 which internal IP to make requests to? Otherwise, how would tunneler2 know to use the private IP of router2?
Yes, router2 is running as a pod.
Deployed using the Helm chart, with advertisedHost set to the ClusterIP service domain name.
Thanks for the help. This has made me confident enough to test it out now.
resource "helm_release" "router" {
name = var.routerName
repository = "https://openziti.github.io/helm-charts/"
chart = "ziti-router"
version = "1.0.4"
...
set {
name = "advertisedHost"
value = format("%s.ziti.svc.cluster.local", var.routerName)
}
}
And I'm guessing that if it were an EC2 instance within the same VPC, the advertisedHost should be the internal VPC IP address of the EC2 instance, so that the tunneler makes its request to an address within the VPC (will test).
Made sure to allow egress from the private nodes (that use NAT) to the controller only on port 1280, as this is the clientApi.advertisedPort, which I believe is used by tunnelers to talk to the controller.
A router with the same IP as the controller exists, but with linkListeners.transport.advertisedPort=10080 and edge.advertisedPort=3022.
Is it safe to say:
- allowing egress on port 1280 is only used by clients (tunnelers) to talk with the controller.
- tunnelers talk to routers on edge.advertisedPort.
- since edge.advertisedPort in this case is 3022 and is not whitelisted in the egress rules, there is no way the tunneler is sending intercepted data to this external router.
- ziti controller
  - clientApi.advertisedPort: 1280
  - ctrlPlane.advertisedPort: 6262
- router in controller cluster:
  - edge.advertisedPort: 3022
  - linkListeners.transport.advertisedPort: 10080
- cluster:
  - router in public node
  - trino in private node
  - tunneler in same private node as trino
- Firewall rules (see the sketch after this list):
  - rule to allow ziti tunnelers to reach the ziti controller: allow egress to the IP of the ziti controller on port 1280
  - deny all other egress
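For concreteness, a minimal sketch of those rules as an AWS security group in Terraform (the controller IP and resource names are placeholders; an equivalent firewall rule applies on other clouds). Security groups implicitly deny any egress not listed, which covers the "deny all other egress" rule.

resource "aws_security_group" "private_nodes" {
  name   = "ziti-private-nodes" # hypothetical name
  vpc_id = aws_vpc.main.id      # hypothetical VPC resource

  # Tunnelers on the private nodes may reach only the controller's
  # clientApi.advertisedPort; everything else is implicitly denied.
  egress {
    description = "ziti tunnelers to ziti controller client API"
    from_port   = 1280
    to_port     = 1280
    protocol    = "tcp"
    cidr_blocks = ["203.0.113.10/32"] # placeholder controller public IP
  }

  # Note: if the deny-all also applies to intra-VPC traffic, the tunneler's
  # path to the internal router's edge port needs its own allow rule.
}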
The tunnelers in the private subnet will function normally and spew errors if they're allowed by ERP to use routers' edge listeners they cannot reach due to the egress firewall.
While Ziti's internal default port for the ctrlPlane remains 6262, there is a newer feature leveraged by the latest deployments: single-port operation. This means the deployments, including the controller's Helm chart, default to using the same TCP port for the client API and control plane TLS servers. There's no conflict because the ClientHello requests an ALPN protocol.
TL;DR You can use the same port for everything you publish with the same ingress or TCP LB, or you can use separate ports if you prefer. The same is true for routers' ports.
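As an illustration, a hedged Terraform sketch of single-port operation using the value names already quoted in this thread (the release name and the choice to reuse 1280 are assumptions, not from this thread):

resource "helm_release" "controller" {
  name       = "ziti-controller" # hypothetical release name
  repository = "https://openziti.github.io/helm-charts/"
  chart      = "ziti-controller"

  # Advertise the client API and ctrl plane on the same TCP port; the ALPN
  # protocol in each TLS ClientHello tells the controller which server the
  # connection is for, so the two listeners can share one port.
  set {
    name  = "clientApi.advertisedPort"
    value = "1280"
  }
  set {
    name  = "ctrlPlane.advertisedPort"
    value = "1280"
  }
}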
tunnelers in the private subnet will function normally and spew errors if they're allowed by ERP to use routers' edge listeners they cannot reach due to the egress firewall.
That makes sense. Looking at the logs, it seems the tunneler in the private node can connect to the "internal-router" inside the same network, but not to the "external-router" in the ziti controller network, which is what is expected.
Is it fair to take this as confirmation that the data mesh traffic for the private tunneler is only going through the "internal-router" in the public subnet and not the other "external-router" outside the network, effectively proving the data mesh traffic isn't going through the NAT?
INFO ziti-edge-tunnel:tun.c:196 tun_commit_routes() starting 3 route updates
INFO ziti-sdk:channel.c:669 hello_reply_cb() ch[1] connected. EdgeRouter version: v1.1.3|82c4a7125227|2024-05-30T16:36:13Z|linux|amd64
INFO tunnel-cbs:ziti_tunnel_ctrl.c:843 on_ziti_event() ztx[data-app] router internal-router connected
INFO ziti-edge-tunnel:tun.c:118 route_updates_done() route updates[3]: 0/OK
INFO ziti-sdk:posture.c:206 ziti_send_posture_data() ztx[0] first run or potential controller restart detected
ERROR ziti-sdk:channel.c:709 ch_connect_timeout() ch[0] connect timeout
INFO ziti-sdk:channel.c:775 reconnect_channel() ch[0] reconnecting in 2933ms (attempt = 1)
ERROR ziti-sdk:channel.c:903 on_channel_connect_internal() ch[0] failed to connect to ER[external-router] [-125/operation canceled]
ERROR ziti-sdk:channel.c:709 ch_connect_timeout() ch[0] connect timeout
Did you mean this?
Yes. Sorry, I did mean that.
TL;DR You can use the same port for everything you publish with the same ingress or TCP LB, or you can use separate ports if you prefer. The same is true for routers' ports.
I may have been confusing before. I mentioned ports because I was worried that allowing egress on 1280 could somehow allow data traffic from the private node to the ER in the ziti controller network. But I think that worry is resolved given the logs above.
But this is also interesting. For now I will probably settle for different ports.