Router, Docker and DNS

Hey,
I currently have 3 devices.

One has the controller and a public router.
The other two devices are Raspberry Pis running Raspberry Pi OS (latest, based on Bookworm) with various services, many of which run in Docker containers.
The two Raspberry Pis use the edge tunneler to connect to the Ziti overlay and provide services on it, and everything is working, more or less.

One big problem is the speed of, and connectivity to, those services.
If a client in the same network as the two Pis wants to connect to those services over the Ziti overlay, the traffic goes out through the public router and the edge tunneler until it lands at the service.
This seems to be very slow: file uploads sometimes fail, hang, or disconnect.
Sometimes the edge tunneler seems to crash (it restarted without me doing anything), and I have observed DNS resolution failures from clients inside a Docker container.

So I thought that a more direct connection via a private edge router on each of the two Pis could improve this.
I followed multiple topics, such as DNS resolver not working with Docker - #4 by qrkourier, Creating a second public edge router - #17 by markamind, Access to a service trough private router - #12 by scareything, the docs CLI Mgmt | OpenZiti, and some others.

I did get the routers themselves working.
I enabled the tunneler mode with tproxy (so that clients on the router host can also make requests to the overlay), but this does not seem to work for clients in Docker containers when the router runs on the host.
I can connect to services on the router host from other devices.
One problem is that the router starts its own DNS server on port 53, bound to the IP configured in the config file or to localhost.
Another odd thing is that it adds all the Ziti IPs to the loopback interface.
Both points are in stark contrast to the ziti edge tunneler, which creates a ziti0 interface (a TUN device) where its DNS server listens, and otherwise leaves the environment relatively clean.
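
For reference, this is roughly what the tunnel listener in my router config looks like (the resolver address and LAN interface name below are just placeholders for my setup):

listeners:
  - binding: tunnel
    options:
      mode: tproxy                       # intercept with tproxy sockets and iptables rules
      resolver: udp://192.168.1.21:53    # placeholder: the IP the router's DNS server binds to
      lanIf: eth0                        # placeholder: the Pi's LAN interface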

I tried both with a plain resolv.conf file and with systemd-resolved, but the latter always kicked the Ziti DNS server out, as it does not behave like a normal DNS resolver.
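
With the plain resolv.conf approach, I simply listed the router's DNS server first, roughly like this (the addresses are placeholders):

# /etc/resolv.conf
nameserver 192.168.1.21   # the router's Ziti DNS server (placeholder)
nameserver 192.168.1.1    # normal LAN resolver as fallback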

Is the only (not exactly sane) solution to run the router (in host mode) and the edge tunneler on the same device?
Or is there a configuration where I can run only the router, acting the same as the tunneler, but that works with Docker containers as clients?

A router with the tunnel option enabled for interception opens a tproxy port per service and configures the NF-INTERCEPT chain to mark packets in the ingress direction so that they are diverted to these tproxy ports based on the destination addresses. There are also some local routes set up for locally generated traffic. The problem is probably that traffic coming off the Docker network interface cannot be forwarded toward the tproxy ports. I have not tested this scenario myself. I will give it a try and see what I get.
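
For context, the interception rules look roughly like this (an illustrative sketch only; the service address, port, and mark values are made up, not output from your router):

# mangle-table chain used for tproxy interception (illustrative)
iptables -t mangle -N NF-INTERCEPT
iptables -t mangle -A PREROUTING -j NF-INTERCEPT
# divert packets destined for an intercepted service address/port to the local tproxy socket
iptables -t mangle -A NF-INTERCEPT -d 100.64.0.5/32 -p tcp --dport 443 \
  -j TPROXY --on-ip 127.0.0.1 --on-port 33333 --tproxy-mark 0x1/0x1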

Yes, it looks like you are hitting a routing issue with the subnets configured for the tproxy option, i.e. the IP addresses configured on the loopback for binding to non-local subnets. I guess one could label this as a kernel networking limitation. You can try the diverter option instead of the iptables redirection. We have developed an eBPF program that can be attached to the docker0 interface (and to the main interface as well, if needed for reachability by outside clients). The kernel-side bytecode intercepts packets based on the Ziti services exposed to the local Ziti router and forwards them directly to the tproxy sockets, replacing the iptables redirect. You can find more details in the README on how to enable the diverter option, as we call it.

I'd like to help with this issue if you're still interested in going through ziti-edge-tunnel. Could you please share your service configurations?

Is this a sporadic occurrence, or does every DNS query fail? Do you know if the DNS query failures coincide with the upload failures?

-Shawn

Sure, I can.
I will try the eBPF-based program later and see if it helps.

Ziti Service Configurations:

Service repo.pi.lan with DNS failure
{
    "createdAt": "2023-10-20T08:25:48.044Z",
    "id": "tMDrMbYelmZx3nXe01ZJi",
    "tags": {},
    "updatedAt": "2023-10-20T08:29:46.760Z",
    "config": {},
    "configs": [
        "xxx",
        "xxx"
    ],
    "encryptionRequired": true,
    "name": "https.repo.pi",
    "permissions": [
        "Bind",
        "Dial"
    ],
    "postureQueries": [],
    "roleAttributes": [
        "host-pi.lan"
    ],
    "terminatorStrategy": "smartrouting"
}
Host config repo.pi.lan
{
    "createdAt": "2023-10-20T08:25:11.409Z",
    "tags": {},
    "updatedAt": "2023-10-20T08:25:11.409Z",
    "configType": {
        "name": "host.v1"
    },
    "data": {
        "address": "127.0.0.1",
        "port": 443,
        "protocol": "tcp"
    },
    "name": "https.repo.pi.host.v1"
}
Intercept config repo.pi.lan
{
    "createdAt": "2023-10-20T08:24:44.560Z",
    "tags": {},
    "updatedAt": "2023-10-20T08:24:44.560Z",
    "configType": {
        "name": "intercept.v1"
    },
    "data": {
        "addresses": [
            "repo.pi.lan"
        ],
        "portRanges": [
            {
                "high": 443,
                "low": 443
            }
        ],
        "protocols": [
            "tcp"
        ]
    },
    "name": "https.repo.pi.intercept.v1"
}
Service Bind Policy pi.lan
{
    "createdAt": "2023-10-15T11:39:59.910Z",
    "tags": {},
    "updatedAt": "2023-10-18T14:45:31.866Z",
    "identityRoles": [
        "@fT.ysX8d3"
    ],
    "identityRolesDisplay": [
        {
            "name": "@pi.lan",
            "role": "@fT.ysX8d3"
        }
    ],
    "name": "pi.lan-hosted",
    "postureCheckRoles": null,
    "postureCheckRolesDisplay": [],
    "semantic": "AnyOf",
    "serviceRoles": [
        "#host-pi.lan"
    ],
    "serviceRolesDisplay": [
        {
            "name": "#host-pi.lan",
            "role": "#host-pi.lan"
        }
    ],
    "type": "Bind"
}

The Dial Policies are uninteresting, as all devices can access all services for now.
The error "curl: (6) Could not resolve host: repo.pi.lan" happened two days ago in a Drone runner job, which was running in a Docker container.

These DNS problems occur sporadically, as far as I can tell.
It may have something to do with Ziti or with the services/configuration on the host itself; I currently cannot tell, only observe.
The DNS failures did not, and do not, coincide with the upload failures.

When I tried to upload a Docker image via a Drone job, it ran for up to 30 minutes and then failed with "could not connect to server", with the Ziti IP address as the target.
I then tried to do it manually on the host itself with the Docker CLI, but it got stuck and I needed to restart the Docker daemon.
I think I tried it afterwards without Ziti, using /etc/hosts entries etc., and then it worked relatively fast.

The DNS and connectivity problems are not limited to one service; multiple services may be affected.
Just a side note: on Android, the app does not seem to work that well. Sometimes it works in specific apps, but often (nearly always) it does not work in Firefox etc. That is another topic, though.

Thanks for sharing the details. I have a suspicion about the DNS issues that you're seeing, and I'm working on a fix for it. I'll set up a scenario similar to yours once I have something to test.

Hi there. Just FYI, I haven't forgotten about you. I'm working on an issue that causes the tunneler to stop handling network I/O while hostnames are being looked up, and vice versa. There may be other issues that lead to your situation, but I wanted to get this one out of the way first. I should have a PR ready by early next week.

Hello again. It's taken longer than I anticipated, but I've made the ziti-edge-tunnel fixes that I suspect will improve the situation for you. The PR is up but not yet approved. I don't expect approval until later this week, as most of the core team is busy this week with travel and major changes to the controller and routers.

If you have some appetite for adventure and would like to let me know if my fixes help before they make it into the next release of ziti-edge-tunnel, you can get them from the GitHub workflow build for the PR.

edit:

ziti-edge-tunnel 0.22.13 includes the fixes that I've been talking about, and it's currently available as a pre-release. Let me know how it goes if you have a chance to try it out.