Helm Port Mappings

That definitely moved things forward!

ziti edge policy-advisor services -q
OKAY : user1 (1) -> Nginx (1) Common Routers: (1/1) Dial: Y Bind: N 

OKAY : edge-router (1) -> Nginx (1) Common Routers: (1/1) Dial: N Bind: Y 

OKAY : user2 (1) -> Nginx (1) Common Routers: (1/1) Dial: Y Bind: N

ziti edge policy-advisor identities -q
OKAY : user1 (1) -> Nginx (1) Common Routers: (1/1) Dial: Y Bind: N 

OKAY : edge-router (1) -> Nginx (1) Common Routers: (1/1) Dial: N Bind: Y 

OKAY : user2 (1) -> Nginx (1) Common Routers: (1/1) Dial: Y Bind: N 

ERROR: Default Admin 
  - Identity does not have access to any services. Adjust service policies.

The Android logs now show that it tries to connect, but gets a connection refused error:

11-20 11:27:40.286 12490 30739 D ziti-conn[crJtg0XPf/7]: connecting to Nginx
11-20 11:27:40.286 12490 30739 D o.o.i.ZitiContextImpl: getNetworkSession(Nginx)
11-20 11:27:40.286 12490 30739 D ziti-conn[crJtg0XPf/7]: using session[clp76ii001e5z0d6cvdnmy522]
11-20 11:27:40.286 12490 30739 D C.k.com:8443]: forced re-connect
11-20 11:27:40.537 12490 12580 V routing : got msg[(/100.64.0.0:36518, /100.64.1.2:80)]: 60 bytes
11-20 11:27:40.537 12490 12580 D tcp-conn: tcp:/100.64.0.0:36518 -> nginx.ziti/100.64.1.2:80/LISTEN transitioning to SYN_RCVD
11-20 11:27:40.538 12490 12580 I routing : created tcp:/100.64.0.0:36518 -> /100.64.1.2:80
11-20 11:27:40.538 12490 30083 D ziti-conn[crJtg0XPf/8]: connecting to Dial(service=Nginx, appData=DialData(dstProtocol=TCP, dstHostname=nginx.ziti, dstIp=null, dstPort=80, srcProtocol=null, srcIp=null, srcPort=null, sourceAddr=null), identity=null, callerId=crJtg0XPf)
11-20 11:27:40.538 12490 12570 D ziti-conn[crJtg0XPf/8]: connecting to Nginx
11-20 11:27:40.538 12490 12570 D o.o.i.ZitiContextImpl: getNetworkSession(Nginx)
11-20 11:27:40.538 12490 12570 D ziti-conn[crJtg0XPf/8]: using session[clp76ii001e5z0d6cvdnmy522]
11-20 11:27:40.538 12490 12570 D C.k.com:8443]: forced re-connect
11-20 11:27:41.316 12490 12580 V routing : got msg[(/100.64.0.0:36502, /100.64.1.2:80)]: 60 bytes
11-20 11:27:41.569 12490 12580 V routing : got msg[(/100.64.0.0:36518, /100.64.1.2:80)]: 60 bytes
11-20 11:27:43.332 12490 12580 V routing : got msg[(/100.64.0.0:36502, /100.64.1.2:80)]: 60 bytes
11-20 11:27:43.585 12490 12580 V routing : got msg[(/100.64.0.0:36518, /100.64.1.2:80)]: 60 bytes
11-20 11:27:44.651 12490 12570 D C.k.com:8443]: reconnecting after timeout
11-20 11:27:44.652 12490 30740 D C.k.com:8443]: transitioned to Connecting
11-20 11:27:44.668 12490 12570 D TrafficStats: tagSocket(107) with statsTag=0xffffffff, statsUid=-1
11-20 11:27:44.748 12490 12570 W C.k.com:8443]: channel disconnected: java.net.ConnectException: Connection refused
11-20 11:27:44.749 12490 12574 D C.k.com:8443]: transitioned to Disconnected(err=java.net.ConnectException: Connection refused)
11-20 11:27:44.754 12490 12570 D C.k.com:8443]: delaying connect 48000 ms (retry=8)

One thing confuses me: the IP address shown, 100.64.1.2:80, is not the IP address that I assigned in the service.
Copying from the configurations section of the service that I created: Nginx-Server 10.43.185.163 80 tcp.

I am not certain if this is a misconfiguration, or if that is some sort of internal VPN-like IP address.

So close! Let's check the router pod and controller pod logs for errors emitted at the same moment you try to connect in Android.

kubectl logs --selector app.kubernetes.io/component=ziti-router --tail=-1 -f
kubectl logs --selector app.kubernetes.io/component=ziti-controller --tail=-1 -f

That 100.64.1.2 IP address is expected.

By default, Ziti tunnellers provision intercept routes for authorized Ziti Services in the CG-NAT range 100.64.0.0/10.

When your web browser running in Android queries the OS for the fictitious domain name nginx.ziti, the Ziti tunneller is configured to answer the DNS query with an IP from this range. It provides an IP route or NAT rule to capture that traffic so it can be forwarded via Ziti Edge transport.
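
If you want to double-check an address from the tunneller log, here is a minimal Python sketch using only the standard library. It is not part of Ziti, and the function name is made up for illustration. It tests whether an address falls inside the default 100.64.0.0/10 intercept range described above.

```python
import ipaddress

# Default intercept range mentioned above (the CG-NAT block).
INTERCEPT_RANGE = ipaddress.ip_network("100.64.0.0/10")


def is_intercept_ip(address: str) -> bool:
    """Return True if the address is inside the default intercept range."""
    return ipaddress.ip_address(address) in INTERCEPT_RANGE


if __name__ == "__main__":
    # Addresses taken from this thread: two from the Android log,
    # one is the Nginx cluster IP configured in the service.
    for addr in ("100.64.1.2", "100.64.0.0", "10.43.185.163"):
        print(addr, is_intercept_ip(addr))
```

The two addresses from the Android log are inside the range, while the Nginx cluster IP is not, which is consistent with the log showing the intercept address rather than the address configured in the service.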

That all makes sense, thank you :slight_smile:

Unfortunately, nothing happens in either log when I try to pull up http://nginx.ziti.

I tried it in two different browsers (Vanadium and Bromite).
The first says ERR_CONNECTION_REFUSED, and the second says DNS_PROBE_STARTED.
It seems that it cannot resolve nginx.ziti.

I suspect a problem with the hosting tunneller being able to reach the address you specified in the Simple Service form.

In that form, you specified that the hosting tunneller, the Ziti Router, should reverse-proxy (send) the traffic to the Nginx service's cluster IP, 10.43.185.163 on 80/tcp, correct?

Is the Ziti Router pod able to reach that IP:PORT? If not, I'd expect some errors from the Router log at the moment you tried to access the Ziti Service in Android.

The web browser also running in Android is seeing connection refused because it was able to resolve the domain name nginx.ziti, but the TCP stream is not open to the destination for some reason.

Although the Nginx service is a good, simple server to test with, perhaps there is a NetworkPolicy or other variable preventing the Router pod from reaching that cluster IP?

I assume you're not seeing any HTTP requests logged by Nginx because the only response on the client side is connection refused.

I'm also checking with the author of the Android app to see if the logs might be saying the Router's edge listener is the thing that couldn't be connected (connection refused).

Can you reach the Ziti Router's edge listener from the Android web browser? You should see something along the lines of ERR_SSL_PROTOCOL_ERROR if you are able to reach it: https://{{ ziti-router service's external IP/advertisedHost value }}/.
You can find the Router's advertised host by reviewing the Router chart's values you supplied. This is important because the Android tunneller is receiving this advertisedHost address from the controller and attempting to negotiate a Ziti Edge session with that address.

helm get values ziti-router | yq '.edge.advertisedHost'

You can verify the Router's edge listener is presenting a server certificate.

openssl s_client -connect {{ ziti-router service's external IP/advertisedHost value }}:443  -alpn ziti-edge <>/dev/null |& openssl x509 -noout -subject
subject=C = , ST = , L = , O = , OU = , CN = fJg4Q4662

List routers to confirm server certificate subject.

ziti edge list ers                            
╭───────────┬─────────────────┬────────┬───────────────┬──────┬────────────────╮
│ ID        │ NAME            │ ONLINE │ ALLOW TRANSIT │ COST │ ATTRIBUTES     │
├───────────┼─────────────────┼────────┼───────────────┼──────┼────────────────┤
│ fJg4Q4662 │ miniziti-router │ true   │ true          │    0 │ public-routers │
╰───────────┴─────────────────┴────────┴───────────────┴──────┴────────────────╯
results: 1-1 of 1

Yes, I shelled into the router pod to verify:

[ziggy@ziti-router-867c94dfd4-2gsrh ~]$ curl
curl: try 'curl --help' or 'curl --manual' for more information
[ziggy@ziti-router-867c94dfd4-2gsrh ~]$ curl 10.43.185.163
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

No, the nginx pod did not log anything.

I just noticed that the LoadBalancer IP address is <pending> again.
I had remedied this before, but apparently it has resurfaced.
I will do some debugging and see if fixing that remedies the situation.

I figured out the issue!

There is a bug in the router chart. It does not respect annotations.

From the controller chart (working):

ctrlPlane:
  containerPort: 6262
  advertisedHost: ziti-controller.domain.com
  advertisedPort: 8440
  service:
    enabled: true
    type: LoadBalancer  # this is the only service that really needs to be exposed
    annotations:
      metallb.universe.tf/address-pool: ziti-controller

From the router chart (not working):

edge:
  enabled: true
  containerPort: 3022
  advertisedHost: ziti-router.domain.com
  advertisedPort: 8443
  service:
    enabled: true
    type: LoadBalancer  # this is the only service that really needs to be exposed
    annotations:
      metallb.universe.tf/address-pool: ziti-router

kubectl -n ziti describe svc ziti-controller-ctrl
Annotations:              meta.helm.sh/release-name: ziti-controller
                          meta.helm.sh/release-namespace: ziti
                          metallb.universe.tf/address-pool: ziti-controller
                          metallb.universe.tf/ip-allocated-from-pool: ziti-controller

kubectl -n ziti describe svc ziti-router-edge
Annotations:              meta.helm.sh/release-name: ziti-router
                          meta.helm.sh/release-namespace: ziti

Interesting discovery. At a glance, it looks like the annotations you specified in the Helm release's values would be preserved in the edge and transport services.

The template has:

  {{- with .Values.edge.service.annotations }}
  annotations:
  {{ toYaml . | nindent 4 }}
  {{- end }}

I exported my Router release values, and added an edge service annotation.

    annotations:
      this: that

After upgrading the Helm release with the modified values I do see the new annotation.

$ kubectl get svc ziti-router-edge -o go-template='{{ range $k,$v := .metadata.annotations }}{{printf "%s:\t%s" $k $v}}{{"\n"}}{{end}}'
meta.helm.sh/release-name:      ziti-router
meta.helm.sh/release-namespace: miniziti
this:   that

Did you find a way to modify the chart to preserve your annotation? Is another controller in the cluster manipulating annotations?

This is the only case where this has happened.
I even just tried switching the type to type: ClusterIP, and then back to type: LoadBalancer to see if it would pick it up properly.
Sadly, it still lacks the annotation.

I manually ran this:

kubectl -n ziti annotate service ziti-router-edge metallb.universe.tf/address-pool=ziti-router

That adds the annotation properly.

Sadly, I tested out pulling up http://nginx.ziti after resolving that issue manually, and it still hits the same DNS resolution error.

I also just hit an error that I have never experienced before.

When running ziti edge login, I get this:

[  30.001]    INFO ziti/ziti/cmd/helpers.StandardErrorMessage: Connection error: Get https://ziti-controller.domain.com.com:8441/.well-known/est/cacerts: dial tcp 72.14.185.43:8441: i/o timeout
Unable to connect to the server: dial tcp 72.14.185.43:8441: i/o timeout

I have no idea what is going on, but 72.14.185.43 is some IP address belonging to Linode.
It does not match the A record I have for my controller.

Could the doubled ".com" be causing some kind of problem? That's not intentional, is it?

It was exactly that, well spotted!
I have zero idea how that made its way into the command.
Thank you, sir :slight_smile:

You're seeing a DNS resolution error, not a connection refused error, when you attempt http://nginx.ziti in a web browser on Android, correct?

That's unexpected because the Android tunneller log shows the domain name is resolved to an intercept IP. It also shows a connection refused error, and I hypothesize that is the core issue, that the Router's Edge listener isn't reachable by Android.

Your exploration of the MetalLB annotation lends credibility to that hypothesis because it would explain why the Router's Edge listener, i.e. the ziti-router-edge K8s Service, isn't reachable.

I understand that you ran the kubectl annotate command to work around the problem of adding it unsuccessfully as a Helm Release value in .edge.service.annotations.

Are you able to reach that K8s Service's TLS server after manually annotating?

You can visit that K8s Service's external IP on 443/tcp in a web browser and you should see an SSL protocol error if it's working. A more definitive test is to fetch the TLS server certificate like this:

openssl s_client -connect {{ ziti-router-edge external IP }}:443  -alpn ziti-edge <>/dev/null |& openssl x509 -noout -subject

Also, ensure that the ziti-router-edge K8s Service's external IP matches the value of the Router's Helm Release input .edge.advertisedHost. That input value must be either the same IP address or a domain name that resolves to the same IP address.
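
As a rough way to check that last point from any machine, here is a minimal Python sketch using only the standard library. It is an illustration rather than anything shipped with Ziti. The function name is hypothetical, and the host and IP you pass in are your own values for .edge.advertisedHost and the K8s Service's external IP.

```python
import socket


def resolves_to(host: str, expected_ip: str, port: int = 443) -> bool:
    """Return True if host is, or resolves to, expected_ip."""
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # The name could not be resolved at all.
        return False
    # Each entry's sockaddr starts with the resolved address string.
    return expected_ip in {info[4][0] for info in infos}


# Example call (placeholder values, substitute your own):
#   resolves_to("ziti-router.domain.com", "<external IP of ziti-router-edge>", 8443)
```

This only confirms that the name and the IP agree. It does not prove the port is reachable, so the openssl test above is still the more definitive check.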

I did try pulling up the edge router in an Android browser, but it could not connect.

I ran your command, and it has an error:

openssl s_client -connect ziti-router.domain.com:8443  -alpn ziti-edge <>/dev/null |& openssl x509 -noout -subject
Could not read certificate from <stdin>
40177EABA87F0000:error:1608010C:STORE routines:ossl_store_handle_load_result:unsupported:crypto/store/store_result.c:151:

I just realised that I did not have the edge port allowed through the firewall.
I am checking that now.

I remedied the port forward, but it did not change anything.
I also tried accessing the router in the Android browser directly via IP address, and I received ERR_NETWORK_ACCESS_DENIED.

The Android logs show this:

11-21 12:02:18.983 12797 16344 D C.k.com:8443]: transitioned to Connecting
11-21 12:02:18.985 12797 15825 D TrafficStats: tagSocket(96) with statsTag=0xffffffff, statsUid=-1
11-21 12:02:18.998 12797 15825 W C.k.com:8443]: channel disconnected: java.net.ConnectException: Connection refused
11-21 12:02:18.999 12797 16178 D C.k.com:8443]: transitioned to Disconnected(err=java.net.ConnectException: Connection refused)
11-21 12:02:18.999 12797 15825 D C.k.com:8443]: delaying connect 6000 ms (retry=16)
11-21 12:02:19.744 12797 12850 V routing : got msg[(/100.64.0.0:40514, /100.64.1.3:80)]: 60 bytes
11-21 12:02:19.999 12797 12850 V routing : got msg[(/100.64.0.0:40530, /100.64.1.3:80)]: 60 bytes
11-21 12:02:21.759 12797 12850 V routing : got msg[(/100.64.0.0:40514, /100.64.1.3:80)]: 60 bytes
11-21 12:02:22.015 12797 12850 V routing : got msg[(/100.64.0.0:40530, /100.64.1.3:80)]: 60 bytes
11-21 12:02:23.734 12797 16343 D ziti-conn[crJtg0XPf/15]: closing conn = 15
11-21 12:02:23.735 12797 16343 E tcp:/100.64.0.0:40514 -> /100.64.1.3:80: failed to connect to Ziti Service: java.net.SocketTimeoutException: failed to connect to Dial(service=Nginx, appData=DialData(dstProtocol=TCP, dstHostname=nginx.ziti, dstIp=null, dstPort=80, srcProtocol=null, srcIp=null, srcPort=null, sourceAddr=null), identity=null, callerId=crJtg0XPf) in 5000 millis
11-21 12:02:23.735 12797 16343 D tcp-conn: tcp:/100.64.0.0:40514 -> nginx.ziti/100.64.1.3:80/SYN_RCVD transitioning to Closed

What is curious to me is that dstIp is null, along with a number of other fields.

Your openssl command couldn't obtain the TLS server cert. That points toward a problem with the normal network access to the Router's Edge listener service.

Does shortening the command to only show the connection result reveal anything? Are you sure 8443/tcp is the correct port for that K8s Service's external IP?

openssl s_client -connect ziti-router.domain.com:8443  -alpn ziti-edge </dev/null

Does ziti-router.domain.com:8443 match the Helm Release's values .edge.advertisedHost and .edge.advertisedPort?

I take dstHostname=nginx.ziti, dstIp=null in that Android tunneller log message to mean that the destination for which the timeout occurred is known by a domain name, not an IP address.

Apparently I needed to whack my networking hardware with a mallet.
Now the port forward is actually working, and the openssl command succeeds for both the hostname and the public IP address.

Now I am actually getting logs :slight_smile:

Here is the router:

[161270.524] WARNING ziti/router/xgress_edge.(*edgeClientConn).processConnect [ch{edge}->u{classic}->i{jx7m}]: {type=[EdgeConnectType] chSeq=[16] edgeSeq=[0] error=[service XZpvBjOpcakT6jrlEoZsE has no terminators] token=[37d4f46e-9281-4ab4-b1e1-f92be4f01a41] connId=[6]} failed to dial fabric
[161271.664] WARNING ziti/router/xgress_edge.(*edgeClientConn).processConnect [ch{edge}->u{classic}->i{jx7m}]: {error=[service XZpvBjOpcakT6jrlEoZsE has no terminators] type=[EdgeConnectType] chSeq=[18] edgeSeq=[0] token=[37d4f46e-9281-4ab4-b1e1-f92be4f01a41] connId=[7]} failed to dial fabric
[161276.804] WARNING ziti/router/xgress_edge.(*edgeClientConn).processConnect [ch{edge}->u{classic}->i{jx7m}]: {token=[37d4f46e-9281-4ab4-b1e1-f92be4f01a41] connId=[8] type=[EdgeConnectType] chSeq=[20] edgeSeq=[0] error=[service XZpvBjOpcakT6jrlEoZsE has no terminators]} failed to dial fabric

And the controller:

[31013.860]   ERROR ziti/controller/handler_edge_ctrl.(*baseRequestHandler).returnError [ch{OPHIgSXPFi}->u{classic}->i{0vqA}]: {error=[service XZpvBjOpcakT6jrlEoZsE has no terminators] routerId=[OPHIgSXPFi] operation=[create.circuit] token=[37d4f46e-9281-4ab4-b1e1-f92be4f01a41]} responded with error
[31021.474]   ERROR ziti/controller/handler_edge_ctrl.(*baseRequestHandler).returnError [ch{OPHIgSXPFi}->u{classic}->i{0vqA}]: {token=[37d4f46e-9281-4ab4-b1e1-f92be4f01a41] error=[service XZpvBjOpcakT6jrlEoZsE has no terminators] routerId=[OPHIgSXPFi] operation=[create.circuit]} responded with error

Yep, that's a "smoking gun" alright. The no terminators message means that zero Ziti Identities are successfully hosting the Ziti Service you defined with the Simple Service form.
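
If the logs are long, a small filter can pull out which Ziti Service IDs are affected. Here is a minimal Python sketch that assumes the message wording stays as it appears in your router and controller logs above. The function name is made up, and the sample line is abbreviated from your router log.

```python
import re

# Matches the error text seen in the router and controller logs above.
NO_TERMINATORS = re.compile(r"service (\S+) has no terminators")


def services_without_terminators(log_lines):
    """Return the sorted, de-duplicated service IDs reported as having no terminators."""
    ids = set()
    for line in log_lines:
        match = NO_TERMINATORS.search(line)
        if match:
            ids.add(match.group(1))
    return sorted(ids)


if __name__ == "__main__":
    # Abbreviated sample based on the router log in this thread.
    sample = [
        "WARNING ... {error=[service XZpvBjOpcakT6jrlEoZsE has no terminators] connId=[6]} failed to dial fabric",
        "INFO ... unrelated line",
    ]
    print(services_without_terminators(sample))
```

Every ID it reports is a Ziti Service with no Identity currently hosting it, so the next step is to work out why the expected hosting Identity did not bind.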

I recall you had assigned hosting permission to the built-in tunneller of the Ziti Router running in your cluster, correct? Is that Router's name "edge-router"?

Let's re-run the policy advisor to confirm which Identity should be hosting the Ziti Service named "Nginx."

ziti edge policy-advisor services Nginx 

Is there more than one Ziti Router in play?

Are there any log errors from the hosting Identity, which I expect is the Ziti Router itself, about failing to "bind" or "host" the Ziti Service named "Nginx"? You can also determine the Ziti ID of that Service, and filter the log for the Ziti ID instead of the name.

I recall that you already verified that Ziti Router pod is able to reach Nginx's ClusterIP Service.