Problem when deploying Fabric Routers

Greetings.
I am trying to deploy new Fabric Routers in order to extend my fabric connections to isolated networks.
I created a router on a computer that advertises a port-forwarded IP (its real IP is 192.168.0.64, but it advertises as 172.18.102.155). The port forwarding is handled by a physical router in my network.
Without this port forwarding, the other routers in the network cannot reach my Ziti router's link listener (as they have a different IP).

I have the following errors when I run the Fabric Router. NOTE: Dial links work perfectly but listening links fail with this error.

[ 120.745] ERROR channel.(*classicListener).acceptConnection.func1 [tls:0.0.0.0:10080]: error receiving hello from [tls:172.18.102.81:55198] (receive error (remote error: tls: bad certificate))
[ 180.761] ERROR channel.(*classicListener).acceptConnection.func1 [tls:0.0.0.0:10080]: error receiving hello from [tls:172.18.102.81:55546] (receive error (remote error: tls: bad certificate))
[ 180.776] ERROR channel.(*classicListener).acceptConnection.func1 [tls:0.0.0.0:10080]: error receiving hello from [tls:172.18.102.66:43162] (receive error (remote error: tls: bad certificate))

What I want to do is to connect an Isolated Service (Not accessible on the network but only visible to the 192.168.0.64 machine) through an Edge Router that will be connected to this Fabric router that acts as a bridge between networks.

In short.

(Ziti Network 1) <--> Fabric Router <--> (Private Network 1)

In Ziti Network 1 I have a Ziti controller, 3 Edge routers and a few services. Everything works well there, as they are all in the same IP range.

But I cannot manage to connect this Fabric Router in order to join my service from Private Network 1 into the Ziti Network.

I am attaching a screenshot of my YAML configuration and the created Fabric Links.

Thank you in advance.



Hi @shaifvier, welcome to the OpenZiti community!

Overall Config

If I understand things properly, you’re trying to connect one “private” network into an OpenZiti network. Here’s a diagram of what I think you’re doing. Hopefully, it is correct?

For this to work, you will need to do a few things. I don’t know how you installed the original ziti controller, or routers. Did you use a quickstart to get going? I’m going to drop a bunch of different thoughts on this post because you’ll probably need them so forgive me if I take you in a few different directions in order to try to help you understand better.

“Fabric Router”

My first comment is about the term “fabric router”. I personally don’t use the term ‘fabric router’ too often. An edge router is also a fabric router. Personally, I just deploy everything as edge-routers. I sometimes will refer to them as ‘private edge routers’. That’s what I envision your “fabric router” to be. Correct me if I’m wrong? The key is that this edge-router doesn’t have a listening port on the open internet - that’s what makes it ‘private’. To accomplish this you’ll need to find lines 15-20 in your config and comment them out. These lines tell the edge-router to listen for OTHER routers to connect to it. The diagram I inferred from your text was that the edge routers in the ‘ziti network’ are going to be the “public” edge routers. If you keep this configuration, the ziti network edge routers will try to form a link to the private/fabric router. I expect you don’t want that?
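
For reference, here is a sketch of what that part of a quickstart-style router config usually looks like with the link listener commented out. The bind port comes from the log in the first post; the exact line numbers, the advertise placeholder and any extra options in your own file are assumptions and may differ.

```yaml
link:
  dialers:
    - binding: transport          # kept: this router can still dial out to other routers
  # listeners:                    # commented out: no other router can dial in to this one
  #   - binding: transport
  #     bind: tls:0.0.0.0:10080
  #     advertise: tls:<address-other-routers-would-dial>:10080
```

With only the dialer left, a link to this router exists only when it dials a router that does have a link listener.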

Edge Routers and the Controller

Looking at your config, another likely problem is lines 9 and 10.

This represents the address of the controller the “private edge router/fabric router” will try to connect to. I expect “orchestrator” is not addressable from “private edge router/fabric router”. Can that router connect to a host named “orchestrator” on port 6262? I expect not. If you used a quickstart, this is a known change you’ll have to make.
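
As a sketch, the controller entry in a quickstart-style router config looks something like this. The host name and port are the ones mentioned in this thread; the surrounding layout is an assumption about your file.

```yaml
ctrl:
  # This name must resolve, and the port must be reachable,
  # from the machine the router runs on.
  endpoint: tls:orchestrator:6262
```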

More ports to open

Along with the controller hostname you’ll also notice port 6262 is being referenced. I expect that port is not open in the firewall. If you start with a quickstart and try to attach another edge router, this is another change you’ll need to make. You need to open the port for the edge controller.

You’ll also have to open the port the edge routers use as well. Again, if you used the quickstart (it looks like you did?) you’ll need to open port 10080. That might be a problem too.

Wrap up

Does that help? Hopefully, it allows you to understand what’s going on a bit more. If you need more help, just comment back and we’ll try to get you moving along. Hopefully this explains enough.


Thank you for the reply.

Actually, the diagram will look more like this.

I have both the dial and listener enabled on the Fabric Router as I also want to add a private Edge Router that will offload my traffic to a service provider. (Edge and Tunneler enabled)

The Edge Router in the Private Network can have the Listener option disabled and only use the Dial to create a link with the Fabric Router.

I have set up a DNS server in my lab, running on a "Semi-Public" IP. (All the computers inside the building connect to this DNS server for name resolution, and I have added "orchestrator" with the corresponding IP.)
Also, the controller is running on a "Semi-Public" IP and it is accessible by every computer even from different subnets.

At first, I tried to join my private network just with a Private Edge router that only dials to the Edge Router 3 and everything seemed to work correctly - Policy Advisor said that everything was OK. But, my service was not reachable to clients outside the private Network (Tunnel applications).

Client (Tunnel) <------> Edge Router 3 (Dial and Listener) <------ Private Edge Router (Only Dial) <------> Private Service
The Policy Advisor showed that everything was good, but the service connection was refused.

Then I created another policy to test if my service was not working, and did the following connection.

Client(Tunnel) <----->Private Edge Router (Only Dial) <-----> Private Service

  • In here, the client (tunneler) is running on a computer inside the Private Network. And yes, I could access the service with no issues. Also, from this client, I could also access all the services that are running on the Ziti Network 1

So practically my issue is that I cannot access the private network services from outside the Private Network scope (computers from another subnet, IP, etc.). But I can access all the services from every network if my client is inside the Private Network.

By the way, I have a mix of Ziti versions from the routers. As I did the initial deployments a while ago.
The routers in the Ziti Network 1 have a mix of versions 0.25.3, and 0.25.4, and the Fabric and Private Edge router have versions 0.25.13.

Ah yes, I am also lost with this type of error. I have created the identities and enrolled following the same steps that I did with the previous routers and suddenly I am getting the bad certificate error only on the listening for the new routers.

Again, thank you very much.

Thanks for the updated diagram - that's going to help us both speak the same language! :slight_smile: I'll assume, given your replies, that you are able to figure out any sorts of link issues/firewall issues etc. You sound like you have all that under control so when I see things like the controller "inside" the ziti network 1 - I'll just assume that's perfectly fine (which you seem to indicate with the orange dotted lines)... cool - that helps...

Ok. I'm certain we can resolve this situation for you. With OpenZiti, the location of your client really should not matter if you have setup the services properly. Are you using ip-based services or are you using DNS-based services?

<start tangent>
The diagram you provided looks pretty close to the docker-compose environment from the quickstarts (minus some stuff). Would you agree? compose quickstart

I bring this up because I'd like to find a way to help where we both have the same sort of starting point. If you're not familiar with docker, it might not be useful though. If you are ok with working from this as a base, maybe I could show you something there that'd help... But until you confirm, I'll just keep trying to answer your question with your diagram.
</end tangent>

I'd really like to see how you setup your services, what services you're trying to provide, how you did it etc. And to keep it simple, I'd like to start with just one service. How'd you configure ziti in this case? Can you tell me more about one service that doesn't work from outside the private network? I don't quite feel like I grok the issue just yet.

I'd really recommend you update all the routers to the latest version, tbh. We fix a lot of bugs each week, so it's probably worthwhile. I see they are all 0.25.x - that's good, you definitely don't want to mix/match the minor versions (which we treat as breaking changes).

The bad certificate error looks to me like a router that doesn't have the proper PKI. I'd try re-enrolling that router: use the ziti cli to delete the router, recreate it, then enroll it again. The log is saying the routers with ip 172.18.102.66 and 172.18.102.81 are trying to connect to this router but the certificate presented is not known/valid. That is usually a "one or more of the routers is not enrolled properly" error.

Yes, actually we have been fiddling with Ziti in our lab and we basically have nailed the basic scenarios where everything is on the same network.

Our services are SDK based, so they automatically create a terminator on the basis of the Service-Edge Router Policies.
The clients use a tunneling application and the services have an intercept configuration in order to interact with the tunneler.

Yes, we tried to mirror the docker-compose scenario using physical devices. It is not 100% similar but the intention is to connect private services to the Ziti Overlay.

I have a video service that was deployed using the golang-SDK for Ziti. It works correctly and can provide video in different resolutions 1080p, 720p, and 480p to Ziti clients.



The above deployment was done by placing the Video server in Ziti Network 1. This service can be accessed by anyone, even clients in the Private Network.

Then we have the Private Network Video Service. For this test, I am not using the Fabric Router (As it is not working due to the Certificate errors) but just a private Edge Router with Listener disabled.

The terminators were created automatically when the SDK enabled service created a bind to the private edge router.

But when I try to access it from an external device (e.g. my Phone) I get a connection refused error.

Continue on the Next Reply.

Continuation

So, If I run a tunnel (client) inside the same Private Network

And curl my service from this client

Success

I can even curl the service that exists outside the Private Network

Testing with FFPlay on my Private Service

Video Plays perfectly.

Circuits are created for this session.

As you can see when consuming the golangVideoServer service, the traffic is local as no fabric link is utilized, I guess the session is established directly by the ingress and egress ports of the Private Edge Router.

But there is a Link that connects to my public edge router.

I also tried to run a tunnel client from outside the Private Network, and I get some weird errors.


Why does the tunnel want to connect to the Private Edge router? I mean, of course, it won’t be able to see it, because it is outside of its network scope. (It does not know what COMPUTA-LAP means) even if I put an IP into that field, that IP is not reachable so the connection will fail always.
Shouldn’t the Tunnel be able to reach my Private Edge router through one of the Public Routers? for example my router named orchestrator-edge-router has a dial and listening (And also edge) enabled, and has an IP that can be reachable by every computer, even the private computers.
It should be used to reach the Private Edge Router, right?

I guess that is the main issue as to why the clients outside of the Private Ziti Network cannot access my private service.

Here is the YAML configuration of my Private Edge Router.

Thank you again :slight_smile:

Quick note… as I noticed something that could be the cause of the problem.

In the services settings above… the tunneller is in “edge” mode… I normally see this in “tunneller” mode.

I am not 100% sure on the difference between tunneller / edge mode…

Maybe @TheLumberjack can offer some more guidance on this.

Perhaps your edge router policy directs your tunnel app to use your private edge router as ingress to Ziti Network?


This screen shot made me wonder - what happens when you use a fully qualified domain name here. OpenZiti does not provide support for 'hostnames' nor 'unqualified domain names'. First thing I'd recommend is to change the intercept address to anything with a period in it... testvideo.ziti, testvideo.go, test.video, anything along those lines where it has a period... this might just be a DNS resolution issue
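
To illustrate, an intercept.v1 config with a qualified name would carry values along these lines. It is written as YAML here for readability (the config itself is normally supplied as JSON with the same keys); testvideo.ziti is one of the example names above and the port is a placeholder, not your actual setting.

```yaml
# intercept.v1 config data (hypothetical values)
protocols:
  - tcp
addresses:
  - testvideo.ziti      # contains a period, unlike a bare hostname
portRanges:
  - low: 8080           # placeholder port
    high: 8080
```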


Why does the tunnel want to connect to the Private Edge router?

I think @dariuszSki has this one - I expect your "private edge router" has an attribute on it which puts it into a policy that allows any client to connect to it. That's not a terribly uncommon thing to see and other than noise in the logs, it should be fine.

Right now it's still not 100% clear to me where your problem is. It might just be the unqualified DNS name issue.

I'd really like to be able to make a common lab for both of us to setup. That way I can understand exactly what's happening. You'd acknowledged that what you're doing is similarish to the docker-compose environment, would it be ok if we were to use that?

Right now, with this new information, it seems like maybe you just need to get the logs from the mobile client and see what it's saying and try the fully qualified DNS name.

I noticed in the logs shown in your post that the orchestrator router may be listening on the name not ip. Is that right? i.e. orchestrator-edge-router@tls://orchestrator:3022. If so, is that name resolvable for the client?

I think I have solved my issue.


So, now, what was the issue? It seems that I had messed up on my Service Router Policies.
Initially, my policies were like this.

The logic behind this was that I thought the Service Edge Router Policies were used for attaching a Service to an Edge Router, like in the following diagram.

But as you can see I left my Public edge router outside of the policies. Why? well because I thought that this Public edge router is not offloading traffic to any service, (It is just used as an entry point into the network). What confused me more was the Edge Router Policies

Ignore the default policies, just focus on the ones that say #all. -> The allRouters policy is just affecting my Public Edge Router (Only one router has the attribute #public). I think I should start to specify which Identity can access specific routers for example my test-router-policy should only accept #videoclients (The role attribute of the identities that consume video) and the identity/role of the video server. (correct?).

Anyway, how did I solve my issue? I just added one more Service Edge Router Policy, like this.

This means that the publicRouter policy applies to my Public Edge Router and lets it provide every service from my network (even if the service is not directly attached to it, right?). This is a mandatory requirement, correct?

I thought that I had figured out how the policies worked, but I see that I still have a lot to learn.
Anyway. Should I turn this off on the Public Edge Router?

As it does not directly offload traffic to a Service, is it necessary?

Thank you very much.
P.S. I still haven't been able to resolve the certificate issue from my Fabric Router. Does the YAML configuration file contents affect the way the certificates are generated when we enroll the router? This is the only router that has this issue.


Again, thank you.

Generally speaking, most deployments want to allow #all their identities to access all the #public routers so that things still work when the client is mobile or moves to another network. Edge Router policies only declare which routers a given identity is allowed to attach to. They have nothing to do with where the traffic exits and don't control whether the identity is allowed to access the service. So when you talk about your test-router-policy only accepting #videoclients, I would actually think you want that restriction in your dial service-policy, not your edge-router-policy - does that make sense? Your clients will try to attach to any edge router they are permitted to attach to via the edge-router-policy. Once attached, the service-policy controls what services the clients can access.

You know, that's totally up to you. You might find a use for a centralized egress some day, so having it tunneler enabled is useful. You don't need to use that identity - you control the services it can bind. If it's me? I'd just make all my routers edge routers with tunneler enabled = true. If you were giving other people access to your controller, maybe you wouldn't want to allow this because it allows people to define services in the network it's defined in. Mostly though, I say leave it on. :slight_smile: To your point, since it doesn't yet offload traffic - it is not necessary.

Oh - it just occurred to me that it might be (probably is) because the other routers were not enrolled with the hostname/dns that your fabric router attaches to... In your router config files you'll find a "CSR" section. That section needs to be full, and complete, when you create/enroll the router. I bet that with your DNS setup, you didn't set it up with all the hostname/dns names in mind at first...

Open ALL your edge routers and check out the SANS section:

    sans:
      dns:
        - ip-172-31-42-64
        - localhost
        - ec2-18-188-201-183.us-east-2.compute.amazonaws.com
      ip:
        - "127.0.0.1"
        - "18.188.201.183"

Is the fabric router connecting to an entry that was in this file? If not, I expect your certificates that were created are not 100% complete. You should probably re-enroll all your routers with all the dns/ip entries.

Here's how this works - when you enroll this router, you specify the config file which has these settings in it. These dns/ip entries are put into a CSR and sent to the controller for fulfillment. When enrollment succeeds, the router writes the resulting certs out, and those certs are then used for mutual TLS. Your 'fabric router' needs to access the other routers at the dns/ip specified in there or the certs are not valid... I could go on and on here - but maybe that will explain it enough for you (hopefully you understand PKI a bit?). If not, follow up and I can add more details.

Quick afterthought… To be just a bit more specific, all the edge routers which are dialing another router to form a link will need to dial a router on the right address. If the other routers are dialing the fabric router - then just the fabric router will need to be recreated/reenrolled with the proper DNS entries. That make sense? The address the router advertises, which the other routers dial into, that address needs to be in the certificate that is presented when those routers connect… and that is done when the certificate is made - which is where the sans fields come in… hth
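
Applied to the fabric router from the first post, a sketch of the two parts that have to agree could look like this. The addresses and port come from that post and its log; the placement of csr under edge and the omitted fields follow the quickstart-generated layout and are assumptions about your file.

```yaml
link:
  listeners:
    - binding: transport
      bind: tls:0.0.0.0:10080
      advertise: tls:172.18.102.155:10080   # what the other routers dial

edge:
  csr:
    # country, organization, etc. left as they are
    sans:
      dns:
        - localhost
      ip:
        - "127.0.0.1"
        - "192.168.0.64"      # the machine's real address
        - "172.18.102.155"    # the advertised, port-forwarded address
```

The SANs are only read at enrollment, so editing the file alone does not change an existing certificate; the router has to be re-enrolled afterwards.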

Yes, this was one of the issues also.
Thank you

Thank you very much.
I managed to solve that issue thanks to your info.

This community and the support team are amazing.

:slight_smile:


As a further note, there is a way to deploy a router in “fabric only” mode. It is not suggested as many convenience features will not work. The major one being that in “fabric only” mode a router will not roll its certificates forward. That has to be done manually by an operator.

As Clint said above, deploying Edge Routers as “private” (aka do not accept incoming Ziti SDK/Tunneler connections) is the suggested path.