Migrate from NetFoundry to Oracle cloud

With the imminent shutdown of the free level at NetFoundry I'm recreating my little environment on an Oracle cloud instance. Have the server spun up, followed the Linux deployment guides to get the controller, router and console going - sorted out firewall settings for Oracle and the Ubuntu instance and DNS. Enrolled my laptop and raspberry pi using the existing tunnels. After creating my first service (web page on pi), I can't access from laptop. Just get "connection refused".
I do see in the console that the edge router (named EdgeRouter) is online and linked to the identity EdgeRouter. The identity is showing offline though. The console is just different enough from the NetFoundry one that I can't tell if this is normal.
The deployment guide I followed only mentioned ports 1280 and 3022 (I used defaults). The guide for host anywhere also mentions ports 8440-3, with the last for the console. The console is working without 8443 being opened. I opened 8440-2, but no change.
Not sure where to go from here in troubleshooting. Any guidance would be appreciated.

Sounds like your laptop is returning "connection refused"? Can you confirm what the advertised address on the edge router is set to, specifically in the 'edge' section? It should look like this:

listeners:
# bindings of edge and tunnel requires an "edge" section below
  - binding: edge
    address: tls:0.0.0.0:8442
    options:
      advertise: ec2-3-18-113-172.us-east-2.compute.amazonaws.com:8442

Assuming that value is correct for your environment, the next thing to do is to ensure the port is open using openssl s_client -connect (any response indicates success):

openssl s_client -connect ec2-3-18-113-172.us-east-2.compute.amazonaws.com:8442 </dev/null
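
Since several ports came up earlier (1280, 3022 and 8440-8442), here is a rough sketch for checking plain TCP reachability of each one from the client side. `check_port` is a made-up helper name, and it relies on bash's `/dev/tcp` feature plus the coreutils `timeout` command being available. It only tells you whether something accepts the connection, not whether the TLS handshake is healthy, so the openssl check is still worth doing.

```shell
# Hypothetical helper: report whether host:port accepts a TCP connection.
# Assumes bash (for /dev/tcp) and coreutils timeout are installed.
check_port() {
  host="$1"
  port="$2"
  # Opening /dev/tcp/HOST/PORT makes bash attempt a TCP connect;
  # timeout keeps a filtered port from hanging the check.
  if timeout 3 bash -c "</dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "${host}:${port} open"
  else
    echo "${host}:${port} closed or filtered"
    return 1
  fi
}
```

Something like `for p in 1280 3022 8440 8441 8442; do check_port your.router.hostname "$p"; done` would then walk the list, with `your.router.hostname` replaced by whatever your router advertises.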

If that's all good, then the next step is to probably send more of the client logs over for inspection... My guess is that the address is somehow incorrect?

This is from the router config.yml on the oracle server.

listeners:

Is that DNS entry valid? For me, I cannot resolve it, nor can other online tools like DNS Checker - DNS Check Propagation Tool

It should resolve. For example: DNS Checker - DNS Check Propagation Tool

Yes, it is the laptop that is returning the error. Both the laptop and pi are showing online in the console.

andrew@ThinkPad-T480:~$ openssl s_client -connect openziti.oci.andrewscomputersolutions.com:3022 </dev/null
CONNECTED(00000003)
depth=0 C = US, ST = NC, L = Charlotte, O = NetFoundry, OU = Ziti, CN = 1hBvFCwNCv
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 C = US, ST = NC, L = Charlotte, O = NetFoundry, OU = Ziti, CN = 1hBvFCwNCv
verify error:num=21:unable to verify the first certificate
verify return:1
depth=0 C = US, ST = NC, L = Charlotte, O = NetFoundry, OU = Ziti, CN = 1hBvFCwNCv
verify return:1


Certificate chain
0 s:C = US, ST = NC, L = Charlotte, O = NetFoundry, OU = Ziti, CN = 1hBvFCwNCv
i:C = US, L = Charlotte, O = NetFoundry, OU = ADV-DEV, CN = NetFoundry Inc. Intermediate CA ZOZrJazca
a:PKEY: rsaEncryption, 4096 (bit); sigalg: RSA-SHA256
v:NotBefore: May 28 05:22:22 2024 GMT; NotAfter: May 28 05:23:22 2025 GMT


Oh well good! :slight_smile: That clears that up then. Does the pi also show 'connection refused' when you try to access that service? Now I suspect the offload of the service is incorrect. Maybe to a bad port or different host?

The DNS entry is new so hasn't propagated yet. I'm using hosts file entries in the interim.
On the pi, the openssl command returns the same output. No desktop on the pi, but wget to the webpage returns "connection refused" as well.

How'd you make the service? Is whatever service is behind the openziti service online? Is it reachable from the oracle edge router? Can you ssh to the edge router and then curl/wget to whatever that openziti service is trying to connect to?

I created a simple service in the console. Hosted by raspi1, intercept openhab.ziti2:80, served by localhost:8080. I just recreated the service and realized I hadn't included EdgeRouter in the bind policy - didn't help.

ubuntu@openziti:/var/lib$ wget http://openhab.ziti2/page/page_9e234669dd
--2024-05-28 09:47:12-- http://openhab.ziti2/page/page_9e234669dd
Resolving openhab.ziti2 (openhab.ziti2)... failed: Name or service not known.
wget: unable to resolve host address ‘openhab.ziti2’

Should the EdgeRouter identity show as online? The Edge Router EdgeRouter is showing online and linked to the identity EdgeRouter.

as an aside, what markdown gives the output that you used in your first reply?

I would surely expect it to be online. The last error you shared makes it look like the tunneler isn't set up properly and/or doesn't have access to the service.

failed: Name or service not known. wget: unable to resolve host address ‘openhab.ziti2’

That seems like the intercept didn't work properly. That's either because the tunneler is not set up right or (more likely, since you have recreated the service) the policy is no longer valid. I am guessing you made your policy using @ references? If that's the case, could you recreate the policy granting the service access? I think that'll fix this current issue. :frowning: Sorry you're hitting all the bumps here...

Triple "ticks" are used in markdown to make a code block:

```
like this
```

The tunnellers on the pi and laptop were existing from the NetFoundry network. I just enrolled identities from the new controller.

I deleted the service, policies and configurations before recreating. The bind and dial policies both grant access to the service. The bind policy grants access to raspi1, the dial policy grants access to laptop, raspi1 and EdgeRouter.

I'm wondering if the EdgeRouter identity being offline is the root problem, although it seems odd that the router EdgeRouter is online

Oh, I didn't realize you drew a distinction between the "edge router" and the "edge router identity". I don't think the edge router identity ends up showing as online in the ZAC, but I don't know that for sure. I just never thought to look. Looking at a recent install, I see this:
image

That indicates it 'has an api session' (the first green dot) but is not 'connected to an edge router'... Which is kinda confusing ... but I think we can ignore that for now.

Let's look at the edge router logs. Are there any errors showing in there? I would expect there to be some sort of error/indication in there that would be helpful. If there's nothing in there, we'll keep following the breadcrumbs...

Oh -- and please recreate the dial policy. If you use an @ reference, you need to do that since you removed the service. That would have unlinked the service from the policy that was there before (or just update the policy).
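
If the console keeps fighting you, the same policies can be recreated from the `ziti` CLI. The sketch below only prints the commands so you can review them before pasting. `print_policy_commands` and the service name `openhab-svc` are made up for illustration, while `raspi1` and `laptop` are the identity names from this thread. It assumes you are already logged in with `ziti edge login`.

```shell
# Hypothetical helper: print (not run) ziti CLI commands that recreate the
# Bind and Dial service policies for one service using @ references.
# Arguments: service name, hosting identity, dialing identity.
print_policy_commands() {
  svc="$1"
  binder="$2"
  dialer="$3"
  # Bind policy: which identity is allowed to host the service
  printf "ziti edge create service-policy %s-bind Bind --service-roles '@%s' --identity-roles '@%s'\n" "$svc" "$svc" "$binder"
  # Dial policy: which identity is allowed to connect to the service
  printf "ziti edge create service-policy %s-dial Dial --service-roles '@%s' --identity-roles '@%s'\n" "$svc" "$svc" "$dialer"
  # Sanity check of the resulting policies across identities
  printf "ziti edge policy-advisor identities -q\n"
}
```

For example, `print_policy_commands openhab-svc raspi1 laptop` prints two create commands and a policy-advisor call. Because @ references point at one specific service, they need redoing whenever that service is deleted and recreated.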

My identity has no green dots.

May 28 04:02:50 openziti entrypoint.bash[3168]: [20368.121]   ERROR transport/v2/tls.(*sharedListener).processConn [tls:0.0.0.0:3022]: {remote=[128.14.239.38:54614] error=[context deadline exceeded]} handshake failed
May 28 04:02:50 openziti entrypoint.bash[3168]: [20368.503]   ERROR transport/v2/tls.(*sharedListener).processConn [tls:0.0.0.0:3022]: {error=[tls: first record does not look like a TLS handshake] remote=[128.14.239.38:54616]} handshake failed
May 28 04:02:51 openziti entrypoint.bash[3168]: [20368.884]   ERROR transport/v2/tls.(*sharedListener).processConn [tls:0.0.0.0:3022]: {remote=[128.14.239.38:54626] error=[tls: first record does not look like a TLS handshake]} handshake failed
May 28 04:02:51 openziti entrypoint.bash[3168]: [20369.266]   ERROR transport/v2/tls.(*sharedListener).processConn [tls:0.0.0.0:3022]: {remote=[128.14.239.38:54636] error=[tls: first record does not look like a TLS handshake]} handshake failed
May 28 08:37:54 openziti entrypoint.bash[3168]: [36871.974]   ERROR transport/v2/tls.(*sharedListener).processConn [tls:0.0.0.0:3022]: {remote=[80.66.76.134:2547] error=[tls: first record does not look like a TLS handshake]} handshake failed
May 28 09:04:18 openziti entrypoint.bash[3168]: [38455.881]   ERROR transport/v2/tls.(*sharedListener).processConn [tls:0.0.0.0:3022]: {remote=[172.104.210.105:41242] error=[context deadline exceeded]} handshake failed
May 28 09:26:28 openziti entrypoint.bash[3168]: [39785.865]   ERROR transport/v2/tls.(*sharedListener).processConn [tls:0.0.0.0:3022]: {error=[tls: client didn't provide a certificate] remote=[208.114.128.19:57396]} handshake failed
May 28 09:34:13 openziti entrypoint.bash[3168]: [40251.019]   ERROR transport/v2/tls.(*sharedListener).processConn [tls:0.0.0.0:3022]: {error=[tls: client didn't provide a certificate] remote=[208.114.128.19:57368]} handshake failed

The 9:00 entries would be me with wgets, I think. I've tried http and https.

The 4:00 entries would be system generated with the laptop offline.

The dial policy was recreated as well. The console helpfully errors on creating the service when components already exist so you can see what you missed when deleting the old stuff.

Attempts to load the page through the service don't generate new entries in the log

Alas, none of those are useful for what we're debugging. Those are all just untrusted connections trying to be established. I don't think they're relevant. They might be the openssl probes or just random internet probes (I noticed the IP changes)...
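
If you want to convince yourself that these are mostly noise, a quick tally by remote IP and error text helps. This is only a sketch: `summarize_handshake_failures` is a made-up name, and it assumes the log lines look like the ones pasted above, with `remote=[ip:port]` and `error=[...]` in either order.

```shell
# Hypothetical helper: read router log lines on stdin and count
# "handshake failed" entries per (remote IP, error text) pair.
summarize_handshake_failures() {
  grep 'handshake failed' | while IFS= read -r line; do
    # Pull the IP out of remote=[ip:port]
    ip=$(printf '%s\n' "$line" | sed -nE 's/.*remote=\[([0-9.]+):[0-9]+\].*/\1/p')
    # Pull the message out of error=[...]
    err=$(printf '%s\n' "$line" | sed -nE 's/.*error=\[([^]]*)\].*/\1/p')
    printf '%s\t%s\n' "$ip" "$err"
  done | sort | uniq -c | sort -rn
}
```

You would pipe the router log into it, for example `journalctl -u ziti-router --no-pager | summarize_handshake_failures`, assuming the systemd unit is named `ziti-router` on your install. Addresses you don't recognize with "first record does not look like a TLS handshake" are typically scanners rather than your own clients.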

Ok. So, circling back... I've personally never looked at the router identity's state because I've never needed to. I don't know whether it's relevant or not, but I've never needed to look at it to resolve an issue, so I'll assume it's not relevant at this time. (EDIT: see "Identities for edge routers with tunneling enabled sometimes show hasEdgeRouterConnection=false even though everything is OK", Issue #2007 in openziti/ziti on GitHub. There was a bug fixed for this issue about the edge session being wrong.)

Generally speaking, every time I have gotten "connection refused", it's because I've configured something wrong but it's hard to figure out sometimes.

Let's make a brand new service to a public http server -- we'll use this discourse as our example...

Make this service:

Then find your router identity and add #openziti-discourse-binders to it, and find one of your test identities and add #openziti-discourse-dialers to it. After doing that, you should be able to go to https://intercepted-openziti.discourse.group/ and see what I show below (they have a wildcard cert, so you won't get a TLS error in this example).

Can you get that far? :slight_smile:

Nope, still refused to connect. Should the hosting configuration hostname be intercepted-openziti.discourse.group or openziti.discourse.group? I'm trying to change the host config, but getting invalid parameters for a bunch of other stuff. Probably have to delete and recreate the whole service.

It should look identical to what I posted... UGH, my screen cap was wrong... :frowning:

On the left, the identities are #openziti-discourse-dialers and the "how will the service be accessed" address is intercepted-openziti.discourse.group. On the right (the hosting configuration), the identities are #openziti-discourse-binders and the address is openziti.discourse.group.

If that's not working, please send me the logs from the router and from the tunneler for me to review. I don't think I need/want the controller logs yet. There must be something wrong between those two, but by golly I would expect the logs to show it very clearly.

What tunneler are you using on the laptop?

You should see something like this in your tunneler logs:


[2024-05-28T20:39:40.801Z]   DEBUG tunnel-sdk:tunnel_udp.c:269 recv_udp() intercepted address[udp:100.64.0.2:53] client[udp:100.64.0.1:52752] service[ziti:dns-resolver]
[2024-05-28T20:39:40.801Z]    INFO tunnel-cbs:ziti_dns.c:509 format_resp() found record[100.64.0.27] for query[1:intercepted-openziti.discourse.group]
[2024-05-28T20:39:40.802Z]   DEBUG tunnel-sdk:tunnel_tcp.c:429 recv_tcp() intercepted address[tcp:100.64.0.27:443] client[tcp:100.64.0.1:53951] service[fronted-openziti-discourse]
[2024-05-28T20:39:40.802Z]   DEBUG tunnel-cbs:ziti_tunnel_cbs.c:349 ziti_sdk_c_dial() service[fronted-openziti-discourse] app_data_json[194]='{"connType":null,"dst_protocol":"tcp","dst_hostname":"intercepted-openziti.discourse.group","dst_ip":"100.64.0.27","dst_port":"443","src_protocol":"tcp","src_ip":"100.64.0.1","src_port":"53951"}'
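
In the sample log above, the tunneler's DNS answers with 100.64.0.27, which sits inside 100.64.0.0/10. Assuming your tunneler hands out intercept addresses from that same range (I believe that is the default, but treat it as an assumption), a quick way to tell whether a name is being intercepted at all is to resolve it and check where the address lands. Both function names below are made up for illustration, and `getent ahostsv4` assumes a glibc-based Linux.

```shell
# Hypothetical helper: succeed if an IPv4 address is inside 100.64.0.0/10,
# i.e. first octet 100 and second octet between 64 and 127.
in_ziti_dns_range() {
  ip="$1"
  o1=${ip%%.*}
  rest=${ip#*.}
  o2=${rest%%.*}
  [ "$o1" = "100" ] && [ "$o2" -ge 64 ] 2>/dev/null && [ "$o2" -le 127 ]
}

# Hypothetical helper: resolve a name with the system resolver and say
# whether the answer looks like a tunneler intercept address.
check_intercept() {
  name="$1"
  addr=$(getent ahostsv4 "$name" | awk 'NR==1 {print $1}')
  if [ -z "$addr" ]; then
    echo "$name does not resolve; the tunneler is probably not intercepting it"
  elif in_ziti_dns_range "$addr"; then
    echo "$name -> $addr (inside 100.64.0.0/10, looks intercepted)"
  else
    echo "$name -> $addr (outside 100.64.0.0/10, probably regular DNS)"
  fi
}
```

Running `check_intercept intercepted-openziti.discourse.group` on the laptop should land in the first or second branch. If the name does not resolve, the identity most likely has no dial access to the service, or the tunneler's DNS is not being consulted.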

Didn't work with openziti.discourse.group either. If the original was correct, I don't understand how that works.