Architecture Guidance for Non-Standard App/Emulate LAN

bearrito · August 12, 2023, 4:07am

This is a follow-up to Support for ROS2.0?.
I think OpenZiti can solve this, AND if so it would be a killer app $$$

The above question concerned ROS (Robot Operating System), the more or less de facto framework for robotics. Despite the name, it runs on Ubuntu and is not a custom OS.

The network architecture of ROS has always been challenging.

Some background

It was initially designed to run entirely on a LAN.
There is a privileged process called the Master. It is a coordinator of other processes as well as a parameter store, so it's like a combination of message queue, peer-to-peer tracker, and Consul.
ROS nodes launch and contact the Master. The Master stores IP information as well as which topics each node is publishing or subscribing to.
If Node A says it wants to subscribe to topic LidarData, the Master will send the IPs of the nodes that are publishing LidarData, say Node B.
Node A will then contact B and establish a session. Node A can contact B and vice versa.
All ports are ephemeral, except the Master port, which is known/fixed.
Many use cases will involve some number of Nodes running on a LAN, and some number remote.
Many times those remote nodes will be Robotics Engineers troubleshooting/diagnosing.
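To make the background above concrete, here is a toy Python model of the lookup flow. It is only a sketch: the `Master` class, its method names, and the 10.0.0.12 address are made up for illustration and are not ROS APIs. The point it shows is that the Master only hands out addresses; the data connection is then opened directly between the two nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Toy stand-in for the ROS Master: a topic registry, not a message broker."""
    port: int = 11311  # the one fixed, well-known port
    publishers: dict[str, set[str]] = field(default_factory=dict)

    def register_publisher(self, topic: str, uri: str) -> None:
        # A node announces "I publish <topic> and can be reached at <uri>".
        self.publishers.setdefault(topic, set()).add(uri)

    def lookup(self, topic: str) -> list[str]:
        # A subscriber asks "who publishes <topic>?" and gets addresses back.
        return sorted(self.publishers.get(topic, set()))


master = Master()
# Node B publishes LidarData on an ephemeral port (hypothetical address).
master.register_publisher("LidarData", "10.0.0.12:40000")
# Node A asks the Master, then would connect to the returned address itself.
peers = master.lookup("LidarData")
print(peers)
```

This is why the overlay has to cover both the fixed Master port and whatever ephemeral ports the nodes end up advertising.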

Here are the specifics of my case and what I'm trying to solve. I have a bunch of nodes running in a LAN/subnet on AWS. They are running robotic simulation tasks (SLAM, if you are familiar).

I think communication between them should be straightforward to address with intercept/host configs.

I don't have a good mental model of what I need to do as a remote human user. I will be running a node on my Ubuntu workstation. My IP will change based on my local network.

Is there a combination of intercept/host and dial/bind I can do to achieve this securely?

Thanks if you read this far and have input!

TheLumberjack · August 14, 2023, 5:33pm

Yes, I'd agree that all of this sounds pretty straightforward, assuming I'm understanding your description correctly.

Is this "pre-configured" or is this discovered? This is the only thing I read that gives me immediate cause for worry. I assume it's configured somehow but maybe it's done via some other network protocol? If it's TCP/UDP, it should be fine, I think we could figure out a way to make it work. If it's something else, it might be more tricky (to maybe impossible depending on what/how it actually works). This one question gives me the most concern to be honest as I just don't know anything about operating a ROS network...

The next question I'll have, do you want "NodeA" to be able to subscribe and connect to ONLY the nodes that publish LidarData? If so, I am feeling like you'll have to automate some of the overlay network based on the ROS master. If you are fine with "any node talking to any other node" type of security, that'll be easier to accommodate. (Not sure if that's clear or not, hope it is?)

That's fine, you'll just have to allow for the port range in your OpenZiti configuration.

Do you want the IP ranges to overlap or can the "remote" nodes be a different IP range? If it were me, I wouldn't even differentiate between "remote" vs "lan" and I'd let that be just a happy coincidence that some services are on the same LAN. I'd still treat every single node as 'remote' and only accessible if running the OpenZiti overlay (but you might have valid reasons that won't work for ya).

addressing the remainder of your text

I don't have a great mental model for how ROS works either so it might take us some back and forth to get to an answer here. I can outline what the process would be, with my current understanding of the situation and we can go from there... Maybe we'll figure it out together? (hopefully)

That's 100% fine presuming you understand what your IP is locally and can put the ROS nodes onto any other subnet that won't conflict with your actual underlay network.

what I would do

Assuming I understand enough about the problem, here's the sorts of things I would do....

assume every node is remote

For starters, I'd just treat every single node as though it were remote, even if they aren't actually remote to you or to the collector etc. It simplifies the mental model needed (imo). It's not strictly necessary, but I think it will help when starting out.

install a tunneler on every node

I would then install a ziti-edge-tunnel on every node, and I would probably create the identity with a name that looks like an IP address just to make things easier on me (I am assuming you can't use DNS, and want/need to use IP). I would assign the IP from a reserved range that you are certain (or at least reasonably certain) won't overlap the IP actually assigned to you on your underlay network, something from the Wikipedia list of reserved IP addresses. Me, I'd use something in the old "class B" private range of 172.16.0.0/12 just because I rarely found these IPs used in my own travels. But you can choose whatever IP range suits you. You could even shadow actual, valid IPs; I just don't recommend doing that until you're more comfortable troubleshooting when things go wrong, because that can get hairy imo.

I would then simply assign IPs to every single node of the ROS network and I would address the nodes only with the "overlay IP" (the one in the 172.16.0.0/12 you assigned). That way the only way to access the nodes is by using the OpenZiti overlay network.

Then, from your actual computer, regardless of the actual underlay address you have you could access the master node using "172.16.0.1" and "node2" (i skipped 'node1' on purpose) at "172.16.0.2" and Node3 at "172.16.0.3" and so on and so forth...
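If it helps, the address planning above can be sanity-checked with Python's standard `ipaddress` module. This is only a sketch: the helper names and the example underlay networks are made up, and it has nothing to do with how OpenZiti itself allocates anything. It hands out consecutive overlay IPs and flags underlay networks that would collide with the chosen range. 172.17.0.0/16 is in the example because it is a common Docker default bridge subnet.

```python
import ipaddress

# Overlay range suggested above; pick a different one if it collides.
OVERLAY = ipaddress.ip_network("172.16.0.0/12")


def overlaps_underlay(underlay_cidrs):
    """Return the underlay networks that collide with the overlay range."""
    return [c for c in underlay_cidrs if ipaddress.ip_network(c).overlaps(OVERLAY)]


def assign_overlay_ips(machine_names):
    """Hand out consecutive overlay addresses, starting at 172.16.0.1."""
    hosts = iter(OVERLAY.hosts())
    return {name: str(next(hosts)) for name in machine_names}


plan = assign_overlay_ips(["master", "node2", "node3"])
# Hypothetical underlay networks seen by a workstation.
conflicts = overlaps_underlay(["192.168.1.0/24", "172.17.0.0/16"])
print(plan)
print(conflicts)
```

A non-empty `conflicts` list is the "hairy" case: either move the overlay range or be prepared to troubleshoot shadowed addresses.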

That make sense? That help? I hope so anyway

bearrito · August 15, 2023, 4:10pm

@TheLumberjack
This is what I have working.

Let me lay out what I have working, along with some simplifying assumptions that will allow us to get rid of some of the ROS-specific knowledge.

The term Node is overloaded. ROS calls processes nodes, as they are nodes in the ROS computational graph, but ultimately they are regular processes. To help simplify I'll only refer to Machines and Processes from here out.

I have clients. These correspond to user workstations. I don't control these, as these correspond to engineer machines.
In this case, I have many machines running in the cloud. These are Masters. They run a special process called a Core. This Core process always listens at 11311.
Clients need to initiate a connection to a single Master at a time. Clients never need to communicate with other clients.
Masters never initiate the initial connection, but could initiate subsequent ones (see below). Masters never need to communicate with other Masters.
There is no notion of internal ROS ACLs, meaning that if a client can contact a Core process, it can talk to all other processes.
The Client initiates the connection to the Master by dialing master:11311.
The Core process on the Master will record that a Client (Hostname) has joined the ROS logical computational network.
The Master and client are now free to launch processes that communicate with each other over ephemeral ports. In clearer language: the Master might have a process called LidarPublisher publishing LidarData on a port in [32768-60999], say 40000. The client says to the Core process on the Master, "I would like to receive LidarData." The Master responds with a URI like master:40000, and the client initiates a connection to master:40000. This sets up a socket between client <-> master. The reverse is also true: the client could have a process called SimulationTaskPublisher, and there could be a process on the Master called SimulationTaskRunner. It would ask the Core process, "Who is publishing SimulationTasks?" The Core would respond: client:XYZ....

Following the Docker Compose Quickstart.

Add the following to the Red Network in the docker-compose.yaml. This is the standard ROS image. ROS users will be very familiar with this.

roscore-red:
  image: ros:noetic-ros-core-focal
  hostname: ros-test.lan
  expose:
    - 32768-60999
    - 11311
  networks:
    zitired:
      aliases:
        - ros-test.lan
        - ros-test-red
        - ros-test.red
        - ros.test.red
  command: ['roscore']

Client to Master

ziti edge create identity user ros-client -a 'ros-clients' -o ros.client.jwt
ziti edge create config ros.intercept.v1 intercept.v1 '{"addresses": ["ros-test.red", "ros-test", "ros-test.lan"],"portRanges": [{"low": 32768,"high": 60999}, {"low": 11311,"high": 11311}],"protocols": ["tcp"]}'

ziti edge create config ros.host.v1 host.v1 '{"allowedAddresses": ["ros-test.red", "ros-test", "ros-test.lan"],"forwardAddress": true ,"forwardPort": true,"allowedPortRanges": [{"low": 32768,"high": 60999},{"low": 11311,"high": 11311}],"protocol": "tcp"}'

ziti edge create service ros.svc --configs ros.intercept.v1,ros.host.v1
ziti edge create service-policy ros.policy.dial Dial --service-roles "@ros.svc" --identity-roles '#ros-clients'
ziti edge create service-policy ros.policy.bind Bind --service-roles '@ros.svc' --identity-roles "@ziti-private-red"
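Since the intercept and host JSON above have to agree on addresses and port ranges, one option is to generate both from a single definition. The sketch below only builds the JSON strings; the field names are the ones used in the commands above, while the helper functions are made up for illustration and nothing here talks to a controller.

```python
import json

ADDRESSES = ["ros-test.red", "ros-test", "ros-test.lan"]
# Ephemeral range plus the fixed Core port, as in the configs above.
PORT_RANGES = [{"low": 32768, "high": 60999}, {"low": 11311, "high": 11311}]


def intercept_v1(addresses, port_ranges):
    """Payload for the intercept.v1 config (what the dialing side captures)."""
    return {"addresses": addresses, "portRanges": port_ranges, "protocols": ["tcp"]}


def host_v1(addresses, port_ranges):
    """Payload for the host.v1 config (what the binding side forwards to)."""
    return {
        "allowedAddresses": addresses,
        "forwardAddress": True,
        "forwardPort": True,
        "allowedPortRanges": port_ranges,
        "protocol": "tcp",
    }


def port_is_covered(port, port_ranges):
    """True if a port falls inside any of the configured ranges."""
    return any(r["low"] <= port <= r["high"] for r in port_ranges)


intercept_json = json.dumps(intercept_v1(ADDRESSES, PORT_RANGES))
host_json = json.dumps(host_v1(ADDRESSES, PORT_RANGES))
print(intercept_json)
print(host_json)
```

The printed strings can be pasted into the `ziti edge create config` commands, and `port_is_covered` is a quick way to check whether a port a ROS process advertises would be intercepted at all.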

Master to Client

ziti edge create config rviz.intercept.v1 intercept.v1 '{"addresses": ["*"], "forwardAddress": true,"portRanges": [{"low": 32768,"high": 60999}, {"low": 11311,"high": 11311}],"protocols": ["tcp"]}'

ziti edge create config rviz.host.v1 host.v1 '{"allowedAddresses": ["*"],"forwardPort": true, "forwardAddress": true,"allowedPortRanges": [{"low": 32768,"high": 60999},{"low": 11311,"high": 11311}],"protocol": "tcp"}'

ziti edge create service rviz.svc --configs ros.intercept.v1,ros.host.v1

ziti edge create service-policy rviz.policy.dial Dial --service-roles "@rviz.svc" --identity-roles '#ros-master'

ziti edge create service-policy rviz.policy.bind Bind --service-roles '@rviz.svc' --identity-roles '#ros-clients'

Add the #ros-master attribute to the red router's attributes in the UI or CLI.

Open Ziti Questions

This oddly seems to work. The Client to Master flow makes sense to me.
The Master to Client Flow does not.

I think what I don't understand is: How do Edge routers acting as Tunnelers take on identity? How is the rviz.policy.dial and ros.policy.bind config working?

Example

For folks who are familiar with ROS, here is an example:

locus@master:/$ rosnode machine # This lists machines that are part of the graph
client.lan
master.lan
locus@master:/$ rostopic pub /open_ziti_bridge std_msgs/String "data: 'Hello'"
publishing and latching message. Press ctrl-C to terminate

Then on the client side

rostopic echo /open_ziti_bridge
data: "Hello"

If you can do the above, you have a working network.

I will say I'm still having some issues with DNS, so I'm having to poke at /etc/hosts and set ROS_HOSTNAME and ROS_MASTER_URI.

TheLumberjack · August 16, 2023, 3:05pm

I’m just dropping a note here to let you know I read this and will get to it, but it’s gonna take me a while is all… This feels like it’ll be a bigger lift to help out with, but I will get to it (if nobody else does)

plorenz · August 16, 2023, 4:11pm

Quoting bearrito: "I think what I don't understand is: How do Edge routers acting as Tunnelers take on identity? How is the rviz.policy.dial and ros.policy.bind config working?"

I can answer the question of how Edge Router/Tunnelers (ER/T) and Identities interact.

When you create an edge router with tunneling capability, or after creation add that capability, an identity with the same id and name as the router is created, with an identity type of Router. This identity cannot be deleted and the name will stay in sync with the router. The identity represents the router in policies. So the ER/T will have access to intercept or host services based on which services the identity can dial or bind.
The controller will also create an edge router policy, which can't be updated or deleted, which gives the router identity access to the router, since it doesn't make sense that the router wouldn't have access to itself.
When the router is deleted or the tunneling capability is removed, the identity and edge router policy will also be removed.
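As a rough illustration of how that plays out with the policies from this thread, here is a toy Python model. It is a deliberate simplification and not the controller's actual logic: the classes and functions are made up, role matching is reduced to "@name" and "#attribute" with any-of semantics, and services carry no attributes. The router identity is assumed to be named ziti-private-red and to carry the #ros-master attribute, as in the setup described earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    name: str
    attributes: frozenset = frozenset()


@dataclass(frozen=True)
class ServicePolicy:
    name: str
    kind: str  # "Dial" or "Bind"
    service_roles: tuple
    identity_roles: tuple


def role_matches(role, name, attributes):
    """'@x' matches an entity by name, '#x' matches by role attribute."""
    if role.startswith("@"):
        return role[1:] == name
    if role.startswith("#"):
        return role[1:] in attributes
    return False


def allowed(identity, service, kind, policies):
    """True if any policy of this kind links the identity to the service."""
    return any(
        p.kind == kind
        # Services have no attributes in this toy, so only '@name' roles match.
        and any(role_matches(r, service, frozenset()) for r in p.service_roles)
        and any(role_matches(r, identity.name, identity.attributes) for r in p.identity_roles)
        for p in policies
    )


policies = [
    ServicePolicy("ros.policy.dial", "Dial", ("@ros.svc",), ("#ros-clients",)),
    ServicePolicy("ros.policy.bind", "Bind", ("@ros.svc",), ("@ziti-private-red",)),
    ServicePolicy("rviz.policy.dial", "Dial", ("@rviz.svc",), ("#ros-master",)),
    ServicePolicy("rviz.policy.bind", "Bind", ("@rviz.svc",), ("#ros-clients",)),
]
# The router's automatically created identity shares the router's name.
router = Identity("ziti-private-red", frozenset({"ros-master"}))
client = Identity("ros-client", frozenset({"ros-clients"}))

print(allowed(router, "ros.svc", "Bind", policies))   # router hosts ros.svc
print(allowed(router, "rviz.svc", "Dial", policies))  # router intercepts rviz.svc
print(allowed(client, "ros.svc", "Bind", policies))   # client cannot host ros.svc
```

In this model the router ends up hosting ros.svc because ros.policy.bind names its identity directly, and intercepting rviz.svc because of the #ros-master attribute.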

Let me know if that clarifies things.
Paul

gberl002 · August 17, 2023, 1:05pm

Hey @bearrito, I’m taking a look at the example you provided so I can try to better understand your scenario. Thanks for the detailed setup instructions, I’ll report back with anything of note.

qrkourier · August 22, 2023, 9:49pm

Quoting bearrito: "The Master to Client Flow does not."

I see the client-to-master topic open example. Is it a similar procedure for the Master to open a topic published by a Client?

The thing that stands out to me about the master-to-client intercept config is the "addresses": ["*"] part. I believe you want the Master to use Ziti for all comms with the Client, and to accomplish that it will be necessary to specify either domain name(s), domain namespace(s), IPv4 address(es), or IPv4 range(s).

Did I understand correctly that the Client reports its own IPv4 address to the Master upon startup, and that is the address Core will report for available topics hosted by the Client?

If so, then we need to ensure the address reported to the Master by the Client matches the intercept config's address(es).
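A quick way to reason about that match is a small Python check like the one below. It is a sketch under assumptions: the function is made up, it is not how the tunneler resolves intercepts, and treating a `*.`-prefixed entry as a domain namespace is my assumption about the address syntax. It compares whatever the Client reports (hostname or IP) against a list of intercept addresses.

```python
import ipaddress


def intercept_matches(reported, intercept_addresses):
    """True if the address a Client reports falls under any intercept entry."""
    for entry in intercept_addresses:
        try:
            net = ipaddress.ip_network(entry, strict=False)
        except ValueError:
            net = None  # not an IP or CIDR, so treat the entry as a name
        if net is not None:
            try:
                if ipaddress.ip_address(reported) in net:
                    return True
            except ValueError:
                pass  # reported is a hostname; an IP entry cannot match it
        elif entry.startswith("*."):
            # Assumed wildcard form: "*.lan" covers any name ending in ".lan".
            if reported.lower().endswith(entry[1:].lower()):
                return True
        elif reported.lower() == entry.lower():
            return True
    return False


print(intercept_matches("client.lan", ["*.lan"]))
print(intercept_matches("172.16.0.5", ["172.16.0.0/12"]))
print(intercept_matches("client.lan", ["ros-test.lan"]))
```

If the Client reports something that no intercept entry covers, the Master's connection attempt would bypass the overlay entirely, which is the failure mode to look for here.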

qrkourier · August 24, 2023, 5:40pm

@bearrito We can hand-craft a Ziti service intercept address for a client-hosted topic to test the master-to-client direction, and for an autonomous solution we would need some kind of event or hook to trigger creating the Ziti service.

For example, when a Master asks for a client-hosted topic we have the information we need to perform the Ziti management API operations to create that overlay path: intercept address, hosting client node.
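Here is a minimal Python sketch of what such a hook could derive from a lookup result. Everything in it is hypothetical: the function name, the identity name, and the assumption that the Core's answer arrives as a URI with a scheme (e.g. `http://client.lan:40123/`). It only builds the data for an intercept, shaped like the intercept.v1 JSON earlier in the thread; the actual management API calls are left out.

```python
from urllib.parse import urlsplit


def overlay_path_from_lookup(publisher_uri, hosting_identity):
    """Derive the intercept address/port and hosting identity from a lookup result."""
    parts = urlsplit(publisher_uri)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"cannot derive host and port from {publisher_uri!r}")
    return {
        "bind_identity": hosting_identity,  # the client node hosting the topic
        "intercept": {
            "addresses": [parts.hostname],
            "portRanges": [{"low": parts.port, "high": parts.port}],
            "protocols": ["tcp"],
        },
    }


# Hypothetical event: the Core reports a client-hosted publisher.
path = overlay_path_from_lookup("http://client.lan:40123/", "ros-client")
print(path)
```

A real hook would also need to tear the service down when the topic goes away, and a single port may be too narrow if the nodes negotiate further ephemeral ports after the first contact.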
