Troubleshooting starting a remote public edge router

One thing I started to explore was the /etc/hosts file

I noticed that instance-20220416-1603 is connected to the private IP address

So… it's coming from within the local subnet

Whereas when I change this to the server IP address… it's originating from outside the network

This is probably related… but I am still trying to understand the problem

I have checked that port 6262 is open on the VCN and the server firewall.

Is there anything else I can check?

I think what would help is to understand what certificate the edge router uses to create the mutual TLS connection.

For instance… is it the intermediate certificate… this is an area I am very uncertain about.

[ 0.188] FATAL ziti/ziti-router/subcmd.run: {error=[error connecting ctrl (x509: certificate is valid for 127.0.0.1, not 168.138.10.79)]} error starting

That's pretty clear to me. The router is trying to connect to the controller at an address that's not valid for the certificate the controller is presenting. Try to change the ctrl.endpoint in the router config to 168.138.10.79. What's the current value in there now? is it already this?

ctrl:
  endpoint:             tls:168.138.10.79:6262

I expect your current value in there is not this ip?

UGH - nevermind. I read the dang error wrong… The certificate is NOT valid for that IP address, my bad. You’ll need to regenerate the PKI for that controller… Let me see if i can get you the right commands…
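
If it helps to see how to read that message: the part after "valid for" is what the controller's certificate covers, and the part after "not" is the address the router dialed. A small sketch that splits the two (explain_x509_error is a made-up helper name, not part of ziti; plain POSIX shell):

```shell
# Hypothetical helper: split an x509 hostname-mismatch error into its two halves.
explain_x509_error() {
  msg=$1
  case $msg in
    *"certificate is valid for "*", not "*) ;;
    *) echo "not an x509 hostname-mismatch error"; return 1 ;;
  esac
  rest=${msg#*"certificate is valid for "}   # drop everything up to the SAN list
  valid_for=${rest%", not "*}                # names/IPs the certificate covers
  wanted=${rest##*", not "}                  # address the client asked for
  wanted=${wanted%%")"*}                     # trim the log's closing ")]}"
  wanted=${wanted%% *}                       # trim any trailing words
  echo "certificate covers: $valid_for"
  echo "router dialed:      $wanted"
}

# The error line from this thread:
explain_x509_error '{error=[error connecting ctrl (x509: certificate is valid for 127.0.0.1, not 168.138.10.79)]} error starting'
# certificate covers: 127.0.0.1
# router dialed:      168.138.10.79
```

So the fix has to happen on the certificate side: the address the router dialed needs to appear in the list the certificate covers.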


x509: certificate is valid for 127.0.0.1, not controller-ip-address means one of three things: you did not create the new cert/key properly, the controller's YAML is not pointing to the correct certificates, or you did not restart the controller after updating.

The entries that need to be updated in the controller YAML, if you chose to name them differently, are:

server_cert: {name}-server.cert

key: {name}-server.key
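
A rough way to check the second cause is to list the server_cert and key paths in the controller YAML and confirm each file exists. This is only a sketch: check_identity_files is a hypothetical helper, and it assumes each entry sits on one line as "server_cert: path" or "key: path" with no spaces in the path.

```shell
# Hypothetical helper: report whether each server_cert/key path in a
# controller YAML file points at a file that actually exists.
check_identity_files() {
  config=$1
  # Print only the value of "server_cert:" and "key:" lines, without quotes.
  sed -n -E 's/^[[:space:]]*(server_cert|key):[[:space:]]*"?([^"[:space:]]+)"?[[:space:]]*$/\2/p' "$config" |
  while IFS= read -r path; do
    if [ -f "$path" ]; then
      echo "ok       $path"
    else
      echo "MISSING  $path"
    fi
  done
}

# Usage (the config path is an assumption based on the quickstart layout):
#   check_identity_files "$ZITI_HOME/$(hostname).yaml"
```

Entries with an empty value are skipped, and this does not check that the certificate and key belong together, only that the files are there.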


I tried that unsuccessfully… so… I was watching the video you created to set up the router for Oracle Linux again.

When I re-created the edge router the first time with the IP address… I left the ctl tag unchanged.

So… if I change this to an IP address… do I need to delete the router again and re-create it?

I am thinking… maybe there is some type of caching happening.

Does this make any sense… as I do not really understand why the edge router needed to be re-created when I changed the host name to an ip address

I will try running these again now… will revert back shortly

ziti edge delete edge-router ${routerName}

ziti edge create edge-router ${routerName} -t -a "public" -o ${routerName}.jwt

ziti-router enroll ${routerName}.yaml -j ${routerName}.jwt

I’d rather you just hold on for a bit… I’ll see if I can get you some new commands to try. I don’t think that will do anything for you. The issue is either with the controller’s cert or with the other routers it’s trying to form a link to. Let me work something up…


Thanks for your help with this… I am finding this a very valuable learning experience.

Ok. These instructions assume you used the quickstart to set up the controller in the first place. If you haven't (I expect you did), do that first and then run these steps. Assuming that you did use the quickstart...

setup the shell...

Make sure your shell is clean by logging out, then logging back in. If you need to source the environment file, do that:

source $HOME/.ziti/quickstart/$(hostname)/$(hostname).env

Then source the ziti-cli-scripts helper:

source /dev/stdin <<< "$(wget -qO- https://raw.githubusercontent.com/openziti/ziti/release-next/quickstart/docker/image/ziti-cli-functions.sh)"

setup variables

Now set these five variables in your shell. Replace ___FILL___ with the correct value (obviously) :slight_smile:

EDGE_CONTROLLER_EXTERNAL_DNS_NAME=___FILL___
EDGE_CONTROLLER_PRIVATE_DNS_NAME=___FILL___
EDGE_CONTROLLER_EXTERNAL_IP_ADDRESS=___FILL___
EDGE_CONTROLLER_PRIVATE_IP_ADDRESS=___FILL___
file_name="${ZITI_CONTROLLER_HOSTNAME}-$(date +'%Y-%m-%d_%H%M%S')"
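
Before moving on, it can be worth confirming that none of the values is still a placeholder. A minimal sketch, where require_filled is a hypothetical helper and not something the quickstart provides:

```shell
# Hypothetical helper: fail if any named variable is empty, unset,
# or still contains the ___FILL___ placeholder.
require_filled() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"          # look up the variable by name
    case $value in
      ''|*___FILL___*) echo "not set: $name" >&2; missing=1 ;;
    esac
  done
  return "$missing"
}

# Usage, with the variable names from the step above:
#   require_filled EDGE_CONTROLLER_EXTERNAL_DNS_NAME \
#                  EDGE_CONTROLLER_PRIVATE_DNS_NAME \
#                  EDGE_CONTROLLER_EXTERNAL_IP_ADDRESS \
#                  EDGE_CONTROLLER_PRIVATE_IP_ADDRESS \
#     || echo "fix the values above before generating the cert"
```

Catching a leftover placeholder here matters because the values go straight into the certificate's DNS and IP lists in the next step.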

Generate a new server certificate for your edge controller

This will make a new server cert using your existing PKI created when running the quickstart.

pki_allow_list_dns="${EDGE_CONTROLLER_EXTERNAL_DNS_NAME},${EDGE_CONTROLLER_PRIVATE_DNS_NAME},localhost,$(hostname)"
pki_allow_list_ip="127.0.0.1,${EDGE_CONTROLLER_EXTERNAL_IP_ADDRESS},${EDGE_CONTROLLER_PRIVATE_IP_ADDRESS}"

"${ZITI_BIN_DIR}/ziti" pki create server \
  --pki-root="${ZITI_PKI_OS_SPECIFIC}" \
  --ca-name "${ZITI_CONTROLLER_INTERMEDIATE_NAME}" \
  --server-file "${file_name}-server" \
  --dns "${pki_allow_list_dns}" --ip "${pki_allow_list_ip}" \
  --server-name "${file_name} server certificate"

Find the new .pem file

cat <<HERE

    NEW SERVER CERTIFICATE GENERATED
    USE THIS FILE: $(find "$ZITI_HOME" -name "*${file_name}*chain.pem")

HERE

Use it, update controller config file

vi $ZITI_HOME/$(hostname).yaml

Find the web.name.identity section and replace the server_cert value there with the path to the "chain.pem" file. It should look something like mine:

web:
  - name: client-management
    bindPoints:
      - interface: 0.0.0.0:8441
        address: ec2-18-188-201-183.us-east-2.compute.amazonaws.com:8441
    identity:
      ca:       
      key:     
      server_cert: "/home/ubuntu/.ziti/quickstart/ip-172-31-42-64/pki/ip-172-31-42-64-intermediate/certs/ip-172-31-42-64-2022-08-02_123132-server.chain.pem"

VERIFY it's correct using openssl

Use openssl to 'connect' and print the certificates (replace localhost if you're not ON the controller):

openssl s_client -connect localhost:8441 -showcerts </dev/null | openssl x509 -text

restart the controller

sudo systemctl restart ziti-controller

Wrapping up

That should give you what you need to make a new server cert from your existing PKI.


Super helpful. Thanks so much. I will work through this tomorrow. I want to spend some time to really understand what is all happening here. I was thinking over in my mind earlier today… yeah… I should probably start to better understand how all of the PKI works together.

Here’s what’s happening, from a high level…

  1. The router comes online.
  2. Router reads its ctrl.endpoint configuration entry, and tries to initiate a TLS connection to whatever is specified in the config file. The entry, clearly, must be accessible from the router. (You might recall that this was a quickstart bug from a few months ago)
  3. Router connects to the controller, the controller returns a certificate. The router checks that the certificate is valid. Router knows that it was trying to connect to “my.edge.controller” so if the certificate that comes back does not say “I am valid for my.edge.controller” the edge router just exits right there saying “hey, this is BAD, I’m not connecting to this endpoint”
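
Step 2 can be sketched in shell: the host the router will later check the certificate against is simply the middle part of ctrl.endpoint. endpoint_host is a hypothetical helper name, and it assumes the tls:host:port form shown earlier in this thread (no IPv6 literals).

```shell
# Hypothetical helper: pull the host out of a "tls:host:port" endpoint.
endpoint_host() {
  endpoint=$1
  hostport=${endpoint#*:}   # drop the leading "tls:"
  echo "${hostport%:*}"     # drop the trailing ":port"
}

endpoint_host "tls:168.138.10.79:6262"
# 168.138.10.79
```

Whatever this prints is the name or IP that has to show up in the controller certificate's Subject Alternative Name list.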

So that’s “basically” what happened here.

Btw if you want to see what my controller returns you can run:

openssl s_client -connect ec2-18-188-201-183.us-east-2.compute.amazonaws.com:8441 -showcerts </dev/null | openssl x509 -text

You’ll see in there my results as an example:

X509v3 Subject Alternative Name:
      DNS:ip-172-31-42-64, DNS:localhost, DNS:ip-172-31-42-64, DNS:ec2-18-188-201-183.us-east-2.compute.amazonaws.com, IP Address:127.0.0.1, IP Address:18.188.201.183
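
To test whether a given address would pass against a list like this, here is a simplified sketch. san_covers is a hypothetical helper. It does exact string comparison only, so it ignores wildcard DNS entries and does not distinguish DNS entries from IP entries the way a real TLS library does.

```shell
# Hypothetical helper: succeed if "wanted" appears as an entry in a
# "Subject Alternative Name" line as printed by `openssl x509 -text`.
san_covers() {
  san_line=$1
  wanted=$2
  echo "$san_line" |
    tr ',' '\n' |                       # one entry per line
    sed -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e 's/^DNS://' \
        -e 's/^IP Address://' |         # keep only the name or address
    grep -qxF -- "$wanted"              # exact, whole-line match
}

# Usage with the example list above:
#   san='DNS:ip-172-31-42-64, DNS:localhost, IP Address:127.0.0.1, IP Address:18.188.201.183'
#   san_covers "$san" 18.188.201.183 && echo "covered"
#   san_covers "$san" 168.138.10.79  || echo "not covered, regenerate the cert"
```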

Totally awesome. I am very grateful for your time to share and pull this together. I wish I could make more likes :slight_smile:

A quick update… as I am still working through it… but now understand why I need to rebuild the certificate authority. Thanks for sending through the details as this will be very useful in the case you want to refresh all of your certificates. This will be something that I will be aiming to work on over the next month or so. It will require a lot of planning to make it a smooth transition.

@markamind - you should check out my follow-up over on the other post for more details. I think the instructions I provided above were not quite complete. See Creating certs for a remote private router - #13 by dovholuknf