Big Network’s Engineering Team is constantly improving elements of our global infrastructure. To achieve diversity across cloud providers, we recently migrated some of our services to DigitalOcean. That left us with a challenge: how to network AWS VPCs out to DigitalOcean instances. We used our own Cloud Networks to solve it.
To further improve reliability in parts of our infrastructure, we opted to distribute some of our services to DigitalOcean instances. Part of this decision was driven by a specific need to have world-routable public IPv4 addresses present in the Linux networking space (which DigitalOcean offers) instead of AWS NAT-ed Elastic IP addresses (which offer 1:1 NAT, so they “feel” like public IPs, but still aren’t). The decision to move some services from AWS to DigitalOcean was the easy part; we knew how we wanted our service delivery topology to change and why. The actual challenge was securely routing traffic from those services back to our internal infrastructure without exposing it to the outside world, achieving a secure multi-cloud scenario. Luckily, we make technology for exactly that!
The topology diagram below explains how orchestration traffic to and from Edge Devices lands in Big Network’s Cloud Orchestrator, which is predominantly hosted in AWS. We use a dedicated set of Cloud Networks, named a “Management Network,” to securely carry traffic from an Edge Device to AWS. The instance at AWS that gateways traffic to our virtual private cloud, or VPC, is called a “gate” (shown in the diagram as “Cloud Network A”). Our goal was to place a redundant “gate” inside DigitalOcean for the benefits associated with visibility in the Linux networking stack and a real public IP. It was trivial to run our “gate” code on a DigitalOcean instance, but we still needed to get that data back to our AWS VPC. Essentially, we needed to create a secure cloud-to-cloud “backhaul” network for this traffic (shown in the diagram as “Cloud Network B”).
We considered a variety of options to implement the “backhaul”. We knew we wanted to use a Cloud Network, but there were a variety of topologies to choose from. Below, we’ll walk you through what we considered and the pros and cons of each option.
The simplest solution would introduce Layer 2 network communication between nodes by installing the Big Network Headless Linux agent on every server/cluster that should participate in this communication:
This would, however, require some service discovery/DNS adjustments on our part so that applications are aware that they should communicate via tunnel-assigned addresses. This is not always doable or the most elegant solution if, for example, your control-plane resides mainly in containers and you don't want to ship Big Network as a sidecar container (which, BTW, you can!).
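As a rough illustration of the sidecar approach, a deployment could look something like the sketch below. The image and container names here are hypothetical placeholders, not Big Network's published images; the key detail is that a tunnel-capable agent container needs the NET_ADMIN capability and the TUN device, and the application container shares its network namespace:

```shell
# Hypothetical sketch: "bignetwork/agent" and "my-app" are placeholder image names.
# The agent container needs NET_ADMIN and /dev/net/tun to create its virtual interface.
docker run -d --name bn-agent \
  --cap-add NET_ADMIN \
  --device /dev/net/tun \
  bignetwork/agent:latest

# The application container joins the agent's network namespace,
# so tunnel-assigned addresses are visible to the application directly.
docker run -d --name my-app \
  --network container:bn-agent \
  my-app:latest
```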
Assuming that only the control plane holds stateful services, we can move cloud termination on the AWS side to a NAT instance and connect whole private networks, or parts of them. This solution requires no change to our applications; it is just a matter of routing traffic correctly via the NAT instance. We can still access services in the control plane via their original DNS names or load balancers:
A Layer 2 network connection persists between the DigitalOcean droplets and the AWS NAT instance. However, the connection inside the virtual private cloud needs to be Layer 3 (L3), as AWS does not allow Layer 2 traffic to flow freely inside VPCs.
A more complicated setup would be to terminate the tunnel on both sides via NAT instances, achieving a full L3 private-network-to-private-network connection. This setup would additionally free all servers from having the Big Network Headless Linux agent installed on them. The downside of this solution is that routes have to be configured manually on the DigitalOcean droplets, as DigitalOcean has no concept of Route Table resources like AWS VPC does.
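For that last option, keeping routes in sync would fall on us: each droplet would need a static route toward the DigitalOcean-side NAT instance, roughly like the sketch below. The addresses and interface name here are illustrative only and would differ per deployment:

```shell
# Illustrative only: send traffic for the remote AWS VPC CIDR via the local
# NAT instance. Address, interface, and CIDR are placeholders for this sketch.
ip route add 172.16.0.0/16 via 10.110.0.5 dev eth1

# This does not survive reboots; it would also need to be persisted in the
# droplet's network configuration, and updated by hand whenever CIDRs change.
```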
After weighing these trade-offs and approaches, we decided to use the "Agents to NAT Instance" methodology. This is a complex, technical implementation, so we'll show you exactly how we implemented it:
A few requirements must first be met:
Let's assume for this scenario that the VPC has CIDR range 172.16.0.0/16 and that our Cloud Network will have 192.168.254.0/24.
The first step is to prepare the Cloud Network that will tunnel our traffic between clouds. Make sure to note the Network ID here:
Then a proper static route needs to be in place. This route will be propagated to all joined DO droplets via our agent software, giving the droplets knowledge of where our resources in the AWS VPC reside. In your settings, this can be found under Advanced Settings → IP routes:
Later, we will assign the IP address used as the Route via (IP Next Hop) to our AWS NAT instance as a static IP address.
We can now install the Big Network agent on the AWS NAT instance and the DigitalOcean droplets, then join both to the Cloud Network:
echo 'deb [signed-by=/usr/share/keyrings/bignetwork.asc] https://repo.bignetwork.com/ubuntu jammy main' > /etc/apt/sources.list.d/bignetwork.list
wget -O /usr/share/keyrings/bignetwork.asc https://repo.bignetwork.com/ubuntu/pub.key
apt update
apt install bn
bn-cli join d363b4e9bd3d6aa8 # your NetworkID here, this one does not exist
bn-cli status
The output of the last command should be similar to this:
200 info 9788637428 1.10.2 ONLINE
Make sure to remember the Node ID (third column) for the NAT instance, as we will configure it with the static IP address in the next step.
Next, navigate back to the Portal. Mark Allow for all new nodes and type in the static IP for the NAT instance:
Then navigate to the top right corner → Pending changes (gear wheel icon) and Apply changes:
Now the status on both nodes should show “OK” when running bn-cli listnetworks. The DigitalOcean droplet will have a DHCP-like assigned IP address, and our NAT instance will show the static IP we configured:
# AWS NAT Instance
200 listnetworks <nwid> <name> <mac> <status> <type> <dev> <BN assigned ips>
200 listnetworks d363b4e9bd3d6aa8 AWS-DO-tunnel aa:fd:b5:de:9d:9c OK PRIVATE bnhlpitayv 192.168.254.11/24

# DigitalOcean droplet
200 listnetworks <nwid> <name> <mac> <status> <type> <dev> <BN assigned ips>
200 listnetworks d363b4e9bd3d6aa8 AWS-DO-tunnel aa:5c:fc:7b:66:79 OK PRIVATE bnhlpitayv 192.168.254.170/24
At this moment you should already see the route being propagated across the network. The NAT instance will not receive it, because it is assigned the IP address mentioned as the next hop in that route, but the DigitalOcean droplet will show it in its ip route output:
172.16.0.0/16 via 192.168.254.11 dev bnhlpitayv proto static metric 5000
Next, write down the name of the network interface associated with the Cloud Network. In this example it is bnhlpitayv, but yours will differ.
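To sanity-check which path a given destination takes from the droplet, the kernel can be asked directly with ip route get. Using the example addresses from this walkthrough (your interface name and addresses will differ):

```shell
# Ask the kernel which route it would select for a host inside the AWS VPC.
# With the static route in place, this should resolve via the Cloud Network
# interface (bnhlpitayv in our example) and the NAT instance's tunnel IP.
ip route get 172.16.10.105
```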
This step is almost identical to the VPC NAT Instance guide from AWS. As a matter of fact, we also utilize this instance to grant internet access to instances in our private subnet instead of relying on a managed NAT gateway, which tends to reduce AWS costs.
The only difference from the guide above is these iptables rules:
# Everything from AWS VPC destined to Cloud Network should leave via BigNetwork interface
iptables -A FORWARD -i eth0 -o bnhlpitayv -d 192.168.254.0/24 -j ACCEPT
iptables -t nat -A POSTROUTING -o bnhlpitayv -d 192.168.254.0/24 -j ACCEPT

# Everything from Cloud Network destined to AWS VPC should go via standard interface
iptables -A FORWARD -i bnhlpitayv -o eth0 -d 172.16.0.0/16 -j ACCEPT
iptables -t nat -A POSTROUTING -o eth0 -d 172.16.0.0/16 -j ACCEPT

# Lastly, act as NAT gateway for instances in private subnets
iptables -t nat -A POSTROUTING -o eth0 -d 0.0.0.0/0 -j MASQUERADE

sysctl -w net.ipv4.ip_forward=1
apt install iptables-persistent netfilter-persistent
iptables-save > /etc/iptables/rules.v4
systemctl enable netfilter-persistent
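One detail worth repeating from the AWS NAT instance guide: EC2 performs source/destination checking by default and drops traffic an instance forwards on behalf of others, so the check must be disabled on the NAT instance. With the AWS CLI (the instance ID below is a placeholder):

```shell
# Disable source/destination checking so the instance can forward traffic
# it did not originate. i-0123456789abcdef0 is a placeholder ID.
aws ec2 modify-instance-attribute \
  --instance-id i-0123456789abcdef0 \
  --no-source-dest-check
```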
An AWS VPC consults Route Table resources within the VPC when instances try to communicate over the network, so we need to create a new Route Table, or modify an existing one, similar to this:
It is essential to assign this newly created Route Table to the private subnet within the AWS VPC; otherwise, instances in that subnet will not be able to utilize our NAT instance.
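If you prefer the CLI over the console, the route and the subnet association can be sketched as follows. The resource IDs are placeholders; the CIDR is the Cloud Network range from our example:

```shell
# Route the Cloud Network CIDR (192.168.254.0/24) to the NAT instance.
# rtb-/i-/subnet- IDs below are placeholders for your own resources.
aws ec2 create-route \
  --route-table-id rtb-0123456789abcdef0 \
  --destination-cidr-block 192.168.254.0/24 \
  --instance-id i-0123456789abcdef0

# Associate the Route Table with the private subnet so its instances use it.
aws ec2 associate-route-table \
  --route-table-id rtb-0123456789abcdef0 \
  --subnet-id subnet-0123456789abcdef0
```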
In this step, we can prepare an empty test instance in the AWS private subnet, or use one that is already present. At this point we should be able to ping our demo DigitalOcean droplet:
~$ ping 192.168.254.170
PING 192.168.254.170 (192.168.254.170) 56(84) bytes of data.
64 bytes from 192.168.254.170: icmp_seq=1 ttl=63 time=17.4 ms
64 bytes from 192.168.254.170: icmp_seq=2 ttl=63 time=7.86 ms
64 bytes from 192.168.254.170: icmp_seq=3 ttl=63 time=7.82 ms
And vice versa: ping our hidden AWS instance in the private subnet from the DigitalOcean droplet:
~$ ping 172.16.10.105
PING 172.16.10.105 (172.16.10.105) 56(84) bytes of data.
64 bytes from 172.16.10.105: icmp_seq=1 ttl=63 time=7.85 ms
64 bytes from 172.16.10.105: icmp_seq=2 ttl=63 time=12.0 ms
64 bytes from 172.16.10.105: icmp_seq=3 ttl=63 time=8.52 ms
In this post, we demonstrated the flexibility of Cloud Networks to connect AWS virtual private clouds (VPCs) to remote DigitalOcean droplets. The Engineering Team at Big Network is successfully using this configuration to “backhaul” traffic from DigitalOcean droplets to our AWS VPC environment.