How Atomic Access used open networking to scale fast

By Joe Botha

South African internet service provider Atomic Access operates a fully open network, running Debian GNU/Linux, that can scale to billions of packets per second. Not so long ago, one couldn’t dream of running fully open-source software on 100Gbit/s routers, but an open networking strategy has helped the company become a top-rated fibre ISP.

Since 2014, content delivery networks have moved content very close to the end user with latency as low as 1ms. This is what makes on-demand video streaming business models like Netflix work. Fibre-to-the-home services started rolling out around the same time and all of this meant rapidly increasing traffic volumes.

The result is a need for modern ISP networks to provide a thin and efficient layer between the content networks and their fibre customers. Central to this is high-speed routing, where packets pass through a single router connecting the content network to the fibre network.

Atomic launched in 2018 using Debian with Intel Xeon CPUs and 10Gbit/s network cards. This was fully open source and worked well for a while, but it was all software-based routing. As customer numbers grew, Atomic needed to scale traffic volumes and route at hardware speeds.

Hardware routing with Asics (application-specific integrated circuits) is orders of magnitude faster than software routing with general-purpose CPUs. A fairly modern Linux server can route about 8Gbit/s of traffic before it starts running into software interrupt limitations. Compare that to modern Asics that can route at 50Tbit/s – gigabit vs terabit speeds, or several thousand times faster.
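As a rough sanity check on those figures, here is a minimal Python sketch. The constant and function names are illustrative, and the inputs are simply the round numbers quoted above, not measurements.

```python
import math

# Round figures quoted in the article (not benchmark results).
SOFTWARE_ROUTING_BPS = 8e9   # about 8Gbit/s on a fairly modern Linux server
ASIC_ROUTING_BPS = 50e12     # about 50Tbit/s on a modern Asic


def speedup(asic_bps, software_bps):
    """Return how many times faster the Asic figure is than the software one."""
    return asic_bps / software_bps


ratio = speedup(ASIC_ROUTING_BPS, SOFTWARE_ROUTING_BPS)
orders_of_magnitude = math.log10(ratio)

print(f"Asic vs software routing: {ratio:,.0f}x")          # 6,250x
print(f"Orders of magnitude: {orders_of_magnitude:.1f}")   # 3.8
```

On these numbers the gap works out to between three and four orders of magnitude.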

It would be very convenient if x86 CPUs could route billions of packets per second, but that’s simply not realistic, and at some point you need to start using Asics, which also deliver more consistent latency.

Moving from software to hardware routing is a big leap, both in cost and in operational thinking. Having a software forwarding plane in the operating system is very different to having an isolated, specialised hardware forwarding plane that drives the front-panel ports of the router.

Open networking

We could have chosen to buy routers from the usual big-name companies such as Cisco, Arista or Juniper, but because they force you into expensive long-term support contracts, we struggled to buy into their ecosystems.

Even with the expensive support contracts, networking vendors can be slow to resolve software bugs. You can’t have customers offline while the router vendor takes weeks to fix a software bug and create a new operating system release. You don’t want to be dependent on a single vendor, their troubleshooting and their process for updating software. This is often referred to as single vendor lock-in.

To get around the single vendor problem, ISPs often adopt a multi-vendor strategy, buying routers from more than one vendor. This can get very expensive, and you end up with at least two software ecosystems to manage. The result is that only bigger ISPs have hardware speed routing and there is usually no easy, gradual or cost-effective upgrade path between software routing and hardware routing.

Many bigger-brand ISPs outsource their networks because the equipment is expensive and complex to operate. We estimate around 45% of local fibre ISP customers are on outsourced networks. Some ISPs just want to be marketing and billing companies with support chatbots. The ISP staff often have no insight into the outsourced IP network, which is concerning when this is the primary function the customer is paying for.

Looking at the NAPAfrica peering point data, about 50% of local ISPs are using software routing and about 50% are using big-name vendor hardware routing. Of the ISPs using the big-name vendors, we suspect at least 60% are using refurbished equipment to avoid the expensive support contracts. A small percentage will have active support contracts and a multi-vendor strategy.

ISPs basically have three options:

  • Buy new router hardware, with expensive support contracts, ideally from more than one vendor;
  • Buy refurbished router hardware, without support contracts and live with the risks; or
  • Buy inexpensive routers which do software routing.

In all these cases you are locked into proprietary software and varying levels of software instability. Imagine being a coder without a community of developers on Stack Overflow. Imagine not being able to look at the source code or logs – and paying a lot of money to be in this position.

Atomic chose to do things differently. We believe in open networking, open source and open standards. We build and operate our own network and we want full operational control and visibility. We chose to go looking for a purist open networking solution.

Around 2019, we started following the open networking trend and experimented with a few devices, but they often had closed software development kits and a “black box” Asic management app in userspace. You could buy commercial third-party network operating systems to run on these devices, but they were closed source and not very reliable.

Then we discovered a fully open option with Mellanox Spectrum Asics and the Switchdev project. Switchdev and the mlxsw driver are in the standard Linux kernel. We started using two Mellanox SN2010 switches. They had good port density for our needs, used little power and were compact enough to fit two switches in one RU (rack unit).
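Because Switchdev exposes the switch's front-panel ports as ordinary Linux network interfaces, they can be inspected with ordinary tools. The sketch below is an illustration, not Atomic's tooling: it assumes the standard sysfs attribute `phys_switch_id` under `/sys/class/net/<interface>/`, and the function name `switchdev_ports` is hypothetical.

```python
from pathlib import Path


def switchdev_ports(sysfs_net="/sys/class/net"):
    """Group network interfaces by the switch ID the kernel reports for them."""
    ports = {}
    root = Path(sysfs_net)
    if not root.is_dir():
        return ports
    for iface in sorted(root.iterdir()):
        try:
            switch_id = (iface / "phys_switch_id").read_text().strip()
        except OSError:
            # Interfaces that are not backed by a switch driver usually
            # cannot provide this attribute, so they are skipped.
            continue
        if switch_id:
            ports.setdefault(switch_id, []).append(iface.name)
    return ports


if __name__ == "__main__":
    for switch_id, names in switchdev_ports().items():
        print(f"switch {switch_id}: {', '.join(names)}")
```

On a machine without switch hardware this simply prints nothing; on a Switchdev-capable device, ports that share an Asic would be expected to share a switch ID.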

It took the Atomic network engineers some time to get everything working, compiling our own kernel and building Debian packages for the various management apps, but we ended up with a flexible and modular solution that is fully open source so we can swap out software as needed. It also means Atomic has ownership of all parts and the improved visibility that comes from being able to see under the bonnet, so to speak.

We’ve proven that it is indeed possible to route at 100Gbit/s speeds with just Linux. Our peering and content delivery can now scale to billions of packets per second.
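To put "billions of packets per second" in context, the following sketch estimates packet rates from link speed. It assumes standard Ethernet framing (20 bytes of preamble, start-of-frame delimiter and inter-frame gap per frame on the wire); the frame sizes and function names are illustrative choices, not figures from Atomic's network.

```python
import math

# Per-frame overhead on the wire for Ethernet: 7-byte preamble,
# 1-byte start frame delimiter and a 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def packets_per_second(link_bps, frame_bytes):
    """Maximum frames per second a link can carry at a given frame size."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


def ports_needed(target_pps, link_bps, frame_bytes):
    """Number of links at line rate needed to reach a target packet rate."""
    return math.ceil(target_pps / packets_per_second(link_bps, frame_bytes))


LINK_BPS = 100e9  # one 100Gbit/s port

print(f"{packets_per_second(LINK_BPS, 64):,.0f}")    # 148,809,524 (minimum-size frames)
print(f"{packets_per_second(LINK_BPS, 1518):,.0f}")  # 8,127,438 (full-size frames)
print(ports_needed(1e9, LINK_BPS, 64))               # 7
```

Under these assumptions a single 100Gbit/s port tops out at about 149 million minimum-size packets per second, so rates in the billions are an aggregate across many ports forwarding at line rate.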

Open networking gives us hardware speed routing with a familiar management interface, without the vendor lock-in and crazy high costs. Atomic’s business and home customers now get Asic speeds and consistent low latency routing. The network is fully dual-stack with IPv6 services available on all our fibre networks. We now see 50% less latency for packets crossing the Cape Town peering points with 80% less jitter.

The Nvidia (formerly Mellanox) SN2000 series devices we use are often used in high-frequency trading networks, which require ultra-low and consistent latency – 300 nanosecond latency is about as low as you get for a full-featured IP routing switch.

It took 25 years between the initial Linux release and the Switchdev Spectrum driver going into the kernel (1991-2016). Linux nerds are purist, and this is as open and purist as high-speed networking gets.

Source: techcentral.co.za