How to add Redundancy to your Lightning Node

In this guide you will learn how to add redundancy to your Lightning Network node. Since we’re just talking about hardware, many of these principles also apply to server administration and IT generally.

Redundancy is not the same as backups.

Backups, redundancy, and watchtowers, along with good security and privacy practices, are all part of a holistic approach to managing your node effectively.

This guide is only about redundancy.

A Lightning node needs to be online as much as possible. Significant downtime can result in fewer earnings from routing payments, temporary inability to view balance or send funds, and even loss of funds.

Here are the redundancy techniques covered:

1. Uninterruptible Power Supply (UPS)
2. Dual Uplink
3. RAID 1 Redundancy
4. ECC Memory
Bonus: A note on Virtualization and Clusters

You don’t have to do all these at once. Start with #1 and work your way down over time as you deem necessary.

A good rule of thumb: the capacity of your channels should never be more valuable than 100x the cost of your node’s hardware.

For example, a plain Raspberry Pi kit costs $300 USD, so it should be upgraded when the total capacity of its channels reaches $30,000 USD (100 x $300). This assumes you already have generally reliable internet and power.
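The arithmetic behind the rule of thumb can be sketched in a few lines of Python. The function names and the `multiple` parameter are made up for this illustration; only the 100x figure and the $300 example come from the guide.

```python
def max_channel_capacity(hardware_cost_usd, multiple=100):
    """Largest total channel capacity the rule of thumb allows for this hardware."""
    return hardware_cost_usd * multiple


def needs_upgrade(hardware_cost_usd, total_capacity_usd):
    """True once the total channel capacity reaches the 100x threshold."""
    return total_capacity_usd >= max_channel_capacity(hardware_cost_usd)


print(max_channel_capacity(300))   # 30000
print(needs_upgrade(300, 12_000))  # False
print(needs_upgrade(300, 30_000))  # True
```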

Uninterruptible Power Supply (UPS)

Power outages will cause downtime and, in rare cases, file-system corruption – causing you to lose data. This can result in loss of funds if your node attempts to broadcast an old state to a channel peer.

A UPS is essentially a surge protector with a girthy backup battery built in. If the main power goes out, the UPS will switch to battery power and keep devices running for a bit longer.

Even if the power outage lasts longer than the battery can supply, at least you have a chance to gracefully shut down your node to avoid file corruption.
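If your UPS reports its status to the node over USB, the shutdown decision can be automated. The sketch below assumes the UPS is monitored with Network UPS Tools (NUT), whose `upsc` command prints `key: value` lines such as `ups.status` and `battery.charge`. The guide does not name any monitoring software, so treat NUT, the function names, and the 30% threshold as assumptions for illustration.

```python
def parse_upsc(output):
    """Turn 'key: value' lines (the format NUT's upsc prints) into a dict."""
    values = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            values[key.strip()] = value.strip()
    return values


def should_shut_down(output, min_charge=30):
    """Decide whether to shut down: on battery AND (low-battery flag or low charge).

    min_charge is an illustrative threshold, not a recommendation.
    """
    values = parse_upsc(output)
    status = values.get("ups.status", "").split()
    on_battery = "OB" in status    # OB = running on battery
    low_battery = "LB" in status   # LB = UPS reports low battery
    charge = float(values.get("battery.charge", "100"))
    return on_battery and (low_battery or charge <= min_charge)


# Hypothetical upsc output during an outage with the battery at 25%.
sample = "battery.charge: 25\nups.status: OB DISCHRG\n"
print(should_shut_down(sample))  # True
```

A real setup would run this check on a timer and, when it returns True, stop the Lightning daemon cleanly before powering off.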

Here’s Amazon’s Choice for a UPS

Remember to also connect your router to the UPS so you don’t lose network connection when the power goes out!

Speaking of networks…

Dual Uplink

We’ve all experienced internet outages. If your node goes offline for too long, it will be unable to respond to malicious channel peers that may try to cheat you. Lightning Network watchtowers offer a better solution to that specific problem as a service. However, watchtowers are not a form of redundancy.

Many people pay for internet connections from two different ISPs to add redundancy and extra bandwidth to their home internet connection.

Even if you only have one option for an ISP, you can probably use a 4G cellular connection or satellite-based uplink like Starlink.

A dual-uplink network switch can add redundancy to your network using two internet connections.

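Using a dual-uplink network switch (often sold as a dual-WAN router), you can plug in ethernet cables from two internet sources and load balance between them for your node, or fail over to the second connection when the first goes down.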
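The failover half of that logic is simple enough to sketch. Dual-uplink hardware does this for you, so the code below is only an illustration of the idea: the uplink names and the health-check results are hypothetical, and a real check would probe each connection instead of reading from a dict.

```python
from typing import Callable, Optional, Sequence


def pick_uplink(uplinks: Sequence[str], is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first uplink (in priority order) whose health check passes."""
    for name in uplinks:
        if is_up(name):
            return name
    return None  # every uplink is down


# Hypothetical health results: the primary ISP is down, the 4G backup is up.
health = {"primary-isp": False, "backup-4g": True}
print(pick_uplink(["primary-isp", "backup-4g"], lambda name: health[name]))  # backup-4g
```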

RAID 1 Redundancy

Whether you’re using an old mechanical HDD or an SSD, the disk that stores your node’s data and the entire blockchain will eventually stop working. Many SSDs are designed to fail into a read-only state, which means you can still recover data but can’t write new data. However, there’s always a risk of data loss, especially if you were cheap and bought a low-quality SSD without a DRAM cache.

RAID stands for “Redundant Array of Independent Disks”, so this section’s title, “RAID 1 Redundancy”, is itself redundant. There are many kinds of RAID, and each specifies how the data is split across disks. RAID 1 simply mirrors the same data onto every disk in the array.

When using RAID 1, you have two storage disks of equal capacity plugged into and powered by your node. Using Linux software called mdadm, you can create a virtual RAID disk from the two drives.

Use this guide from DigitalOcean to configure your software RAID: How to Create RAID Arrays with mdadm.

Afterwards, Linux will recognize your disks as a single device, and you can mount it and read/write like normal. Now when one disk fails, your node will keep running as normal as long as the other disk is working properly.
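Because the node keeps running when one disk fails, a failed disk is easy to miss. Linux software RAID reports array health in /proc/mdstat, where a healthy two-disk mirror shows `[2/2] [UU]` and a degraded one shows `[2/1] [U_]`. The following sketch parses that text and lists degraded arrays. The function name and sample text are made up for illustration, and it only recognizes arrays named like `md0`.

```python
import re

# Matches the member status, such as "[2/2] [UU]" or "[2/1] [U_]".
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return names of md arrays with fewer active members than expected."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected or "_" in status.group(3):
                degraded.append(current)
            current = None
    return degraded


# Sample text in the /proc/mdstat layout, with one of two mirror members missing.
SAMPLE = """Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      976630464 blocks super 1.2 [2/1] [U_]

unused devices: <none>
"""
print(degraded_arrays(SAMPLE))  # ['md0']
```

On a real node you would pass in the contents of /proc/mdstat and raise an alert whenever the returned list is not empty.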

Please note that the Raspberry Pi 4 cannot supply enough power over its USB ports to run two drives at the same time. If you wish to add RAID to a Raspberry Pi node, you can try using externally powered USB 3.0 to SATA adapters.

An example of an externally powered USB to SATA adapter. You will need two of these to run RAID 1 on a Raspberry Pi.

Perhaps at this stage, you should consider upgrading from a Raspberry Pi to a RockPi 4, a spare desktop PC, or even a full-blown server.

ECC Memory

The universe is a hostile place. While you’ve been reading, thousands of particles from deep space have been blasting straight through your body and your computer. Sometimes cosmic rays are the only valid explanation for why a computer system crashes.

If you’ve ever had an unexpected computer crash that never happened again, a memory error caused by a cosmic particle is a more plausible culprit than you might expect. Regular RAM has no built-in way to detect or correct these bit flips, although software can catch some corrupted data with checksums. ECC RAM has an extra layer of protection baked in at the hardware level: it stores extra check bits alongside the data, which typically lets it correct single-bit errors and detect double-bit errors.

ECC RAM is used in servers and systems where high availability is a must. It only works on certain motherboards that support it. It’s also slower than non-ECC memory and costs significantly more.
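To see how extra check bits can repair a flipped bit, here is a toy Hamming(7,4) code in Python. It is only an illustration of the principle: real ECC memory does this in hardware with a wider code over whole memory words, and the function names here are made up for the example.

```python
def encode(data):
    """Encode 4 data bits into a 7-bit Hamming codeword (parity at positions 1, 2, 4)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(codeword):
    """Correct up to one flipped bit; return (data bits, 1-based error position or 0)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s4  # the syndrome points at the bad bit
    if error_position:
        c[error_position - 1] ^= 1         # flip it back
    return [c[2], c[4], c[5], c[6]], error_position


stored = encode([1, 0, 1, 1])
stored[4] ^= 1         # simulate a cosmic ray flipping the bit at position 5
print(decode(stored))  # ([1, 0, 1, 1], 5)
```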

Bonus: A note on Virtualization and Clusters

A very popular way to add redundancy in the enterprise world is through distributed servers running a virtual application. This approach relies on advanced networking and software configuration to essentially run a single software program across many computers connected over the internet.

Amazon Web Services offers tools and services that make this easier.

If you’re interested, look into learning Kubernetes and Docker. You should also have advanced knowledge of the Lightning Network penalty mechanism. Many people have lost funds because they didn’t understand how LN punishes nodes that fail to keep an up-to-date state. Keeping that state current is especially tricky when using a distributed cluster.