A Step-by-step Guide to Building a Diskless Linux Cluster

Intro

A diskless computer is, as the name suggests, a computer without disks. Typically it fetches the programs it needs to run over a network instead of from a dedicated storage device. A diskless cluster consists of multiple diskless computers, often sharing the same hardware configuration, and a "disked" server computer that provides the data they need over a network.

In this guide, we are going to set up a diskless cluster where all computers have the x86-64 architecture and run Arch Linux. All client computers share the same root file system, but each client has its own copy of the /etc directory so that they can have different configurations: one can run in headless mode while another boots into a desktop environment, for example.

Why Diskless Cluster?

As noted in this webpage, diskless clusters have these advantages:

Reduced cost, because many disks are no longer needed.
Fewer disk failures because the number of disks is reduced.
Less power consumption, thus less heat and noise.
Configuration and management in a central place.

Say you want to place several IoT devices across your apartment. They all need to run Linux, and their file systems can’t fit into NOR flash chips. Providing each device with an SD card or disk could be a pain in the butt:

They cost money.
They could fail, and replacing them costs more time and money.
The write performance of cheap SD cards could be poor.
SD cards are easy to lose.
Writing the OS image into SD cards one by one is time-consuming.
Syncing software versions across them is hard, especially when they are installed at different times, or when some of them are taken offline for a period of time.
Making the same configuration change for each one of them requires extensive scripting and costs time.
Deploying programs requires uploading the programs first. Or you could write your program on the devices. But are you going to set up your development environment on every one of them?
And so on.

So unless you need R/W speeds of some 100 MB/s on your computers, diskless is a better option. The clients in our diskless cluster share the same root file system, which keeps them as identical as possible, and batch processing becomes easy as the data are all on the same machine.

How it Works?

After a client machine’s BIOS or UEFI finishes its own initialization, it loads and executes a small piece of code stored in the client machine’s network card. This program is called the Preboot eXecution Environment (PXE). PXE asks the server machine to allocate an IP address for it using DHCP. The server offers the client an IP address and gives it a file path. PXE then accepts the offered IP, contacts the server over TFTP and fetches the file at that path.

The file is another, larger piece of code called a boot loader. You may already know a boot loader called GRUB. PXE executes the boot loader, which first gets its configuration file from the TFTP server. The configuration file stores the paths of the Linux kernel and the initramfs. The boot loader then gets the kernel and initramfs through TFTP, loads them into RAM and executes the kernel with the command line options given in the configuration file (imagine the kernel to be a command-line program you run in your terminal).

The kernel is a much larger piece of code. It uses the initramfs as a temporary root file system and runs the /init program in it (as a process, instead of letting it take over the execution). /init then mounts a Network File System (NFS) share provided by the server on /new_root and another NFS share on /new_root/etc. Finally, /new_root becomes the root file system and the /init in it is run. That /init then starts many background processes and shows a login prompt to the user. This concludes the boot process of a diskless client machine.

These are the methods we use to let clients share the same root file system while having different configurations:

The root file system can only be mounted read-only by clients. This keeps clients from garbling the root file system.
We create a tmpfs and put it on top of our read-only root to form an OverlayFS. This makes the root read-write, so that programs won’t refuse to work because they cannot create a file. The changes are lost when the client reboots, though.
Every client mounts its own NFS share on /etc as read-write in the initramfs stage, so that clients can have different enabled systemd services, different SSH server keys, different user and/or group lists, etc. We use git to track changes in these /etc directories, so if you want to make a change across all clients, just commit the change in the upstream /etc and run git pull in the downstream /etc directories.
One designated client can have persistent write access to its /boot (and thus all clients’ /boot) to make its own initramfs image in case the hardware differs between the server and the clients.
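
The tmpfs-over-read-only-root arrangement from the list above can be sketched as a handful of mount commands. This is a dry-run illustration only: the function prints the commands instead of running them, the function name and the /run/lower and /run/upper paths are made up for this example, and the real work is done by the initramfs hooks installed later in this guide, whose implementation may well differ.

```shell
# Print (not run) the mounts that would put a writable tmpfs on top of a
# read-only NFS root. All paths below are hypothetical placeholders.
overlay_root_plan() {
    nfs_export=$1      # e.g. 192.168.78.1:/srv/root
    new_root=$2        # e.g. /new_root
    lower=/run/lower   # where the read-only NFS root would be mounted
    upper=/run/upper   # tmpfs holding all changes; lost on reboot

    echo "mount -t nfs -o ro $nfs_export $lower"
    echo "mount -t tmpfs tmpfs $upper"
    # OverlayFS needs an upperdir and an empty workdir on the same file system
    echo "mkdir -p $upper/data $upper/work"
    echo "mount -t overlay overlay -o lowerdir=$lower,upperdir=$upper/data,workdir=$upper/work $new_root"
}

overlay_root_plan 192.168.78.1:/srv/root /new_root
```

Reads fall through to the NFS root in lowerdir, while every write lands in the tmpfs-backed upperdir, which is why the changes vanish on reboot.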

Let’s Do It!

Set up a DHCP/TFTP/DNS server with dnsmasq

Suppose the server machine has the static IP address 192.168.78.1/24 on its interface dedicated to this cluster. You may use a network manager to assign this static IP. If you use systemd-networkd, for example, creating /etc/systemd/network/00-eth0.network as follows should do the job:

[Match]
Name=eth0

[Link]
RequiredForOnline=no

[Network]
Address=192.168.78.1/24
ConfigureWithoutCarrier=yes

You must not have another DHCP server running in this network besides your server machine (or maybe you could, but I didn’t try that).

Install the dnsmasq package and edit /etc/dnsmasq.conf:

listen-address=192.168.78.1
dhcp-range=192.168.78.50,192.168.78.150,12h

# Change 8.8.8.8 to DNS servers of your choice
dhcp-option=option:dns-server,192.168.78.1,8.8.8.8

# Identify client machines by their MAC addresses.
# Assign each client a fixed host name and IP, and tag it with 'diskless'
dhcp-host=00:e0:66:59:89:84,client0,192.168.78.50,set:diskless
dhcp-host=00:e0:66:59:88:8b,client1,192.168.78.51,set:diskless
dhcp-host=00:e0:66:59:8c:50,client2,192.168.78.52,set:diskless
dhcp-host=00:e0:66:59:89:72,client3,192.168.78.53,set:diskless
dhcp-host=00:e0:66:59:8c:8d,client4,192.168.78.54,set:diskless
dhcp-host=00:e0:66:59:8a:2d,client5,192.168.78.55,set:diskless

enable-tftp
# We will create this folder later
tftp-root=/srv/root/boot

# Boot loader file path relative to tftp-root.
# Only clients tagged 'diskless' will be told this path.
dhcp-boot=tag:diskless,pxelinux.0
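
With more than a few clients, typing the dhcp-host lines by hand gets tedious. Here is a small sketch that generates them from a list of MAC addresses, one per line on standard input. The function name is made up for this example, and it assumes the numbering scheme used in this guide (client0 at 192.168.78.50, counting upward).

```shell
# Print one dhcp-host line per MAC address read from stdin.
# Host names and IPs are numbered from client0 / 192.168.78.50 upward,
# matching the scheme used in this guide (an assumption of this sketch).
gen_dhcp_hosts() {
    i=0
    while read -r mac; do
        [ -n "$mac" ] || continue          # skip blank lines
        echo "dhcp-host=$mac,client$i,192.168.78.$((50 + i)),set:diskless"
        i=$((i + 1))
    done
}

printf '%s\n' 00:e0:66:59:89:84 00:e0:66:59:88:8b | gen_dhcp_hosts
```

Review the output before pasting it into /etc/dnsmasq.conf; the order of the MAC addresses decides which machine becomes which client.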

Add these lines to /etc/hosts to save yourself from typing client IPs every time. These entries are also used by dnsmasq and thus will be known by clients.

192.168.78.50   client0
192.168.78.51   client1
192.168.78.52   client2
192.168.78.53   client3
192.168.78.54   client4
192.168.78.55   client5

Enable and start dnsmasq.service:

# systemctl enable --now dnsmasq

Prepare the Root File System

On the server machine, create a directory as clients’ root file system:

# mkdir /srv/root

Create another directory as a mount point and bind mount /srv/root on it. This stops pacstrap and arch-chroot from complaining that /srv/root is not a mount point:

# mkdir /mnt/root
# mount --bind /srv/root /mnt/root

Add the following line to /etc/fstab to make this mount persistent across server reboots:

/srv/root   /mnt/root   none    bind

Now infect this file system with Arch Linux:

# pacstrap /mnt/root base linux linux-firmware syslinux sudo vi vim etckeeper

You can replace vi and vim with your favorite editor and add other packages such as intel-ucode. Our client system is still configurable after this so you don’t have to come up with a comprehensive package list here.

Change root into the new system:

# arch-chroot /mnt/root

Now set the time zone, localization, root password and other things described in the Arch Wiki. But don’t touch /etc/fstab, /etc/hostname, /etc/hosts and the initramfs for the time being.

In the chroot environment, install openssh and enable sshd.service if you need it:

chroot# pacman -S openssh
chroot# systemctl enable sshd

Create a user with sudo permission, and set its password. You can make this user’s UID identical to the user you use on the server machine, for more convenience later on.

chroot# useradd --create-home --groups wheel --uid 1001 wangruoxi
chroot# passwd wangruoxi

Edit /etc/sudoers with the visudo command so that members of the wheel group can use sudo (typically by uncommenting the %wheel line):

chroot# visudo

Create (chroot)/etc/systemd/network/00-wired.network:

[Match]
Name=eth0

[Network]
DHCP=yes
KeepConfiguration=yes
IgnoreCarrierLoss=yes

Enable systemd-networkd:

chroot# systemctl enable systemd-networkd

You may want to disable Predictable Network Interface Names:

chroot# mkdir -p /etc/udev/rules.d && ln -s /dev/null /etc/udev/rules.d/80-net-setup-link.rules

Use Ctrl+D to exit the chroot environment.

Now we export this file system over NFS. Make sure nfs-utils is installed on the server and add the following line to /etc/exports:

/srv/root    192.168.78.0/24(ro,no_root_squash,subtree_check)

Enable the NFS server and re-export the NFS shares:

# systemctl enable --now nfs-server
# exportfs -arv

Prepare the files needed in the client boot process

Boot loader executables

We installed syslinux in the pacstrap step because we need a boot loader provided by it. The boot loader is pxelinux. Copy some files used by pxelinux to the /srv/root/boot folder as symlinks. These files will be transferred to client machines via TFTP during their boot process. We use symlinks here so that these files are automatically updated whenever syslinux gets an update. The symlinks have relative target paths in case /srv/root needs to be moved to another location.

# cd /srv/root/boot
# cp -s ../usr/lib/syslinux/bios/{pxelinux.0,ldlinux.c32} ./

Boot loader configuration file

Create /srv/root/boot/pxelinux.cfg/default:

DEFAULT linux

LABEL linux
    KERNEL vmlinuz-linux
    APPEND root=/dev/nfs nfsroot=192.168.78.1:/srv/root,ro ip=dhcp
    INITRD intel-ucode.img,initramfs-linux-fallback.img

Using intel-ucode.img is optional. Use amd-ucode.img instead if the client machines have AMD CPUs. Just make sure the related package is installed and the file exists in /srv/root/boot.

We use initramfs-linux-fallback.img for now because the non-fallback initramfs generated on the server machine may not have the kernel modules and firmware required by the client machines’ network cards.
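
The nfsroot= value on the APPEND line packs three things into one string: the server address, the exported path and, after a comma, the mount options. As a sketch of how such a value breaks down (the function name is made up here, and this is not the code the initramfs actually runs), it can be taken apart with plain parameter expansion:

```shell
# Split an nfsroot= value of the form SERVER:PATH[,OPTIONS] into its parts.
parse_nfsroot() {
    value=$1
    server=${value%%:*}          # text before the first ':'
    rest=${value#*:}             # text after the first ':'
    path=${rest%%,*}             # export path, up to the first ','
    case $rest in
        *,*) opts=${rest#*,} ;;  # everything after the first ','
        *)   opts= ;;            # no options given
    esac
    echo "server=$server path=$path options=$opts"
}

parse_nfsroot 192.168.78.1:/srv/root,ro
```

For the value used in this guide, the server is 192.168.78.1, the path is /srv/root and the only option is ro, which is what keeps the shared root read-only on the client side as well.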

initramfs

By default, an Arch Linux initramfs can’t mount NFS shares. So we need mkinitcpio-nfs-utils to add that ability:

# arch-chroot /mnt/root pacman -S mkinitcpio-nfs-utils

We need two more packages: mkinitcpio-overlayfs and mkinitcpio-etc. They are in the Arch User Repository (AUR). Fetch their PKGBUILDs from the AUR and build the packages as a regular user (makepkg refuses to run as root). Just copy this ‘for’ loop to your terminal and hit Enter.

for pkg in mkinitcpio-overlayfs mkinitcpio-etc; do
    git clone "https://aur.archlinux.org/$pkg.git"
    cd "$pkg"
    makepkg
    # The compression suffix depends on makepkg.conf (.xz, .zst, ...)
    sudo cp ./*.pkg.tar.* /srv/root/root
    cd ..
done

Install them to the clients’ root file system. The glob has to be expanded inside the chroot rather than by the server’s shell, so wrap the command in sh -c:

# arch-chroot /mnt/root sh -c 'pacman -U /root/*.pkg.tar.*'

Edit /srv/root/etc/mkinitcpio.conf and modify the line that begins with HOOKS=:

...
HOOKS=(base udev autodetect modconf net filesystems keyboard overlayfs etc)
...

Update initramfs images:

# arch-chroot /mnt/root mkinitcpio -p linux

Note: You should not replace the net hook with net_nfs4 provided by mkinitcpio-nfs4-hooks because, at the time of writing (Linux 5.5.13), OverlayFS doesn’t play well with NFSv4 ACLs. (See this post)

Test if everything works

Now the server is configured to the point where clients can finish the boot process. Connect a client and the server to the same Ethernet switch. Before trying to boot a client machine, make sure PXE boot is supported and enabled in its BIOS. Since pxelinux.0 is the BIOS build of the boot loader, a UEFI machine has to PXE-boot in legacy (CSM) mode.

If you aren’t able to hook a monitor to your client machine, you can do these things to help determine which boot stage it has reached:

If you installed openssh and enabled sshd.service in clients’ root, try to SSH into it. If the login is successful, we know the client machine boots successfully.
Run showmount on the server. If you can see the client’s IP address, the NFS is successfully mounted.
Ping the client’s IP address. If the client replies, its network configuration is successful.
Check dnsmasq’s log with journalctl -eu dnsmasq. If you see sent /srv/root/boot/initramfs-linux-fallback.img to xx.xx.xx.xx in it, the client’s boot loader was executed correctly.
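
With several clients, scanning the log by eye gets old quickly. The last check above can be sketched as a small filter that reads log lines and prints every client IP that was sent a given file. The function name is made up, and the sample lines at the bottom are invented lines in the "sent FILE to IP" shape quoted above, not real log output.

```shell
# Read dnsmasq log lines on stdin and print each client IP that was sent
# the given file (matched by the last path component), once per IP.
clients_sent_file() {
    awk -v f="$1" '
        {
            for (i = 1; i + 3 <= NF; i++) {
                if ($i == "sent" && $(i + 2) == "to") {
                    n = split($(i + 1), parts, "/")
                    if (parts[n] == f && !seen[$(i + 3)]++) print $(i + 3)
                }
            }
        }'
}

# On the server this would be fed from the journal (not run here):
#   journalctl -u dnsmasq --no-pager | clients_sent_file initramfs-linux-fallback.img

# Invented sample lines, for illustration only:
printf '%s\n' \
    'dnsmasq-tftp: sent /srv/root/boot/pxelinux.0 to 192.168.78.50' \
    'dnsmasq-tftp: sent /srv/root/boot/initramfs-linux-fallback.img to 192.168.78.50' |
    clients_sent_file initramfs-linux-fallback.img
```

Any client missing from the output never got as far as loading the initramfs, so its problem lies in the PXE, DHCP or TFTP stage.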

Give clients their own `/etc` directories

We installed etckeeper in the pacstrap step. It automatically tracks changes in /srv/root/etc with git, for example, when installing packages with pacman.

First do some initialization for etckeeper:

# arch-chroot /mnt/root
chroot# etckeeper init
chroot# cd /etc
chroot# git config user.name root
chroot# etckeeper commit -m "Initial commit"
chroot# exit

Clone the git repository in /srv/root/etc/.git for every client with this one-liner:

for i in {0..5}; do sudo git clone /srv/root/etc/.git /srv/etc-client$i; done

Extend /etc/exports to look like this and re-export with sudo exportfs -arv:

/srv/root    192.168.78.0/24(ro,no_root_squash,subtree_check)
/srv/etc-client0    192.168.78.50(rw,no_root_squash,subtree_check)
/srv/etc-client1    192.168.78.51(rw,no_root_squash,subtree_check)
/srv/etc-client2    192.168.78.52(rw,no_root_squash,subtree_check)
/srv/etc-client3    192.168.78.53(rw,no_root_squash,subtree_check)
/srv/etc-client4    192.168.78.54(rw,no_root_squash,subtree_check)
/srv/etc-client5    192.168.78.55(rw,no_root_squash,subtree_check)
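
For a larger cluster, the export list can be generated rather than typed. A minimal sketch follows, assuming the same naming and addressing scheme as above (etc-clientN exported to 192.168.78.50 + N); the function name is made up for this example.

```shell
# Print the /etc/exports content for the shared root plus one private
# /etc export per client. The numbering scheme is an assumption taken
# from this guide: client N has the address 192.168.78.(50 + N).
gen_exports() {
    count=$1
    echo "/srv/root    192.168.78.0/24(ro,no_root_squash,subtree_check)"
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "/srv/etc-client$i    192.168.78.$((50 + i))(rw,no_root_squash,subtree_check)"
        i=$((i + 1))
    done
}

gen_exports 6
```

Note that each /etc export is limited to a single client IP, so a client cannot mount (or damage) another client’s configuration.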

Modify /srv/root/boot/pxelinux.cfg/default so that clients mount their own /etc directories:

...
    APPEND root=/dev/nfs nfsroot=192.168.78.1:/srv/root,ro ip=dhcp etc=192.168.78.1:/srv/etc-HOSTNAME
...

Mounting /etc is performed by the initramfs hook provided by mkinitcpio-etc. It replaces HOSTNAME with the host name each client received from dnsmasq.
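
To make the substitution concrete, here is a sketch of what it amounts to. This is not the hook’s actual code, which may do it differently; the function name is made up for illustration.

```shell
# Replace the HOSTNAME placeholder in an etc= value with a host name.
# Assumes the host name contains no '/' or '&' (true for valid host names).
resolve_etc_export() {
    template=$1   # value of the etc= kernel parameter
    host=$2       # host name handed out by dnsmasq
    printf '%s\n' "$template" | sed "s/HOSTNAME/$host/g"
}

resolve_etc_export 192.168.78.1:/srv/etc-HOSTNAME client3
```

This is why a single pxelinux configuration file can serve all clients: the only per-client part of the kernel command line is filled in at boot time.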

Now connect all 6 clients to the Ethernet switch and boot them up. If everything goes well, they should finish the boot process and generate SSH host keys in their /etc directories. Run this command to commit SSH host keys to git:

for i in {0..5}; do
    sudo git -C /srv/etc-client$i add .
    sudo git -C /srv/etc-client$i commit -m "Add SSH host keys";
done

Grant one of the clients write permission to `/boot`

As we already exported /srv/root as RO, exporting /srv/root/boot as R/W somehow doesn’t give the client write permission. So let’s create a directory and bind mount /srv/root/boot on it:

# mkdir /srv/clientboot
# mount --bind /srv/root/boot /srv/clientboot

Add this line to /etc/fstab to make this mount persistent:

/srv/root/boot  /srv/clientboot none    bind

Say we are granting client0 write access. Export /srv/clientboot as R/W in /etc/exports and run sudo exportfs -arv:

...
/srv/clientboot 192.168.78.50(rw,no_root_squash,subtree_check)

Add this line to /srv/etc-client0/fstab:

192.168.78.1:/srv/clientboot    /boot   nfs     rw  0   0

And commit the change to git:

# git -C /srv/etc-client0 commit -a -m "Mount /boot"

If client0 is running, run this command to mount immediately:

(client0)# mount -t nfs 192.168.78.1:/srv/clientboot -o rw /boot

And generate initramfs images on client0:

(client0)# mkinitcpio -p linux

Now we can finally edit /srv/root/boot/pxelinux.cfg/default to use the non-fallback image, which is much smaller:

...
    INITRD intel-ucode.img,initramfs-linux.img

Now turn on another client to see if this initramfs image can boot it successfully.

How to …

Apply the same change in `/etc` across all clients

Commit change with etckeeper after modifying /srv/root/etc:

# arch-chroot /mnt/root etckeeper commit -m "commit message"

Pull and merge the change in each /srv/etc-clientX:

# for i in {0..5}; do git -C /srv/etc-client$i pull --no-edit; done

Check if there are any merge conflicts:

# for i in {0..5}; do git -C /srv/etc-client$i status; done
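
Reading six full status reports is more than this check needs. A sketch of a quieter variant: git status --porcelain marks unmerged paths with the two-letter codes DD, AU, UD, UA, DU, AA or UU, so a small filter (the function name is made up here) can flag only the repositories that need attention.

```shell
# Succeeds if `git status --porcelain` output on stdin lists unmerged paths.
has_unmerged() {
    grep -Eq '^(DD|AU|UD|UA|DU|AA|UU) '
}

# Usage on the server (not run here):
#   for i in 0 1 2 3 4 5; do
#       git -C /srv/etc-client$i status --porcelain | has_unmerged \
#           && echo "etc-client$i has merge conflicts"
#   done

# Sample porcelain output, for illustration only:
printf '%s\n' 'UU ssh/sshd_config' ' M fstab' | has_unmerged && echo "conflicts found"
```

Ordinary local modifications such as " M fstab" are ignored on purpose; only paths left unmerged by a pull are reported.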

Install packages

# arch-chroot /mnt/root pacman -S packages
# for i in {0..5}; do git -C /srv/etc-client$i pull --no-edit; done

Run the same programs and watch the progress on all clients

Install tmux on the server and create ~/.tmux.conf:

unbind C-b
set -g prefix C-a
bind C-a send-prefix
bind-key C-c neww -n clients "ssh client0"\; splitw -d -l 1 "ssh client5"\; splitw -d -l 1 "ssh client4"\; splitw -d -l 1 "ssh client3"\; splitw -d -l 1 "ssh client2"\; splitw -d -l 1 "ssh client1"\; select-layout tiled
bind-key C-r source-file ~/.tmux.conf
bind C-x setw synchronize-panes
setw -g window-status-current-format '#{?pane_synchronized,#[bg=red],}#I:#W'
setw -g window-status-format         '#{?pane_synchronized,#[bg=red],}#I:#W'

Start tmux:

$ tmux

Press Ctrl+A then Ctrl+C to create a window with 6 panes. An SSH client will run in each pane and connect to each client.

Press Ctrl+A then Ctrl+X and the window title on the bottom of the screen will become red, showing the window is currently in synchronization mode. All characters typed on the keyboard will be sent to all 6 panes simultaneously in this mode. Press Ctrl+A then Ctrl+X again to return to normal mode.
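
The long bind-key line in the configuration above is hard-coded for six clients. As a sketch, it can be generated for any list of hosts; the function name is made up for this example. The hosts after the first are emitted in reverse order, as in the original line, presumably because splitw -d keeps the first pane active, so each new pane is inserted right after it and pushes the earlier ones down.

```shell
# Print a tmux bind-key line that opens one SSH pane per host.
tmux_clients_binding() {
    first=$1
    shift
    splits=
    for host in "$@"; do
        # Prepend, so the last host ends up first in the command chain
        splits="\\; splitw -d -l 1 \"ssh $host\"$splits"
    done
    printf '%s\n' "bind-key C-c neww -n clients \"ssh $first\"$splits\\; select-layout tiled"
}

tmux_clients_binding client0 client1 client2 client3 client4 client5
```

Called with client0 through client5, this reproduces the line from ~/.tmux.conf, so adding a seventh client only means adding one more argument.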

For documentation on tmux, see man tmux.

Ruoxi Wang
