Thursday 4 April 2024

How to clone a fully installed Linux system to another computer, or do a live offline system upgrade?

A. Full-system Clone

When managing a Linux server cluster, we very often need to clone a fully-installed Linux system to other computers/servers so that we do not need to re-install all required packages and libraries, and do not need to reconfigure files such as inputrc, vimrc, bashrc, etc. Here, I will describe two common methods:

1. The recommended way is to use MX-Linux's mx-snapshot

This is a great utility. It creates a Linux-rescue ISO image, a large bootable live image containing all packages, libraries, and configurations, optionally including the contents of the home folders. You can then use the live system to install onto as many hard disks as you like.

Remember to select the "Preserve /home (ext4)" option if you want to keep existing user folders.

2. Manually copy over all folders and set up GRUB. When copying the files, you need to preserve file permissions and ownership, thus use either "cp -rfPp", "rsync -avlP", or "tar --numeric-owner -czf" (and the matching -xzf to extract).
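For reference, a hedged sketch of the copy itself, assuming the target root partition is already mounted at /mnt (as in the first command below) and the source system is reachable over SSH as root@source-host (a placeholder hostname):

rsync -aHAXP --numeric-ids \
    --exclude="/dev/*" --exclude="/proc/*" --exclude="/sys/*" \
    --exclude="/run/*" --exclude="/tmp/*" --exclude="/mnt/*" \
    root@source-host:/ /mnt/     # pull everything except virtual/transient directories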

mount /dev/sda3 /mnt (root partition)
mount /dev/sda2 /mnt/boot (required if separate boot partition)
mount /dev/sda1 /mnt/boot/efi (EFI system partition)
mount --bind /dev /mnt/dev
mount --bind /dev/pts /mnt/dev/pts
mount --bind /proc /mnt/proc
mount --bind /sys /mnt/sys
chroot /mnt (run the remaining commands inside the new root)

grub-mkdevicemap
grub-install --efi-directory=/boot/efi /dev/sda
update-grub
update-initramfs -u -k all

However, the SUID/SGID/sticky bits may be reset when copying directories or extracting archives (even if the preserve-permission option is set), so you need to re-apply them manually afterwards. The following commands need the SUID bit set: passwd, su, sudo, ping*, chsh, mount, umount, fusermount, etc.
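One way to do this, sketched below under the assumption that GNU find is available on both systems: record the special-bit files on the source system and replay the modes on the clone.

# on the source system: record files carrying SUID/SGID/sticky bits, with their octal modes
find / -xdev -type f \( -perm -4000 -o -perm -2000 -o -perm -1000 \) -printf '%#m %p\n' > /root/special-bits.txt
# on the cloned system: re-apply the recorded modes
while read -r mode path; do chmod "$mode" "$path"; done < /root/special-bits.txt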


B. Live Offline System Upgrade

For a live offline upgrade (while all other users are still using the system), Method A1 introduces a very long downtime (typically a few hours, depending on the size of your fully-installed system), because the installation requires booting into the live system, during which other users cannot access the server. So we typically use Method A2, for which the only downtime is the server reboot. The steps are as follows:

1. In the running Linux system, create two new folders under /, e.g., /full-backup and /full-upgrade.
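For example:

mkdir -p /full-backup /full-upgrade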

2. Copy over the entire new-OS root system folders (i.e., /etc, /bin, /sbin, /usr, /opt, /root, /var, /lib, /lib64, /boot, etc., but not /home, /dev, /sys, /mnt, /proc, /run, etc.) from the USB storage to /full-upgrade using Method A2 ("cp -rfPp", "rsync -avlP", or "tar --numeric-owner -czf"). This typically takes a few hours.
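A minimal sketch of this step, assuming the new OS tree sits under /media/usb/new-os (a placeholder path):

for d in etc bin sbin usr opt root var lib lib64 boot; do
    # no trailing slash on the source, so each folder itself is created inside /full-upgrade
    rsync -aHAXP --numeric-ids /media/usb/new-os/$d /full-upgrade/
done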

3. Copy a statically-linked busybox to the root folder (the busybox build must not require the ld-linux-x86-64.so interpreter). Typically, this file should already be prepared somewhere inside /full-upgrade.

root@my-laptop:~# file /usr/bin/busybox
/usr/bin/busybox: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, BuildID[sha1]=36c64fc4707a00db11657009501f026401385933, for GNU/Linux 3.2.0, stripped
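A quick sketch of this step (the source path under /full-upgrade is an assumption; use wherever your static busybox actually lives):

cp /full-upgrade/usr/bin/busybox /busybox
file /busybox        # must report "statically linked", otherwise it will break during Step 5
/busybox echo ok     # sanity check that it runs on its own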

4. Copy over credential/configuration files from / to /full-upgrade, e.g.,

    /etc/passwd, /etc/group, /etc/shadow (do NOT copy these wholesale; merge in only the real user entries, because the service users such as gdm/sshd/_apt/openvpn/lp/etc. must keep the user IDs defined in the new-OS files, otherwise these services will not function properly; see the sketch after this list),
    /etc/network/interfaces, /etc/NetworkManager/*, /etc/openvpn,
    /etc/fstab, /etc/exports, /root/.ssh, etc.
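A hedged sketch of the merge (the UID cutoff of 1000 for "real" users is an assumption; adjust it for your distro):

# append regular user accounts from the old files into the new ones
awk -F: '$3 >= 1000 && $3 < 65534' /etc/passwd >> /full-upgrade/etc/passwd
awk -F: '$3 >= 1000 && $3 < 65534' /etc/group  >> /full-upgrade/etc/group
for u in $(awk -F: '$3 >= 1000 && $3 < 65534 {print $1}' /etc/passwd); do
    grep "^$u:" /etc/shadow >> /full-upgrade/etc/shadow
done
# plain configuration files can be copied over directly
cp -a /etc/fstab /etc/exports /full-upgrade/etc/
cp -a /root/.ssh /full-upgrade/root/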

5. Move all old-version root system folders from / to /full-backup, and move all new-version root system folders from /full-upgrade to /, i.e.,

cd /full-upgrade; for f in *; do /busybox mv -v "/$f" /full-backup/; /busybox mv -v "$f" /; done

From this point onwards, all newly launched programs will use the new-version libraries and packages, while already-running programs will continue to use the old ones. Since existing processes might open configs/files/folders or spawn new processes, all of which will now come from the new version, there might be some conflicts/errors/failures, because the running services and kernel are still the old version until the system reboots. However, this window is short, because you only need to do the following.

6. Set up the boot loader for the new-version root system using the steps from Method A2:

  • unmount the old EFI partition (if your EFI partition was previously mounted at `/boot/efi`, the "/busybox mv" command will have moved its mount point to `/full-backup/boot/efi`, so run `umount /full-backup/boot/efi`)
  • mount the EFI partition to /boot/efi (`mount /dev/sda1 /boot/efi`)
  • delete unused EFI boot images in /boot/efi/EFI (some firmwares remember the previously booted EFI image; since the OS has changed, the old EFI boot image might not work, so typically just run `rm -rf /boot/efi/EFI/*`, or move the old images to a backup location)
  • Install the new grub EFI boot image:
grub-mkdevicemap
grub-install --efi-directory=/boot/efi --root-directory=/ /dev/sda
update-grub
update-initramfs -u -k all 
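Optionally, you can also check that the firmware boot entries now point at the new GRUB image, and remove stale NVRAM entries, using efibootmgr; this is in addition to cleaning /boot/efi/EFI, and the entry number below is just an example:

efibootmgr -v            # list firmware boot entries and the EFI images they point to
efibootmgr -b 0003 -B    # delete a stale entry such as Boot0003 (example number)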

7. Delete /busybox (for security purposes), e.g., `rm -f /busybox`

8. Reboot the server into the new-version system.

Typically, to minimize the system downtime, Steps 5-8 should be done in one go, preferably at the end of the day when most people have left the office. If ML/CUDA training jobs run overnight, Steps 5-8 should instead be done before lunch, so that after lunch users can restart all their programs.

Tuesday 2 April 2024

How to bring a stopped process into a tmux session with console display?

Very often, we run some long-running command with tons of console output outside of tmux, and then realize that we should have run it inside a tmux session so that we can remotely log in and monitor its progress. But the process can rarely be terminated and re-run cheaply, and Ctrl+Z plus `fg` can only resume it in the same console.

To do so:

1. Press Ctrl+Z to stop the process

2. launch a new tmux session or attach to an existing tmux session

3. run `reptyr <PID>` inside the tmux session, where <PID> is the process ID of the stopped process
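A minimal sketch of the whole flow (12345 is a placeholder PID; on some distros you may also need to relax ptrace restrictions with `sudo sysctl kernel.yama.ptrace_scope=0` before reptyr can attach):

# in the original console:
#   Ctrl+Z                   stop the foreground process
#   jobs -l                  note its PID, e.g. 12345
# then, inside a new or existing tmux session:
tmux new -s monitor
reptyr 12345                 # the process now reads/writes this tmux pane's terminal
# optionally run `disown` back in the original shell so it forgets the job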

Saturday 24 February 2024

How to tmux an X11 GUI application so that it can persist through session detach and client disconnection?

The solution is to use xpra in addition to tmux.

Firstly, add the following 4 lines to your $HOME/.profile

alias xp_start='xpra start :100  --start-child=xterm --start-via-proxy=no --opengl=yes'
alias xp_list='xpra list'
alias xp_stop='xpra stop :100'
alias xp_attach='xpra attach :100'


To run an X11 app that persists through sessions:

0. SSH (with X11 forwarding, i.e., -X or -Y) into the server containing <your-x11-app>

1. create a new virtual xpra session, run xp_start

2. enter any tmux session or create a new tmux session, run `tmux a` or `tmux`

3. run the X11 app in tmux session with DISPLAY set to 100, run `DISPLAY=:100 <your-x11-app>`

4. inside tmux, attach the xpra session by running `xp_attach`. This will display the X11 app on your current screen. You can detach the xpra session with Ctrl+C. Detaching the tmux session or disconnecting SSH will automatically detach the xpra session as well.


Working Principle:

Xpra works by creating a virtual display (number :100 in this example) and running <your-x11-app> on that virtual display. Since this is a virtual display, apps running inside it will not be killed by disconnection or session detach (unless you manually stop the display with xp_stop). When you attach to display :100, all X11 apps running inside it are shown on your screen, and they persist across sessions.
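A quick end-to-end example, using xclock as a stand-in for <your-x11-app>:

xp_start                   # create virtual display :100 (with an xterm started in it)
tmux new -s gui            # or attach to an existing session
# inside the tmux session:
DISPLAY=:100 xclock &      # the app renders into the virtual display, not your screen
xp_attach                  # mirror display :100 onto your local screen
# Ctrl+C detaches the viewer; xclock keeps running on :100 until you run xp_stop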