Thursday 6 May 2021

How to properly build a usable Linux kernel with Nvidia driver?

MX-Linux has been known as one of the best Linux distros for some years, largely because of a brilliant feature called MX-snapshot. To me, MX-Linux is essentially Ubuntu+OS_Snapshot. However, one major defect of MX-Linux is that its kernel ships with ACPI disabled, so after installing the Nvidia driver, CUDA cannot provide GPU acceleration for PyTorch/TensorFlow/etc.

To overcome this drawback, you can either install an Ubuntu kernel by trial and error or compile a kernel that supports ACPI.

To compile a kernel step-by-step:

  1. Download a stable or long-term Linux kernel release from https://www.kernel.org/
  2. Make sure the essential build packages are installed:
    git fakeroot build-essential ncurses-dev xz-utils libssl-dev lz4 bc

  3. Extract the kernel archive and cd into the extracted folder
  4. Copy the current kernel config (/boot/config-$(uname -r)) to .config in that folder
  5. Revise the configuration. There are 3 interfaces:
    make menuconfig (this is in console mode)
    make gconfig (this requires GTK, suitable for gnome-desktop)
    make xconfig (this requires Qt5, suitable for KDE-desktop, can be installed by `apt install qt5-default`)
  6. In the configuration, every kernel module has 3 options:
    y: the driver is built in (linked into vmlinux)
    n: the driver is not built
    m: the driver is compiled as a separate .ko module, not linked into vmlinux; load it on demand with modprobe or insmod
  7. Compile the kernel: `make -j 4`
  8. Install the kernel modules, stripping unneeded symbols:
    make INSTALL_MOD_STRIP=1 modules_install -j 4
  9. To shrink the generated initramfs image, change `MODULES=most` to `MODULES=dep` in /etc/initramfs-tools/initramfs.conf
  10. Install the kernel image into /boot: `make install -j 4`
  11. The source folder will now be huge (>30GB), so clean it up:
    make clean
    find /lib/modules/<kernel_version>/ -iname "*.ko" -exec strip --strip-unneeded {} \;
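
Since the whole point of the rebuild is the ACPI setting, it can help to check how an option ended up in a config file before and after step 5. The snippet below is only a minimal sketch: `kconfig_state` is a hypothetical helper name (not part of the kernel tree), and it assumes the usual .config layout where an enabled option appears as `CONFIG_FOO=y` or `CONFIG_FOO=m` and a disabled one is either absent or written as `# CONFIG_FOO is not set`.

```shell
#!/bin/sh
# kconfig_state (hypothetical helper): print how an option is set in a
# kernel config file. Prints y or m for tristate options, the raw value
# for other option types, and n when the option is disabled or absent.
# Usage: kconfig_state CONFIG_ACPI [config-file]   (default file: .config)
kconfig_state() {
    opt=$1
    cfg=${2:-.config}
    # An enabled option appears as "CONFIG_FOO=<value>"; keep the value only.
    val=$(sed -n "s/^${opt}=//p" "$cfg" | head -n 1)
    if [ -n "$val" ]; then
        printf '%s\n' "$val"
    else
        # Disabled options are absent or "# CONFIG_FOO is not set".
        printf 'n\n'
    fi
}
```

For example, `kconfig_state CONFIG_ACPI "/boot/config-$(uname -r)"` reports the running kernel's setting, and `kconfig_state CONFIG_ACPI` inside the source folder reports what the new build will use.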

After that, you can rebuild and install the Nvidia driver by either:
- rebooting into the new kernel and running:
    dkms install nvidia/460.67
OR
- without rebooting, directly running:
    dkms install nvidia/460.67 -k <kernel-version>
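
To confirm that the driver was really built for the new kernel, the output of `dkms status` can be filtered. This is a rough sketch, not an official dkms feature: `dkms_installed_for` is a hypothetical helper, and it assumes status lines look like either `nvidia, 460.67, <kernel-version>, x86_64: installed` or `nvidia/460.67, <kernel-version>, x86_64: installed`, which may differ between dkms versions.

```shell
#!/bin/sh
# dkms_installed_for (hypothetical helper): read `dkms status` text on
# stdin and succeed if MODULE is listed as installed for KERNEL.
# Usage: dkms status | dkms_installed_for nvidia <kernel-version>
dkms_installed_for() {
    module=$1
    kernel=$2
    # The module name starts the line and is followed by "," or "/",
    # depending on the assumed output style; the kernel version is a
    # comma-separated field; the state comes after the final colon.
    grep -E "^${module}[,/]" | grep -F ", ${kernel}," | grep -q ': installed'
}
```

A typical call would be `dkms status | dkms_installed_for nvidia <kernel-version> && echo ok`.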

  12. If you manually upgrade the Nvidia driver, you also need to update the initramfs; otherwise, your `/boot/initrd.img` will still contain the old driver version.

    update-initramfs -c -k $(uname -r)
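
One way to see whether the regenerated image carries the driver is to list its contents with `lsinitramfs` (shipped with initramfs-tools) and look for an Nvidia module. The helper below is a small sketch under assumptions: `initrd_has_nvidia` is a hypothetical name, and it assumes the module files are named `nvidia*.ko`, optionally compressed as .xz, .gz or .zst. It only shows that a module is present, not which driver version it is.

```shell
#!/bin/sh
# initrd_has_nvidia (hypothetical helper): read a file listing on stdin
# and succeed if it contains an Nvidia kernel module such as
# .../nvidia.ko or .../nvidia-drm.ko.xz.
# Usage: lsinitramfs /boot/initrd.img-<kernel-version> | initrd_has_nvidia
initrd_has_nvidia() {
    grep -Eq '/nvidia[^/]*\.ko(\.(xz|gz|zst))?$'
}
```

For example: `lsinitramfs "/boot/initrd.img-$(uname -r)" | initrd_has_nvidia && echo "nvidia module present"`.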