Debugging a Linux crash with kdump

Introduction

Debugging issues when a Linux system panics can be tricky. It's even harder when the filesystem hangs or is remounted read-only: in such cases there is a high probability that no logs survive for investigation.

To solve this issue, you can use the Linux kernel's crash management features together with kexec. Doing this manually is a bit complicated, so a tool named kdump was created to simplify it for you!

However, configuring crash dump has a few drawbacks:

  • Increased downtime when the server crashes, since the dump has to complete before the reboot
  • Risk of running out of disk space during the dump process
  • Requires reserved memory that won't be usable by the system
  • Requires a reboot to apply the configuration

How does it work

When the kernel's panic() is triggered, all interrupts are disabled to ensure that no other processes or threads can modify the system's state any further.
Then the kernel runs the callbacks registered on panic_notifier_list. This step is necessary for some drivers that need to deinitialize their hardware gracefully. Finally, if crash dump is enabled, the kernel starts crash_kexec().

When this function is triggered, crash dump uses the kexec feature to load a new kernel without erasing system memory. Once this is done, the newly loaded kernel can export a memory image (vmcore) as well as the dmesg buffer.

As explained above, crash dump requires kexec, which was introduced in Linux 2.6.13 (in this commit).

With the kdump utility, dumps are stored by default on the local filesystem in the /var/crash folder, but in some cases it can be a good idea to send them to a remote location (when dealing with huge amounts of memory, for example).

Vmcore can be stored in multiple ways:

  • Local file system
  • Raw format on a dedicated local device/partition
  • NFS
  • FTP
  • Copy through SSH
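As an example, on Debian/Ubuntu's kdump-tools the SSH target is configured in /etc/default/kdump-tools. An illustrative fragment (the host and key path are made up; variable names may differ on other distros):

```shell
# /etc/default/kdump-tools -- send vmcore to a remote host over SSH
# (illustrative values; SSH/SSH_KEY are Debian kdump-tools variables)
SSH="kdump@dumps.example.com"
SSH_KEY="/root/.ssh/kdump_id_rsa"
```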

Requirements

Kernel side

Before configuring kdump we need to perform a few steps.

First, the kernel needs to have some options enabled at compile time. Nowadays, many distribution kernels already have these options enabled. To check, you can run the following command:

grep -E "CONFIG_DEBUG_INFO=|CONFIG_CRASH_DUMP=|CONFIG_PROC_VMCORE=" "/boot/config-$(uname -r)"

All options must be enabled.
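The check above can be wrapped in a small helper that fails loudly when an option is missing. A sketch (the function name check_kdump_config is ours):

```shell
#!/bin/bash
# Check that the kernel config file given as $1 has the required kdump
# options built in (=y). Usage: check_kdump_config "/boot/config-$(uname -r)"
check_kdump_config() {
  local config="$1" opt rc=0
  for opt in CONFIG_DEBUG_INFO CONFIG_CRASH_DUMP CONFIG_PROC_VMCORE; do
    if grep -q "^${opt}=y" "$config"; then
      echo "$opt: ok"
    else
      echo "$opt: missing"
      rc=1
    fi
  done
  return "$rc"
}
```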

Memory

We'll need to reserve memory for the kdump kernel at startup. This amount of memory will be unusable by the system. By default, kdump uses an automatic memory reservation:

CPU architecture   Memory automatically reserved
x86                128 MB
x86_64             161 MB + 64 MB per 1 TB of available RAM
ARM64              512 MB

If you're using an Intel Itanic CPU with the IA64 architecture, change your CPU.

These default values can lead to out-of-memory situations in the kdump kernel, particularly on hosts with huge amounts of RAM and a high number of CPU cores.

To avoid a kernel panic in the kdump kernel itself, you can use this formula to calculate the required memory:

Recommended memory + ( NC x 12 MB )
with NC = number of CPU cores.

For example, for a host with 2 TB of RAM and 128 cores, the formula gives:

Recommended memory = 161 MB + ( 2 x 64 MB ) = 289 MB
Added memory = 128 x 12 MB = 1536 MB
Total RAM reservation = 1825 MB

In this kind of case, reserving 2 GB of RAM can be a good idea.

Here is a small script to automatically calculate the necessary memory:

#!/bin/bash

set -euo pipefail

arch="$(uname -m)"
ram_kb="$(grep MemTotal /proc/meminfo | awk '{print $2}')"
cores="$(nproc)"
# MemTotal is in kB: three divisions by 1024 give TB; round up so hosts
# with e.g. 1.5 TB still get the full per-TB increment
ram_tb="$(awk "BEGIN {print int(($ram_kb / 1024 / 1024 / 1024) + 0.999999)}")"

if [[ "$arch" == "x86_64" ]]; then
    base_memory=$((161 + (64 * ram_tb)))
elif [[ "$arch" == i?86 ]]; then        # 32-bit x86 reports i386/i586/i686
    base_memory=128
elif [[ "$arch" == "aarch64" ]]; then   # uname -m reports aarch64 on ARM64
    base_memory=512
else
    echo "Unsupported architecture: $arch" >&2
    exit 1
fi

additional_memory=$((cores * 12))
memory_allocation=$((base_memory + additional_memory))

echo "Total memory to allocate for kdump: $memory_allocation MB"

If you think the RAM reservation is excessive, you can also restrict the number of CPU cores usable by the kdump kernel.
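With Debian's kdump-tools this is done by passing nr_cpus= to the crash kernel through the cmdline append variable. An illustrative fragment (the exact option list varies per distro, and Debian's default already includes nr_cpus=1):

```shell
# /etc/default/kdump-tools -- boot the kdump kernel with a single CPU
# to reduce its memory footprint (illustrative fragment)
KDUMP_CMDLINE_APPEND="reset_devices nr_cpus=1 irqpoll nousb"
```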

NB: On some distros you can directly use the kdumptool calibrate command to compute the memory requirements.

Packages

On many distros, kdump can easily be enabled through packages. On Debian/Ubuntu we can install kdump-tools:

sudo apt install -y kdump-tools

Select 'yes' for both questions.
If necessary, you can reconfigure kdump-tools later with the following command:

sudo dpkg-reconfigure kdump-tools

Let’s configure Kdump

First, let's look at the kdump status:

~# kdump-config show
no crashkernel= parameter in the kernel cmdline ... failed!
DUMP_MODE: kdump
USE_KDUMP: 1
KDUMP_COREDIR: /var/crash
crashkernel addr:
/var/lib/kdump/vmlinuz
kdump initrd:
/var/lib/kdump/initrd.img
current state: Not ready to kdump

kexec command:
no kexec command recorded

The kdump status shows the current state as "Not ready to kdump". This is expected, as enabling or disabling kdump requires a reboot.

The default configuration file for kdump is located at /etc/default/kdump-tools.
By default, the following options are enabled and configured:

USE_KDUMP=1
KDUMP_KERNEL=/var/lib/kdump/vmlinuz
KDUMP_INITRD=/var/lib/kdump/initrd.img
KDUMP_COREDIR="/var/crash"

Each time you modify the configuration of kdump-tools, you'll need to restart the service to apply the changes. The systemd unit of kdump-tools (/lib/systemd/system/kdump-tools.service) in reality just points to the SysV init script (/etc/init.d/kdump-tools), which in turn calls the /usr/sbin/kdump-config bash script.

sudo systemctl restart kdump-tools

To change the amount of memory allocated to kdump, edit:

/etc/default/grub.d/kdump-tools.cfg
GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=<MEMORY TO ALLOCATE>M"

And then :

sudo update-grub

Finally :

sudo reboot

NB: You can define the kdump memory reservation dynamically depending on the system's RAM. The format is crashkernel=<range>:<size>,<range>:<size>,... where <range> defines the amount of physical RAM (e.g. 1G-4G) and <size> specifies the memory to reserve (e.g. 192M), such as:

crashkernel=1G-4G:192M,4G-64G:256M,64G-256G:512M,256G-1024G:1G

Please refer to the official kdump documentation for more information about the crashkernel parameter.

This kind of configuration is useful when you're building templates, but it doesn't take the number of CPU cores into account.
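To illustrate how such a range list is resolved, here is a small sketch (this is not the kernel's actual parser; it ignores open-ended ranges like 4G- and other corner cases):

```shell
#!/bin/bash
# Convert a size like 4G / 512M / 192 (plain MB) to MB
to_mb() {
  case "$1" in
    *G) echo $(( ${1%G} * 1024 )) ;;
    *M) echo "${1%M}" ;;
    *)  echo "$1" ;;
  esac
}

# Given total RAM in MB and a "range:size,range:size" spec, print the
# reservation of the first range that contains the RAM amount
pick_crashkernel() {
  local ram_mb="$1" spec="$2" entry range size start end
  for entry in ${spec//,/ }; do
    range="${entry%%:*}"; size="${entry##*:}"
    start="$(to_mb "${range%-*}")"; end="$(to_mb "${range#*-}")"
    if (( ram_mb >= start && ram_mb < end )); then
      echo "$size"
      return 0
    fi
  done
  echo "no reservation"
}

pick_crashkernel 8192 "1G-4G:192M,4G-64G:256M,64G-256G:512M"   # -> 256M
```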

Let’s break our kernel for testing

Now we can perform a system crash to test whether kdump is working correctly!

echo c | sudo tee /proc/sysrq-trigger

Once the dump is complete, the server will reboot and a new folder will appear in /var/crash:

~# ls /var/crash/
202411241738 kdump_lock kexec_cmd

We can go into it to see the files:

~# cd /var/crash/202411241738/
/var/crash/202411241738# ls
dmesg.202411241738 dump.202411241738

Here we find the dmesg content at the time of the panic, and the memory dump itself.

How to analyse a crash dump

Analysis requirements

Before analysing the dump you'll need to install the debug symbols from your distro's repos. The kernel debug symbols MUST match the exact version of the crashed kernel, or you won't be able to analyse the crash dump.

Install the debug symbols:

sudo apt install linux-image-$(uname -r)-dbg
Warning: Many distributions remove older kernel versions, including headers and debug symbols, from their mirrors. To avoid issues, it can be a good idea to install the debug symbols whenever you change kernel version. You'll also need to clean up older debug symbols, because they take up quite a lot of space. A good solution is to have a dedicated machine for crash analysis with the debug symbols of every kernel version running in production.

Then you can start the dump analysis:

crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux /var/crash/202411241738/dump.202411241738

Depending on the size of the dump, it might take a while before you get access to the crash CLI.

Basic analysis

First, the crash utility shows you the metadata of the crash:

      KERNEL: /usr/lib/debug/lib/modules/6.1.0-26-cloud-amd64/vmlinux
    DUMPFILE: /var/crash/202411241738/dump.202411241738  [PARTIAL DUMP]
        CPUS: 3
        DATE: Sun Nov 24 17:38:08 UTC 2024
      UPTIME: 00:01:30
LOAD AVERAGE: 0.00, 0.00, 0.00
       TASKS: 4
    NODENAME: scw-nostalgic-tu
     RELEASE: 6.1.0-26-cloud-amd64
     VERSION: #1 SMP PREEMPT_DYNAMIC Debian 6.1.112-1 (2024-09-30)
     MACHINE: x86_64  (2799 Mhz)
      MEMORY: 4 GB
       PANIC: "Kernel panic - not syncing: sysrq triggered crash"
         PID: 776
     COMMAND: "bash"
        TASK: ffff910dc425cbc0  [THREAD_INFO: ffff910dc425cbc0]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

What we can learn from this metadata:

  • Kernel panic - not syncing: sysrq triggered crash tells us that sysrq, which allows a system administrator to send low-level commands to the kernel, triggered the panic.
  • PID 776, associated with the bash command, suggests that sysrq was triggered from a terminal session.

Running the log command (or dmesg) displays the kernel log buffer:

[   90.192379] sysrq: Trigger a crash
[   90.192584] Kernel panic - not syncing: sysrq triggered crash

This confirms that the kernel panic was caused by our test:

echo c | sudo tee /proc/sysrq-trigger

Deep analysis

We can take a look at the RAM usage at the time of the crash; this may reveal an OOM-related crash:

crash> kmem -i
                 PAGES        TOTAL      PERCENTAGE
    TOTAL MEM   937259       3.6 GB         ----
         FREE   867985       3.3 GB   92% of TOTAL MEM
         USED    69274     270.6 MB    7% of TOTAL MEM
       SHARED    12337      48.2 MB    1% of TOTAL MEM
      BUFFERS     2848      11.1 MB    0% of TOTAL MEM
       CACHED    36523     142.7 MB    3% of TOTAL MEM
         SLAB     4932      19.3 MB    0% of TOTAL MEM

   TOTAL HUGE        0            0         ----
    HUGE FREE        0            0    0% of TOTAL HUGE

   TOTAL SWAP        0            0         ----
    SWAP USED        0            0    0% of TOTAL SWAP
    SWAP FREE        0            0    0% of TOTAL SWAP

 COMMIT LIMIT   468629       1.8 GB         ----
    COMMITTED        0            0    0% of TOTAL LIMIT

Then we can display the complete backtrace to see the list of kernel functions executed before the crash:

crash> bt
PID: 776 TASK: ffff910dc425cbc0 CPU: 0 COMMAND: "bash"
#0 [ffffa75b00b739e0] machine_kexec at ffffffffa606a92f
#1 [ffffa75b00b73a38] __crash_kexec at ffffffffa6161047
#2 [ffffa75b00b73af8] panic at ffffffffa69b50a2
#3 [ffffa75b00b73b78] sysrq_handle_crash at ffffffffa669a116
#4 [ffffa75b00b73b80] __handle_sysrq.cold at ffffffffa69dc549
#5 [ffffa75b00b73bb0] write_sysrq_trigger at ffffffffa669aa64
#6 [ffffa75b00b73bc0] proc_reg_write at ffffffffa63e18a6
#7 [ffffa75b00b73bd8] vfs_write at ffffffffa6345607
#8 [ffffa75b00b73c70] ksys_write at ffffffffa6345acb
#9 [ffffa75b00b73ca8] do_syscall_64 at ffffffffa69ed155
#10 [ffffa75b00b73cc0] do_fcntl at ffffffffa635d407
#11 [ffffa75b00b73d80] _raw_spin_unlock at ffffffffa6a01dba
#12 [ffffa75b00b73d88] wp_page_reuse at ffffffffa62b3670
#13 [ffffa75b00b73da0] do_wp_page at ffffffffa62b63e3
#14 [ffffa75b00b73dd8] __handle_mm_fault at ffffffffa62bbbeb
#15 [ffffa75b00b73ea8] handle_mm_fault at ffffffffa62bc2eb
#16 [ffffa75b00b73ee0] do_user_addr_fault at ffffffffa6079ae0
#17 [ffffa75b00b73f28] exit_to_user_mode_prepare at ffffffffa6134330
#18 [ffffa75b00b73f50] entry_SYSCALL_64_after_hwframe at ffffffffa6c00126
RIP: 00007fdef606c240 RSP: 00007ffc149ebb58 RFLAGS: 00000202
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fdef606c240
RDX: 0000000000000002 RSI: 00005613528ddc90 RDI: 0000000000000001
RBP: 00005613528ddc90 R8: 0000000000000007 R9: 0000000000000073
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000002
R13: 00007fdef6147760 R14: 0000000000000002 R15: 00007fdef61429e0
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b

The stack trace is printed in reverse chronological order, which means the most recent call is listed first.
From here we can identify which kernel function was involved at each frame. For example, let's check what happened at frame #3:

crash> sym ffffffffa669a116

This returns something like:

ffffffffa669a116 (t) sysrq_handle_crash+22 debian/build/build_amd64_none_cloud-amd64/drivers/tty/sysrq.c: 155

We can go straight to the code for more information: sysrq panic trigger

If you want to dig deeper, you can look at the ASM instructions sent to the CPU:

crash> dis -l sysrq_handle_crash+22

In our case this returns:

0xffffffffa669a116 <sysrq_handle_crash+22>:     nopw   %cs:0x0(%rax,%rax,1)

Here, this doesn't give us much information.

We can also check information on the PID that generated the panic:

crash> ps -p 776
PID: 0 TASK: ffffffffa7a1aa40 CPU: 0 COMMAND: "swapper/0"
PID: 776 TASK: ffff910dc425cbc0 CPU: 0 COMMAND: "bash"

Finally, you can list the files opened by the crashed process:

crash> files
PID: 776   TASK: ffff910dc425cbc0  CPU: 0  COMMAND: "bash"
ROOT: /    CWD: /root
 FD       FILE            DENTRY           INODE       TYPE PATH
  0 ffff910dc2ff4400 ffff910dc3752a80 ffff910dc3728c80 CHR  /dev/pts/0
  1 ffff910dc53ecb00 ffff910dc37d3f00 ffff910dc3c78310 REG  /proc/sysrq-trigger
  2 ffff910dc2ff4400 ffff910dc3752a80 ffff910dc3728c80 CHR  /dev/pts/0
 10 ffff910dc2ff4400 ffff910dc3752a80 ffff910dc3728c80 CHR  /dev/pts/0
255 ffff910dc2ff4400 ffff910dc3752a80 ffff910dc3728c80 CHR  /dev/pts/0

Bonus information

Sometimes you will find in the dmesg logs this kind of line: Oops: 0002

Oops is the kernel page fault error code; the number displayed must be translated into binary to understand what happened.

Here 0002 is 0010 in binary.

To get an idea of what this code means, we need to look at the fault.c file in the memory management part of the kernel.

To avoid painful code reading, here is a small table with the meaning of each bit (following the X86_PF_* flags in arch/x86/mm/fault.c):

Bit   Value 0                         Value 1
0     Not present page (not found)    Protection violation (invalid access)
1     Read                            Write
2     Kernel mode                     User mode
3     -                               Reserved bit set in page tables
4     Not instruction fetch           Instruction fetch

You need to read the error code from right to left: bit 0 is the rightmost bit of the code.
So here the error code means: a page was not found during a write operation performed in kernel mode, and the fault wasn't triggered by an instruction fetch.
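This right-to-left bit reading can be scripted. Below is a minimal sketch (the helper decode_oops is ours) that decodes such a code following the X86_PF_* flag layout in arch/x86/mm/fault.c:

```shell
#!/bin/bash
# Decode an x86 page-fault "Oops" error code (hex string, e.g. 0002),
# following the X86_PF_* bit flags from arch/x86/mm/fault.c.
decode_oops() {
  local code=$((16#$1))
  (( code & 1 ))  && echo "protection violation" || echo "not-present page"
  (( code & 2 ))  && echo "write access"         || echo "read access"
  (( code & 4 ))  && echo "user mode"            || echo "kernel mode"
  (( code & 8 ))  && echo "reserved bit set"
  (( code & 16 )) && echo "instruction fetch"
  return 0
}

decode_oops 0002
# not-present page
# write access
# kernel mode
```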

🙏 Acknowledgments

Special thanks to @sanecz for proofreading and correcting this blogpost.