Introduction
Debugging a Linux system panic can be tricky. It is even harder when the filesystem is hung or has switched to read-only: in such cases there is a high probability that no logs are left to investigate.
To solve this, you can use the Linux kernel's crash management functions together with kexec. Doing this manually is a bit complicated, so a tool named kdump was created to simplify it for you!
However, configuring crash dump has a few drawbacks:
- Increased downtime when the server crashes, to perform the dump
- The dump process might run out of disk space
- Requires reserved memory that won't be usable by the system
- Requires a reboot to apply the configuration
How does it work
When the kernel's panic() is triggered, all interrupts are disabled to ensure that no other process or thread can modify the system's state further. The kernel then runs the callbacks registered on panic_notifier_list; this step is necessary for drivers that need to deinitialize their hardware gracefully. Finally, if crash dump is enabled, it calls crash_kexec().
When this function runs, crash dump uses the kexec feature to load a new kernel without wiping system memory. Once this is done, the newly loaded kernel exports a memory image (vmcore) and can also export the dmesg buffer.
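Once a capture kernel has been loaded, the kernel exposes this state through sysfs. Here is a quick check (a sketch; the file reads 1 when a crash kernel is loaded and may be absent on kernels built without kexec support):

```shell
# Print whether a crash (capture) kernel is currently loaded via kexec:
# 1 = loaded, 0 = not loaded. The file may not exist on kernels built
# without kexec support, in which case we fall back to 0.
loaded=$(cat /sys/kernel/kexec_crash_loaded 2>/dev/null || echo 0)
echo "crash kernel loaded: ${loaded:-0}"
```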
As explained above, crash dump requires kexec, which was implemented in Linux 2.6.13 (in this commit).
With the kdump utility, dumps are stored by default on the local filesystem in the /var/crash folder, but in some cases it can be a good idea to send the dump to a remote location (when the host has a huge amount of memory, for example).
The vmcore can be stored in multiple ways:
- Local file system
- Raw format on a dedicated local device/partition
- NFS
- FTP
- Copy through SSH
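As an illustration, Debian's kdump-tools can be pointed at a remote target from /etc/default/kdump-tools. The variable names below come from that package, while the host names are made up:

```shell
# /etc/default/kdump-tools -- remote storage examples (pick one).
# Host names here are hypothetical.
# NFS="dumpserver:/export/crash"   # write the vmcore to an NFS share
# SSH="kdump@dumpserver"           # or copy it through SSH
```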
Requirements
Kernel side
Before configuring kdump we need to perform a few steps.
First, the kernel needs to have some options enabled at compile time. Today, many distribution kernels already have these options enabled. To check, you can run the following command:
```shell
grep -E "CONFIG_DEBUG_INFO=|CONFIG_CRASH_DUMP=|CONFIG_PROC_VMCORE=" "/boot/config-$(uname -r)"
```
All three options must be set to y.
Memory
We'll need to reserve memory for the kdump kernel at startup. This amount of memory will be unusable by the main system. By default, kdump applies an automatic memory reservation:
| CPU architecture | Memory automatically reserved |
|---|---|
| x86 | 128 MB |
| x86_64 | 161 MB + 64 MB per 1 TB of available RAM |
| ARM64 | 512 MB |
If you're using an Intel Itanic CPU with the IA64 architecture, change your CPU.
These default values can lead to out-of-memory situations in the kdump kernel, particularly on hosts with a huge amount of RAM and a high number of CPU cores.
To avoid a kernel panic in the kdump kernel itself, you can use this formula to calculate the required memory:

Recommended memory + ( NC x 12 MB )

with NC = number of CPU cores.
For example, for a host with 2 TB of RAM and 128 cores, the formula gives:

Recommended memory = 161 MB + ( 2 x 64 MB ) = 289 MB
Added memory = 128 x 12 MB = 1536 MB
Total RAM reservation = 1825 MB

In this kind of case, reserving 2 GB of RAM is a good idea.
A small script can automatically calculate the necessary memory (a sketch based on the x86_64 formula above; it rounds the RAM up to whole terabytes):

```shell
#!/bin/bash
# Sketch: estimate the kdump reservation for x86_64 using the formula
# above (161 MB base + 64 MB per TB of RAM + 12 MB per CPU core).
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
ram_tb=$(( (mem_kb + 1073741823) / 1073741824 ))   # round up to whole TB
cores=$(nproc)
echo "Suggested crashkernel size: $(( 161 + ram_tb * 64 + cores * 12 )) MB"
```
If you think the RAM reservation is excessive, you can also restrict the number of CPU cores usable by the kdump kernel.
NB: On some distros you can directly use the kdumptool calibrate command to calculate the memory requirements.
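Restricting the capture kernel to a single CPU, as mentioned above, can be done on Debian/Ubuntu through the kernel arguments that kdump-tools appends. KDUMP_CMDLINE_APPEND is the package's variable for this; the exact default value varies by release:

```shell
# /etc/default/kdump-tools -- limit the kdump kernel to one CPU so it
# needs less reserved memory (nr_cpus=1 is commonly already the default).
KDUMP_CMDLINE_APPEND="reset_devices nr_cpus=1"
```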
Packages
On many distros, kdump can easily be enabled with packages. For Debian/Ubuntu we can install kdump-tools:

```shell
sudo apt install -y kdump-tools
```
Select ‘yes’ on both questions.
If necessary, you can reconfigure kdump-tools later with the following command:

```shell
sudo dpkg-reconfigure kdump-tools
```
Let’s configure Kdump
First, we can check the kdump status:

```shell
~# kdump-config show
```
The kdump status shows the current state as not ready. This is normal behavior, as enabling or disabling requires a reboot.
The default configuration file for kdump is located at /etc/default/kdump-tools.
By default, the following option is enabled:

```shell
USE_KDUMP=1
```
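A few other commonly tuned kdump-tools variables are shown below (names from the Debian package; the values are illustrative):

```shell
# /etc/default/kdump-tools -- frequently adjusted settings.
KDUMP_COREDIR="/var/crash"   # directory where vmcore files are written
MAKEDUMP_ARGS="-c -d 31"     # compress the dump and filter out unneeded pages
```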
Each time you modify the kdump-tools configuration, you'll need to restart the service to apply the changes. The systemd unit of kdump-tools (/lib/systemd/system/kdump-tools.service) in reality just points to the SysV init script (/etc/init.d/kdump-tools), which in turn calls the /usr/sbin/kdump-config bash script.
```shell
sudo systemctl restart kdump-tools
```
To change the memory size allocated to kdump, edit the kernel command line in /etc/default/grub:

```shell
GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=<MEMORY TO ALLOCATE>M"
```
And then:

```shell
sudo update-grub
```
Finally:

```shell
sudo reboot
```
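After the reboot, it is worth confirming that the kernel actually picked up the reservation, for example:

```shell
# Check whether a crashkernel= parameter made it onto the kernel command
# line; prints the parameter, or a notice when none is set.
grep -o 'crashkernel=[^ ]*' /proc/cmdline || echo "no crashkernel parameter set"
```

kdump-config show should then report the reserved region as well.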
NB: You can define the kdump memory reservation dynamically depending on your system. The format is crashkernel=<range>:<size>,<range>:<size>,..., where <range> defines the amount of physical RAM (e.g. 1G-4G) and <size> specifies the memory reserved (e.g. 192M), such as:

```shell
crashkernel=1G-4G:192M,4G-64G:256M,64G-256G:512M,256G-1024G:1G
```
Please refer to the official kdump documentation for more information about the crashkernel parameter.
This kind of configuration is useful when you're building templates, but it doesn't take the number of CPU cores into account.
Let’s break our kernel for testing
Now we can trigger a system crash to test whether kdump is working correctly!

```shell
echo c | sudo tee /proc/sysrq-trigger
```
Once the dump is complete, the server reboots and a new folder appears in /var/crash:

```shell
~# ls /var/crash/
```
We can go into it to see the files:

```shell
~# cd /var/crash/202411241738/
```
Here we find the dmesg content from the moment the system panicked, along with the memory dump.
How to analyse crash dump
Analysis requirements
Before analysing the dump you'll need to install the debug symbols from the repos. The kernel debug symbols MUST match the exact version of the running kernel, or you won't be able to analyse the crash dump.
Install the crash utility and the debug symbols:

```shell
sudo apt install -y crash linux-image-$(uname -r)-dbg
```
Then you can start the dump analysis:
```shell
crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux /var/crash/202411241738/dump.202411241738
```
Depending on the size of the dump, it might take a while before you get access to the crash CLI.
Basic analysis
First, the crash utility will show you the metadata of the crash:

```shell
KERNEL: /usr/lib/debug/lib/modules/6.1.0-26-cloud-amd64/vmlinux
```
What we can learn from this metadata:

- `Kernel panic - not syncing: sysrq triggered crash`: sysrq, which allows a system administrator to run low-level commands on the kernel, triggered the kernel panic.
- The PID `776`, which is associated with the `bash` command, suggests that sysrq was triggered from a terminal session.
Using the log command (or dmesg) will display the kernel log buffer:

```shell
[   90.192379] sysrq: Trigger a crash
```
This confirms that the kernel panic was caused by our test:

```shell
echo c | sudo tee /proc/sysrq-trigger
```
Deep analysis
We can take a look at the RAM usage at the time the system crashed; this may indicate an OOM crash:

```shell
crash> kmem -i
```
Then we can display the complete backtrace to see the list of kernel functions executed before the crash:

```shell
crash> bt
```
The stack trace is printed in reverse chronological order, which means the most recent call is listed first.
From here we can gather information about which kernel function was triggered. For example, let's check what happened on line #3:

```shell
crash> sym ffffffffa669a116
```
That returns something like this:

```shell
ffffffffa669a116 (t) sysrq_handle_crash+22 debian/build/build_amd64_none_cloud-amd64/drivers/tty/sysrq.c: 155
```
We can go look at the code directly for more information: sysrq panic trigger
If you want to go deeper, you can get the ASM instructions sent to the CPU:

```shell
crash> dis -l sysrq_handle_crash+22
```
In our case this returns:

```shell
0xffffffffa669a116 <sysrq_handle_crash+22>: nopw %cs:0x0(%rax,%rax,1)
```
In this case, this doesn't give us a lot of information.
We can also check information about the PID that generated the panic:

```shell
crash> ps -p 776
```
Finally, you can list the opened files:

```shell
crash> files
```
Bonus information
Sometimes you will find in the dmesg logs the following kind of line: Oops: 0002
Oops is the kernel page fault error code; the number displayed should be translated to binary to understand what happened.
Here 0002 is 0010 in binary.
To get an idea of what this code means, we need to check the fault.c file in the kernel's memory management code.
To avoid painful code reading, here is a small table with all the codes:

| Value | Bit 0 | Bit 1 | Bit 2 | Bit 3 |
|---|---|---|---|---|
| 0 | Not present page (not found) | Read | Kernel mode | Not instruction fetch |
| 1 | Protection violation (invalid access) | Write | User mode | Instruction fetch |
You need to read the error code from right to left: bit number 0 is the rightmost bit of the code.
So here the error code means: page not found, on a write operation, performed in kernel mode, and the issue wasn't triggered by an instruction fetch.
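This bit-by-bit reading can be automated. Here is a small sketch (the helper name decode_oops is made up) that follows the table above:

```shell
#!/bin/bash
# decode_oops (hypothetical helper): translate an x86 page-fault error
# code, e.g. the "0002" from "Oops: 0002", following the table above.
decode_oops() {
    local code=$(( 16#$1 ))
    (( code & 1 )) && echo "protection violation" || echo "page not present"
    (( code & 2 )) && echo "write access"         || echo "read access"
    (( code & 4 )) && echo "user mode"            || echo "kernel mode"
    (( code & 8 )) && echo "instruction fetch"    || echo "not an instruction fetch"
}

decode_oops 0002   # page not present / write access / kernel mode / not an instruction fetch
```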
🙏 Acknowledgments
Special thanks to @sanecz for proofreading and correcting this blogpost.