Debug a Dqlite core dump issue

If you are running Juju 3.2 or later and juju status reports the agent as lost, the controller may have crashed with a core dump. This document shows how to confirm that suspicion and then capture a backtrace so the issue can be reproduced and fixed.

Check if you do in fact have a core dump issue

Check the database logs:

$ juju debug-log | grep "core dump"

Sometimes logs don’t make it to the database, so also check the controller machine logs:

$ juju ssh -m controller <controller machine>
$ grep "core dump" /var/log/juju/machine-<controller machine ID>.log

If the results show (core dumped), you have a core dump issue.

Example results that point to a core dump issue
/etc/systemd/system/jujud-machine-0-exec-start.sh: line 11: 5862 Segmentation fault (core dumped) '/var/lib/juju/tools/machine-0/jujud' machine --data-dir '/var/lib/juju' --machine-id 0 --debug

or

Assertion failed: type == SQLITE_INTEGER || type == SQLITE_NULL (src/query.c: value_type: 23)
signal: aborted (core dumped)
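
The check above can be wrapped in a small helper so that the same pattern is applied to a log file or to piped input. This is a minimal sketch, not part of Juju: the function name find_core_dumps is hypothetical, and it assumes the literal marker "(core dumped)" shown in the example results.

```shell
#!/bin/sh
# Hypothetical helper (not part of Juju): print every line that contains
# the literal "(core dumped)" marker, prefixed with its line number.
# With a file argument it reads that file; with no argument it reads stdin.
# The exit status is 0 when at least one line matches, non-zero otherwise.
find_core_dumps() {
    # -n adds line numbers; -F matches the pattern as a fixed string,
    # so the parentheses are not treated as regex syntax.
    grep -nF '(core dumped)' "$@"
}
```

For example, on the controller machine you could run find_core_dumps /var/log/juju/machine-0.log, or pipe log output into find_core_dumps with no arguments.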

Retrieve the core dump backtrace

  1. Open juju/Makefile and, in the line with CGO_LINK_FLAGS, remove the -s flag. This will ensure that the jujud-controller binary contains the debug symbols.

  2. Bootstrap a controller with the modified binary. Once the controller is running, SSH into the controller machine (usually machine 0) and install the gdb package:

$ juju ssh -m controller 0
$ sudo apt install gdb

  3. Stop the controller agent service:

$ sudo systemctl stop jujud-machine-0.service

  4. Start the controller with gdb and reproduce the crash:

$ LIBDQLITE_TRACE=1 gdb -ex=r --args /var/lib/juju/tools/machine-0/jujud machine --data-dir "/var/lib/juju" --machine-id 0 --debug

This will run the controller. You should keep it running until you reproduce the crash.

If gdb stops the controller on a SIGPIPE signal, tell gdb to pass the signal through without stopping, then resume, by running the following commands at the gdb prompt:

gdb> handle SIGPIPE nostop noprint pass
gdb> continue

When the controller crashes, gdb stops and returns you to its prompt. At this point, you can get the backtrace with the following command:

gdb> bt

Grab the output of the backtrace and share it with the Juju team. They will be able to help you diagnose the issue further.
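
If you would rather not wait at an interactive prompt, the gdb steps above can be approximated with gdb's batch mode, which runs the given commands in order and then exits. The following is a sketch under stated assumptions rather than an official procedure: capture_backtrace and the GDB and JUJUD override variables are hypothetical names introduced here, and the jujud arguments are copied from the command shown earlier for machine 0.

```shell
#!/bin/sh
# Sketch: run jujud under gdb in batch mode and save the whole session,
# including the backtrace printed after the crash, to a file.
# GDB and JUJUD are hypothetical override variables added for illustration;
# by default the function uses gdb and the machine-0 jujud path.
capture_backtrace() {
    out_file=$1
    # -batch makes gdb execute the -ex commands in order and then exit.
    # SIGPIPE is passed to the program without stopping, "run" returns
    # when the program crashes or exits, and "bt" prints the backtrace.
    LIBDQLITE_TRACE=1 "${GDB:-gdb}" -batch \
        -ex 'handle SIGPIPE nostop noprint pass' \
        -ex 'run' \
        -ex 'bt' \
        --args "${JUJUD:-/var/lib/juju/tools/machine-0/jujud}" machine \
        --data-dir /var/lib/juju --machine-id 0 --debug \
        > "$out_file" 2>&1
}
```

Calling capture_backtrace /tmp/jujud-backtrace.txt writes everything to that file. Because LIBDQLITE_TRACE=1 is set, the file also contains the Dqlite trace output, and the backtrace appears near the end.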