Debug bootstrap machine failures¶
This guide will show you how to diagnose and fix issues with bootstrapping and starting new machines.
Juju’s bootstrapping process can be broken down into several steps:
Provision resources/a machine
M
from the relevant cloudInstall the Juju agent
jujud
on machineM
Poll the newly created instance for an IP address, and attempt to connect to
M
Run the machine configuration script for
M
, which e.g. installs relevant packages
The output of juju bootstrap
will tell you which step you’re at. If your failure is at step 1, then the issue is most
likely with your cloud provider or configuration - see the guides here.
Otherwise, we need to connect to the machine and look at the logs to find out what’s gone wrong.
Connect to the machine¶
Via ssh¶
The easiest way to connect to the machine is via ssh. We can do this if Juju has been successfully able to connect to your controller. In this case, you will see the line
Connected to [ip-address]
in your juju bootstrap
output.
A common type of failure here is when the terminal hangs on the line
Running machine configuration script...
The machine configuration should take less than 10 minutes - any longer than this is a sign that something has gone wrong.
Luckily, the machine is already reachable at this step, so we can directly ssh
into it to find out what’s happening.
Copy the IP address that Juju connected to above, and run
ssh ubuntu@[ip-address] -i [juju-data-dir]/ssh/juju_id_rsa
Here, [juju-data-dir]
defaults to ~/.local/share/juju
, but if you’ve set the JUJU_DATA
environment variable, it
will be equal to that instead.
See here for a more in-depth guide on using SSH to connect to a machine.
Via the cloud provider¶
If Juju wasn’t able to connect to your machine’s IP address, then ssh
probably won’t be able to either. With this type
of failure, you’ll often see your terminal hang after the step
Attempting to connect to [ip-address]:[port]
In this case, we will need to go through the cloud provider to connect to the machine. The process here depends on what cloud you’re using.
LXC / LXD¶
In the juju bootstrap
output, you should see a line like
Launching controller instance(s) on localhost/localhost...
which will be followed by the LXD container name (in the form juju-XXXXXX-0
). We can use the lxc
command line tool
to get a shell inside the machine. Copy the container name, then run
lxc exec [container-name] bash
Now, we should have a shell inside the machine, and can use the steps below to search the logs.
Kubernetes¶
In the juju bootstrap
output, you should see a line like
Creating k8s resources for controller [namespace]
where [namespace]
is something like controller-foobar
. Inside this namespace, Juju will have created a pod called
controller-0
- we want to access the api-server
container in this pod. To do this, we use kubectl
:
kubectl exec controller-0 -itc api-server -n [namespace] -- bash
(If using MicroK8s, call this command via microk8s kubectl
).
Examine the logs¶
Once we have a shell inside the machine, we can
ls /var/log
which will show you all the available logs. Which log to look at depends on the type of failure, but generally speaking,
syslog
, cloud-init.log
and cloud-init-output.log
are good ones to look at.
Some good tools for examining logs are
less [log-file]
which will let you scroll through the log file, and
tail -f [log-file]
which will track updates to the log file.
Errors (especially fatal ones) will often be near the end of a log file. You may also have luck searching your logs for phrases such as “error” or “fail”.