Skip to content

Troubleshooting

make doctor fails

/dev/kvm not found

Cause: KVM is not enabled in BIOS, or the kvm kernel modules are not loaded.

Fix:

# Check if KVM modules are loaded
lsmod | grep kvm

# Load them manually
sudo modprobe kvm_intel   # Intel CPUs
sudo modprobe kvm_amd     # AMD CPUs

# If modules don't exist, enable virtualization in BIOS/UEFI settings.

virsh not found

Cause: libvirt is not installed.

Fix:

sudo apt install -y qemu-kvm libvirt-daemon-system libvirt-clients virtinst bridge-utils
sudo systemctl enable --now libvirtd
sudo usermod -aG libvirt $USER
# Log out and back in for group membership to take effect

libvirtd not running

Fix:

sudo systemctl start libvirtd
sudo systemctl enable libvirtd

Terraform not found

Fix: Install via tfenv or direct download:

# Using tfenv
git clone https://github.com/tfutils/tfenv.git ~/.tfenv
echo 'export PATH="$HOME/.tfenv/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
tfenv install 1.14.4
tfenv use 1.14.4


make up fails

"Could not open '/var/lib/libvirt/images/bosh-lab'"

Cause: The storage pool directory doesn't exist or libvirt doesn't have permissions.

Fix:

sudo mkdir -p /var/lib/libvirt/images/bosh-lab
sudo chown libvirt-qemu:kvm /var/lib/libvirt/images/bosh-lab

"Error creating libvirt network: already exists"

Cause: A previous bosh-lab network exists from a failed run.

Fix:

virsh net-destroy bosh-lab
virsh net-undefine bosh-lab
make up

Cloud image download fails

Cause: Network issue or Ubuntu mirror down.

Fix:

# Download manually
wget -O state/cache/jammy-server-cloudimg-amd64.img \
  https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img
make up


make bootstrap fails

"Connection refused" to mgmt VM

Cause: VM hasn't finished booting, or cloud-init hasn't installed SSH.

Fix:

# Check VM is running
virsh list --all

# Check VM console for boot progress
virsh console bosh-lab-mgmt

# Wait longer — cloud-init can take 2-5 minutes on first boot
# The bootstrap script retries 30 times with 5-second delays

"Permission denied (publickey)"

Cause: SSH key wasn't injected into cloud-init properly.

Fix:

# Verify the key was generated
ls -la state/creds/mgmt_ssh*

# Verify cloud-init has the key
grep ssh_authorized_keys state/mgmt-cloudinit.yaml

# If missing, recreate:
make down
rm -f state/creds/mgmt_ssh*
make up

bosh create-env fails

Cause: Many possible reasons. Check the log.

Fix:

# Read the full log
cat state/logs/create-director.log

# Common issues:
# 1. Nested KVM not enabled — check /dev/kvm inside the VM
# 2. Network conflict — another 10.245.0.0/24 network exists
# 3. Disk space — need ~40GB free inside the VM
# 4. Memory — director needs ~4GB RAM

9p mount fails inside VM

Cause: The 9p kernel module may not be loaded, or the filesystem tag doesn't match.

Fix:

# SSH into the VM and check
ssh -i state/creds/mgmt_ssh bosh@10.245.0.2

# Inside VM:
sudo mount -t 9p -o trans=virtio,version=9p2000.L state /mnt/state
ls /mnt/state

# If 9p isn't available, the bootstrap script will fall back to SCP
# for copying state files back and forth.


make concourse fails

"Stemcell not found"

Cause: The stemcell version doesn't match what's available on bosh.io.

Fix:

# Check available stemcells
bosh -e lab stemcells

# Upload manually if needed
bosh -e lab upload-stemcell \
  "https://bosh.io/d/stemcells/bosh-warden-boshlite-ubuntu-jammy-go_agent?v=1.1044"

Concourse deploy times out

Cause: Compilation takes a long time on limited resources. Default timeout may be too short.

Fix:

# Check deployment status
bosh -e lab -d concourse instances
bosh -e lab tasks --recent

# Compilation can take 20-40 minutes on first deploy.
# Subsequent deploys reuse compiled packages.


make test fails

Test 1 fails (BOSH Director unreachable)

Check make status and make bootstrap. The director may need to be re-bootstrapped after a VM restart.

Test 2 fails (CredHub smoke test)

CredHub credentials may have rotated. Re-extract from vars-store:

bosh int state/vars-store.yml --path /credhub_admin_client_secret

Test 3 fails (Concourse not deployed)

Run make concourse to deploy Concourse.


General Tips

Reboot recovery

After host reboot:

# VMs should auto-start (libvirt autostart is enabled on the network)
# But the VM may need manual start:
virsh start bosh-lab-mgmt

# Then re-bootstrap (idempotent, won't recreate director):
make bootstrap

Freeing disk space

# Check what's using space
du -sh state/*

# Clear the release/stemcell cache (will re-download next deploy)
rm -rf state/cache/*.tgz

# Clear logs
rm -f state/logs/*.log

Checking resource usage

# VM resources
virsh dominfo bosh-lab-mgmt

# Host resources
free -h
df -h
nproc