Post-mortem – failed update

The evening before the on-site intervention I had planned for yesterday, I logged in to the server, did a couple of sysadmin chores, updated it, crossed my fingers and entered the “reboot” command.

Of course, the server did not come back online…

So the next morning, once on site, I plugged a screen into the server and, instead of a GRUB prompt or a kernel panic, I was shown the usual login prompt. I started suspecting a network issue.

Since there was no way I was going to blind-type the 24-character root password, I rebooted, added init=/bin/bash to the GRUB kernel command line, remounted / read-write and changed the password to a temporary value.

[Image: a CRT monitor on a table, showing a bunch of fsck-related messages; beneath it, a keyboard, with a red screwdriver holding the Y key down.]

(completely unrelated image)

ip a showed the interface, but without an IP address and with the link down. So I started by changing the network cable and manually bringing the interface up, to no avail.
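The link state that ip a reports can also be read straight from sysfs, which is handy on a box where the usual tooling is not at hand. Below is a minimal Python sketch, assuming a Linux host where each interface is exposed as /sys/class/net/<name>/operstate; the function names are hypothetical and were not part of this intervention.

```python
from pathlib import Path


def interface_states(sysfs_root="/sys/class/net"):
    """Return {interface_name: operstate} as reported by the kernel in sysfs.

    operstate is typically "up", "down", "dormant" or "unknown".
    """
    states = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return states
    for iface in sorted(root.iterdir()):
        try:
            states[iface.name] = (iface / "operstate").read_text().strip()
        except OSError:
            # Entries such as "bonding_masters" are plain files, not
            # interface directories; skip anything without operstate.
            continue
    return states


def down_interfaces(sysfs_root="/sys/class/net"):
    """Names of interfaces whose operstate is "down" (no link)."""
    return [name for name, state in interface_states(sysfs_root).items()
            if state == "down"]


if __name__ == "__main__":
    for name, state in interface_states().items():
        print(f"{name}: {state}")
```

On the machine in this post, the Realtek interface would have been expected to show up as down, matching what ip a displayed.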

Suspecting a kernel module issue, I rebooted into the previous kernel version. Still no network.

Some internet searches turned up reports of issues with that network card that could be fixed by another DKMS module. So I went that way, which meant installing some packages on a disconnected machine…

I first tried downloading the packages on my laptop and transferring them to the server on a USB drive. The number of dependencies needed promptly encouraged me to find a better way.

So I booted the server from a Linux install USB drive, got a shell (and a network connection), assembled the software RAID array, mounted it, chrooted into it and installed the needed packages (namely https://github.com/awesometic/realtek-r8125-dkms/releases). The module was built along the way and the initramfs was automatically regenerated. I blacklisted the previous module just to be on the safe side.
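The rescue sequence above can be sketched as a small Python helper that only builds the list of commands (a dry run: nothing is executed). It assumes a single assembled array holding the root filesystem (/dev/md0 below is hypothetical), no separate /boot partition, and that the package installation happens interactively in the final chroot shell. The helper name and the blacklist file name are illustrative, not taken from the post.

```python
import shlex


def chroot_rescue_plan(root_device, target="/mnt", old_module=None):
    """Return, without running anything, the commands for a chroot rescue.

    root_device: root filesystem device once the RAID array is assembled,
        e.g. "/dev/md0" (hypothetical; check /proc/mdstat on the real box).
    target: mount point for the installed system.
    old_module: driver to blacklist (hypothetical), or None to skip.
    """
    t = shlex.quote(target)
    plan = [
        "mdadm --assemble --scan",                # assemble the software RAID array(s)
        f"mount {shlex.quote(root_device)} {t}",  # mount the installed system
        # Give package scripts (DKMS build, initramfs regeneration)
        # access to the live system's kernel interfaces.
        f"mount --bind /dev {t}/dev",
        f"mount -t proc proc {t}/proc",
        f"mount -t sysfs sysfs {t}/sys",
    ]
    if old_module:
        # Written from outside the chroot, into the mounted system's config.
        conf = shlex.quote(f"{target}/etc/modprobe.d/blacklist-{old_module}.conf")
        line = shlex.quote(f"blacklist {old_module}")
        plan.append(f"echo {line} > {conf}")
    # Interactive shell inside the installed system; install packages there.
    plan.append(f"chroot {t} /bin/bash")
    return plan


if __name__ == "__main__":
    for cmd in chroot_rescue_plan("/dev/md0", old_module="r8169"):
        print(cmd)
```

The r8169 in the usage line is an assumption: it is the in-kernel driver that usually claims Realtek RTL8125 cards, but the post does not name the module that was blacklisted. Keeping the plan as data rather than running it makes the sketch safe to try on any machine.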

I then rebooted the server, which got its connection back…

Lesson learned: machines will fail to boot sooner or later, and services will fail to start.
The clever part was rebooting the server before the scheduled on-site intervention. I regularly reboot servers, both for upgrades and to make sure servers and services start properly.
