Category Archives: Linux

Disable Mouse in Vim System-wide

For reasons unknown the Vim project decided to switch mouse support *on* by default, and the Debian package maintainer decided to just flow that downstream in Deb 10. It was a… contentious change. I have no idea what other distros are doing. You’ll know you have this if you go to select text in your terminal window and suddenly Vim is in “visual” mode.

Supposedly the Vim project can’t change the default (again) because “people are used to it” (uhhhh).

The issue is that while there is an include from /etc/vim/vimrc to include /etc/vim/vimrc.local but either creating this file apparently either disables every other default option completely (turning off, for example, syntax highlighting) or the defaults will load after it (!?) if the user doesn’t have their own vimrc and override your changes.

Mostly I’m happy with the defaults, I usually just want to change this one thing without having to set up a ~/.vimrc in EVERY HOME DIRECTORY

Anyway, here, as root:

cat > /etc/vim/vimrc.local <<EOF
" Include the defaults.vim so that the existence of this file doesn't stop all the defaults loading completely
source \$VIMRUNTIME/defaults.vim
" Prevent the defaults from loading again later and overriding our changes below
let skip_defaults_vim = 1
" Here's where to unset/alter stuff you do/don't want
set mouse=
EOF

Creating a VRF and Running Services Inside it on Linux

Edited 2023-05-10 to add: 
Please check the comments at the bottom for a Debian 'interfaces' example and a netplan example provided by generous visitors. 
Netplan surely seems the easiest way to stand up VRFs, which does not particularly come as a surprise to me.
You will still need to configure your services to run inside the VRFs though. I thought it possible that there might be a more elegant way to do this in systemd now, but the existence of this issue against systemd suggests not.

This was remarkably difficult to find a simple explanation for on one page and whilst not all that complex to achieve if you understand all of the component parts sometimes it is useful to have a complete explanation and so, hopefully, someone will find this howto useful.

There are a number of reasons to have one or more VRFs (VRF stands for Virtual Routing and Forwarding) added to a system – researching and discussing the *why* of doing this is not in scope for this article – I’m going to assume you know why you’d want to do this.

If you somehow don’t really know what a VRF is beyond suspecting it’s what you want, in essence each VRF has it’s own routing table and this allows you to partition – in networking terms – a system to service two or more entirely different networks with their own routing tables (eg: each can have it’s own default route, and their own routes to what would otherwise be overlapping IP ranges).

NB: It’s important to note that the work you’re doing here can break your existing management access, if you’re already relying on the interface you want to move into the VRF to access the server in the first place. Ensure you can access the server over an interface OTHER than the one you want to move into the VRF – be it over a different NIC or using the local console / IPMI / ILO / DRAC etc.

Example environment

Let’s say you have a Linux box with two interfaces, eth0 and eth1 (even if systemd’s “predictable” naming is more common now).

eth0 carries your production traffic. This has a default gateway to reach the Internet, or whatever production network you have, and it’s configuration is ultimately irrelevant.

eth1 faces your management network. For demonstration purposes, our IP is 10.0.0.2/24, the default gateway we want to use for management traffic will be 10.0.0.1, and this is the interface you want to be in a separate VRF to completely segment out your management traffic.

All of the below instruction takes place as root – prepend commands with sudo if you prefer to sudo.

How do I create a VRF?

In Linux VRFs have a name, and an associated routing table number. Let’s say we want to create a VRF called Mgmt-VRF using table number 2 (the name and number is up to you – I’ve just chosen 2 – the number should just not be in use and if you don’t currently have any VRFs then 2 will be fine), and set it “up” to actually enable it.

ip link add Mgmt-VRF type vrf table 2
ip link set dev Mgmt-VRF up

Verify your VRF exists

ip vrf show

Which should show you:

Name              Table
-----------------------
Mgmt-VRF             2

Add your interface(s) to the new VRF (This will break your connection if you’re currently using them! Exercise caution!), here we add eth1 to Mgmt-VRF:

ip link set dev eth1 master Mgmt-VRF

You can now add routes to your new VRF like this, here we’re adding the default gateway of 10.0.0.1 to the routing table for our new VRF:

ip route add table 2 0.0.0.0/0 via 10.0.0.1

You can then validate that the default route exists in that table:

ip route show table 2

You should see something like:

default via 10.0.0.1 dev eth1
broadcast 10.0.0.0 dev eth1 proto kernel scope link src 10.0.0.2
10.0.0.0/24 dev eth1 proto kernel scope link src 10.0.0.2
local 10.0.0.2 dev eth1 proto kernel scope host src 10.0.0.2

At this point you could add any more static routes your new VRF might require, and you’re essentially done with configuring the VRF. The interface eth1 now exists in our new VRF.

Okay, how do I *use* the VRF?

Any tinkering will quickly reveal that your services which were bound to (or accessible over) the IP on eth1 don’t work anymore, at least if they only bind by IP and not by device.

You’ll also notice that when you use ping or traceroute or whatever it’ll run with the default routing table – even if you set the source IP to 10.0.0.2, it won’t work. This is because, like sshd, ping (and bash, and anything else) will run in the context of the default VRF unless you specifically request otherwise. Those processes will use the default routing table and will only have access to listen to IPs that are on interfaces also in that same VRF.

If the processes or services are be configured to bind to an interface however, they will operate in the VRF that the interface is configured for. A good example of a command with native support for binding to interfaces rather than IPs is traceroute:

traceroute -i eth1 8.8.8.8

But if you just want a generic way to execute commands inside a particular VRF, doing so is fairly easy using ip vrf exec, here, the same traceroute command without the need to specify an interface:

ip vrf exec Mgmt-VRF traceroute 8.8.8.8

If you’re going to be doing a lot of work in a particular VRF, you will probably find it most convenient to start your preferred shell (eg bash) using ip vrf exec as all child processes you start from that shell will also operate from that VRF, then exit the shell once you want to return to the default routing table:

ip vrf exec Mgmt-VRF /bin/bash
# do your work now, eg
traceroute 8.8.8.8
# time to go back to the default routing table
exit

Great, I can run traceroute. But what about my SERVICES?

For linux distributions running systemd – shifting services to run inside a VRF is actually relatively straightforward.

systemd calls processes and services under it’s purview “units”, and has so called unit files that describe services, how and when (using dependencies and targets) they should be started, etc

If you want to run a single instance of a service across all VRFs for some reason this is possible though beyond the scope of this article (look up net.ipv4.tcp_l3mdev_accept and net.ipv4.udp_l3mdev_accept).

Alternatively you might choose to have several copies of the service running, each in different VRFs (make sure they use different socks/pipes/pid files etc!), which is also beyond the scope of this article. It’s up to you to decide what suits your environment best.

However – if you only want to change your one existing copy of your service to run in a VRF, you just have to specify the new command that systemd executes in a so called override file.

You should use override files rather than modifying the main unit file because – in general – there will not be an override file in the distribution-provided package for your service, so when you do package upgrades you shouldn’t have any collisions with the package version of the file and your modified one which means that your modifications will be preserved. That said, you will have to keep an eye on whether you need to update your override ExecStart command if it changes in a breaking way between releases (check this first if a service you have overridden starts misbehaving after package updates!).

First you need to look in the unit file to get the current command that is executed to start the service:

systemctl cat sshd

You should see something like this (taken from a Debian 10 x64 system):

# /lib/systemd/system/ssh.service
[Unit]
Description=OpenBSD Secure Shell server
Documentation=man:sshd(8) man:sshd_config(5)
After=network.target auditd.service
ConditionPathExists=!/etc/ssh/sshd_not_to_be_run

[Service]
EnvironmentFile=-/etc/default/ssh
ExecStartPre=/usr/sbin/sshd -t
ExecStart=/usr/sbin/sshd -D $SSHD_OPTS
ExecReload=/usr/sbin/sshd -t
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
RestartPreventExitStatus=255
Type=notify
RuntimeDirectory=sshd
RuntimeDirectoryMode=0755

[Install]
WantedBy=multi-user.target
Alias=sshd.service

The key configuration variable here is “ExecStart”. We need to modify ExecStart so that our sshd starts via ip vrf exec. Do so by creating (or opening, if you already have one!) the override file for sshd:

systemctl edit sshd

This will dump you into the default editor – probably nano unless you changed it – with either your existing override file if you have one, or a blank one if you don’t.

Due to the way systemd sanity checks your unit files, you have to deliberately *unset* ExecStart by first setting it to nothing, then specify the new ExecStart which you can see is the default ExecStart entry, but with

/bin/ip vrf exec Mgmt-VRF

prepended to the start. It’s important to specify the full path to the ip binary as when systemd executes this command, it will more likely than not do so without any PATH variable set, or with a different one to which your shell environment uses. Being explicit with paths ensures everything works as desired. (This is generally a good habit to get into)

If you have a blank file, in our example for sshd all you create is the following:

[Service]
ExecStart=
ExecStart=/bin/ip vrf exec Mgmt-VRF /usr/sbin/sshd -D $SSHD_OPTS

If you don’t have a blank file – well, I expect you know enough about what you’re doing here but if you do not already unset and reset ExecStart (or don’t have a [Service] section at all) then you can simply follow the above. If you’re already overriding ExecStart then you should prepend your override with the same /bin/ip vrf exec Mgmt-VRF

Force systemd to reload the unit files, and restart your service:

systemctl daemon-reload
systemctl restart sshd

That should be it – sshd is now running inside your new VRF; if you have a relatively up to date systemd build it should natively understand VRFs and so can show that it is running inside that vrf (see the CGroup section) – you can also see that it is using our override file as non-overridden services will not have a “Drop-In” section:

systemctl status sshd
● ssh.service - OpenBSD Secure Shell server
   Loaded: loaded (/lib/systemd/system/ssh.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/ssh.service.d
           └─override.conf
   Active: active (running) since Wed 2020-08-12 09:38:22 BST; 7h ago
     Docs: man:sshd(8)
           man:sshd_config(5)
 Main PID: 29107 (sshd)
    Tasks: 1 (limit: 4689)
   Memory: 2.8M
   CGroup: /system.slice/ssh.service
           └─vrf
             └─Mgmt-VRF
               └─29107 /usr/sbin/sshd -D

Aug 12 09:38:22 rt3 systemd[1]: Starting OpenBSD Secure Shell server...
Aug 12 09:38:22 rt3 sshd[29107]: Server listening on 10.0.0.2 port 22.
Aug 12 09:38:22 rt3 systemd[1]: Started OpenBSD Secure Shell server.
Aug 12 09:38:50 rt3 sshd[29116]: Accepted password for philb from 192.168.0.2 port 59159 ssh2
Aug 12 09:38:50 rt3 sshd[29116]: pam_unix(sshd:session): session opened for user philb by (uid=0)

Can’t connect?

If you’ve done all this, restarted your service, systemd confirms it’s running in the VRF, and you still can’t connect to it – make sure your service is not trying to bind to an IP that is on an interface in a different VRF to the one in which you started it. Remember that services can only successfully use local IPs that are in the same VRF, even if they start and give the impression of working.

Edit: Persisting VRFs between reboots

I actually forgot about this minor detail when I originally wrote this post – but you soon notice when you reboot and your VRFs are missing.

While I am aware there are probably half a dozen ways to skin this cat, some of which likely including learning how to use systemd-networkd, using systemd to simply execute a bash script at the correct time is by far the quickest solution requiring the least amount of explanation.

First, create a bash script that contains the commands you need to start your VRFs; /sbin/vrf.sh will do, containing, using the above VRF configuration for example:

#!/bin/bash
ip link add Mgmt-VRF type vrf table 2
ip link set dev Mgmt-VRF up
ip route add table 2 0.0.0.0/0 via 10.0.0.1
ip link set dev eth1 master Mgmt-VRF

As this is a script that will get executed as root on system start, make sure this file is owned by, and only writeable by, root! (chmod 700 is fine)

Then create a systemd service that runs this script at the correct time – first you need a service file – in my instance, I created /etc/systemd/system/vrf.service – containing:

[Unit]
Description=VRF creation
Before=network-pre.target
Wants=network-pre.target

[Service]
Type=oneshot
ExecStart=/sbin/vrf.sh

[Install]
WantedBy=multi-user.target

Then enable the service

systemctl enable vrf

You should see something like:

Created symlink /etc/systemd/system/multi-user.target.wants/vrf.service → /etc/systemd/system/vrf.service.

Your VRF(s) should now exist at the correct time during boot for the network services (eg sshd) that need to attach to them.

Compiling and using mk_livestatus on Nagios4 on Debian 10/Buster

Prerequisites (other than the nagios4 packages, of course!):

# apt install rrdtool-dev librrd-dev librrd8 libboost-dev libboost-system-dev

Get latest source from https://checkmk.com/download-source.php, at time of writing, https://checkmk.com/support/1.5.0p23/mk-livestatus-1.5.0p23.tar.gz and unpack

# wget https://checkmk.com/support/1.5.0p23/mk-livestatus-1.5.0p23.tar.gz
# tar -zxvf mk-livestatus-1.5.0p23.tar.gz
# cd mk-livestatus-1.5.0p23

Configure for nagios4, compile and install

# ./configure --with-nagios4 --prefix=/usr/local/nagios && make install

Enable the broker module in Nagios4 – add this to, eg, your nagios.cfg – first make sure that this is set to send all events to the broker:

event_broker_options=-1

Then configure the broker_module – here, telling it to create the socket for livestatus at /var/lib/nagios4/rw/livestatus

broker_module=/usr/local/lib/mk-livestatus/livestatus.o /var/lib/nagios4/rw/livestatus

Now you can restart Nagios4 and test that the livestatus socket is working

# systemctl restart nagios4

# echo "GET status" | /usr/local/bin/unixcat /var/lib/nagios4/rw/livestatus

And you should get something like this:

accept_passive_host_checks;accept_passive_service_checks;cached_log_messages;check_external_commands;check_host_freshness;check_service_freshness;connections;connections_rate;enable_event_handlers;enable_flap_detection;enable_notifications;execute_host_checks;execute_service_checks;external_command_buffer_max;external_command_buffer_slots;external_command_buffer_usage;external_commands;external_commands_rate;forks;forks_rate;host_checks;host_checks_rate;interval_length;last_command_check;last_log_rotation;livecheck_overflows;livecheck_overflows_rate;livechecks;livechecks_rate;livestatus_active_connections;livestatus_queued_connections;livestatus_threads;livestatus_version;log_messages;log_messages_rate;mk_inventory_last;nagios_pid;neb_callbacks;neb_callbacks_rate;num_hosts;num_services;obsess_over_hosts;obsess_over_services;process_performance_data;program_start;program_version;requests;requests_rate;service_checks;service_checks_rate
1;1;0;1;0;1;1;0;1;1;1;1;1;0;0;0;0;0;0;0;57;0.416507;60;0;0;0;0;0;0;1;0;10;1.5.0p23;45;0.0199066;0;4310;1651;11.9169;83;467;0;0;0;1581514195;4.3.4;1;0;348;3.15754

Installing (and Booting) Linux on/FROM Intel vROC NVMe

Just remember to disable Secure Boot (at least, Supermicro’s guide to vROC says that vROC is not compatible with Secure Boot), and ensure that you boot your O/S installer in (U)EFI mode, and make sure you boot in (U)EFI mode afterwards.

Otherwise, expect problems like the CentOS 7 installer complaining that something went wrong as the installer GUI starts (this seems to mostly stem from not seeing the vROC RAID device, but still seeing the member NVMe devices but being confused by the mdraid-esque nature of vROC RAID sets.)

Once you boot the CentOS installer in EFI mode, you’ll be able to see and install to your “BIOS RAID” device. The same will apply to standalone NVMe drives – which on most boards will only work if everything is done in EFI mode.

Nextcloud “Could not load at least one of your enabled two-factor auth methods” after upgrade

Seems that upgrades in Nextcloud have a propensity to break 2FA provider apps as this is apparently something that bit people going to NC15 but in our case got us after we upgraded to NC16

Far as I can tell, what happens is that you have upgraded to the latest NextCloud before a compatible version of your 2FA providers “apps” is available (why you would release without 2FA is beyond me), and so the provider apps get disabled.

When you try to log in, all you’ll see is this:

To fix this, run the following in the nextcloud web root – these sudo commands have to be run as the same UID as the owner of the config file, so if in your environment you aren’t running nextcloud under www-data then you’ll have to adjust the sudo commands as necessary to specify the correct user.

NB: that I’m running these sudo commands as root, so I don’t need any sudo pre-configuration as such to allow me to run these as www-data.

First, identify what 2FA provides your affected user has configured – so, for a user called “adminusername”:

# sudo -u www-data php occ twofactorauth:state adminusername
Two-factor authentication is enabled for user adminusername

Enabled providers:
- totp
- u2f
Disabled providers:
- backup_codes

So, what we see here is this user had both TOTP and U2F (but, tsk, no backup codes – in our experience, twofactor_backup_codes was still working, so a user with backup codes would be able to still log in – you’d still have to understand what to do to fix your install though!)

Check to see if your modules are missing:

# sudo -u www-data php occ app:list | grep twofactor
  - twofactor_backupcodes: 1.5.0

Uh-oh, no twofactor_totp OR twofactor_u2f.

Make sure your nextcloud apps are up to date:

# sudo -u www-data php occ app:update --all

Then re-enable your twofactor provider apps, so for “totp” and “u2f”, you want:

# sudo -u www-data php occ app:enable twofactor_totp
twofactor_totp enabled

# sudo -u www-data php occ app:enable twofactor_u2f
twofactor_u2f enabled

Now you should be able to log back in as normal with your 2FA.

Replace wildcard SSL certificate on all subomains on cPanel server

You’ve got a wildcard certificate you’ve been using on your cPanel server, and you’ve created loads of subdomain SSL accounts, only now your wildcard cert is expiring and there is no trivial way in WHM to replace a wildcard cert and have it replace the cert in use on all the subdomains that use the same original certificate.

Fear not: I have a script for that. I have tried to make the comments useful, so pass no further comment here. This script basically does a two-pair search and replace – so searching and replacing the crt filename, and the key filename.

Hopefully this is useful for someone.

#!/bin/bash
#
# replaceSSL.sh - v0.1 - Phillip Baker, Netcalibre Ltd - phil@lchost.co.uk
#
# WARNING: THIS SCRIPT DIRECTLY MANIPULATES CPANEL METADATA FILES.
# 
# BACK UP /var/cpanel/userdata FIRST IF YOU DECIDE TO USE THIS SCRIPT
# IF SOMETHING GOES WRONG, RESTORE THE /var/cpanel/userdata FILES FROM
# YOUR BACKUP AND CALL /scripts/rebuildinstalledssldb AND YOU SHOULD BE OK
#
# NO WARRANTY, NO GUARANTEES. WORKS FOR ME. MAY NOT WORK FOR YOU.
# DO NOT RUN THIS SCRIPT IF YOU DO NOT UNDERSTAND WHAT IT DOES AND CANNOT
# FIGURE OUT FOR YOURSELF IF IT WILL BREAK SOMETHING.
#
# This script should be run sudo/su as root.
#
# Quickly replace one SSL cert and key file for another already existing on cPanel server
# then rebuild apache config and the ssl.db before restarting apache.
#
# Useful when replacing a wildcard cert on lots of subdomain accounts
#
# Install the new cert on the server using WHM in the usual way on one of the accounts
# (perhaps you install it on the root - example.com - account)
#
# Determine the old cert string you are targeting by looking in a subdomain using
# the old certificate still:
# cat /var/cpanel/userdata/<username>/subdomain.example.com_SSL | grep crt
#
# Will spit out a crt filename that looks a bit like:
# _wildcard_example.com_aad23_12314_124112312_adsfasdfasdfasdfasdf.crt
#
# Determine the new cert string in the same way from the domain you installed the new cert on
# cat /var/cpanel/userdata/<username>/example.com_SSL | grep crt
#
# Do the same for the old / new key (if the key is unchanged, just specify the same string 
# twice)
#
# Then: ./replaceSSL.sh <oldcrt> <newcrt> <oldkey> <newkey>


if [ $# -ne 4 ]; then
    echo "./replaceSSL.sh <oldcrt> <newcrt> <oldkey> <newkey>"
    exit 1
fi

OLDCRT=$1
NEWCRT=$2
OLDKEY=$3
NEWKEY=$4

CRTFOUND=0

# change to the appropriate directory
cd /var/cpanel/userdata/

# For each _SSL file
for i in `find . -iname *_SSL`
do
 if grep -q $OLDCRT $i
 then
  CRTFOUND=1
  echo $i
  # copy to an intermediate file, precrt has the original file before the crt is replaced
  cp $i $i.precrt
  # modify the metadata file to replace the crt filename
  sed "s/$OLDCRT/$NEWCRT/g" $i.precrt > $i
  # copy to another intermediate file, prekey has the file after the crt was replaced 
  # but before the key was replaced
  cp $i $i.prekey
  # modify the metadata file to replace the key filename
  sed "s/$OLDKEY/$NEWKEY/g" $i.prekey > $i
 fi
done

if [[ "$CRTFOUND" -eq 1 ]]
then
 # rebuild apache config
 /scripts/rebuildhttpdconf

 # restart apache
 service httpd restart

 # rebuild ssl.db so that the WHM "Manage SSL hosts" section looks accurate
 /scripts/rebuildinstalledssldb

 # Once you're sure it worked
 echo "Pausing so you can check apache is still working and SSL Manager looks sane"
 echo "Press enter if everything is ok, or press ctrl-c to keep *.precrt/*.prekey files"
 echo "for manual inspection to try and figure out what went wrong."
 read

 # clean up
 for i in `find . -iname *.precrt`
 do
  rm $i
 done
 for i in `find . -iname *.prekey`
 do
  rm $i
 done
else
 echo "Old cert string not found - no changes made!"
fi

“letsencrypt.sh isn’t renewing my certs!”

There’s been a change at some point to the JSON format that Let’s Encrypt returns challenges in.

If you have an “old” installation of letsencrypt.sh that pre-dates v0.2.0, letsencrypt.sh is probably doing this:

$ ./letsencrypt.sh -c
# INFO: Using main config file /home/letsencrypt/letsencrypt.sh/config.sh
Processing somedomain.netcalibre.net
 + Checking domain name(s) of existing cert... unchanged.
 + Checking expire date of existing cert...
 + Valid till Jun 22 16:24:00 2016 GMT (Less than 30 days). Renewing!
 + Signing domains...
 + Generating signing request...
 + Requesting challenge for somedomain.netcalibre.net...
$

and then it silently exits.

Update it from git, move your config file to the new location:

git pull
mv config.sh config
./letsencrypt.sh -c

Using DNS challenge with Let’s Encrypt (and migrating from the official client)

Edited 09/03/2020: Tweaked to reflect the change in the project status and repo change
Edited 23/06/2018: Long overdue update to reflect the fact that the project has changed name from letsencrypt.sh to dehydrated

Edited 30/03/2016: Tweaked to reflect where config.sh.example can be found in letsencrypt.sh’s repo these days
Edited 02/06/2016: Tweaked again to reflect the change from “config.sh” to “config”, and an addition about securing private_key.pem

Let’s be honest – Let’s Encrypt is a great project with lofty ideals, but their official client is just awful bloatware.

I ended up duct-taping functionality I needed by wrapping their client in a shell script that could look at a list of domains and take the appropriate renew actions as required.

With the advent of support for DNS challenges in the Let’s Encrypt infrastructure but apparently no plans to add it to the standard client.. I found myself looking for alternatives.

Enter letsencrypt.sh dehydrated – essentially a 28KB bash wrapper for standard OpenSSL binaries (compare to the 14MB of crap that the official client entails) that doesn’t insist on putting things into /etc/ by default, and includes the functionality to manage a list of hostnames from a text file.

This lightweight client has added support for the DNS-01 challenge mechanism, and calls a hook script during the challenge/completion stage to achieve the actual DNS record changes (and certificate installation) necessary. You can obviously write this hook script in whatever language you see fit. There are plenty of sample hookscripts here.

A chunk of this guide will include the effort required to migrate from an existing Let’s Encrypt install, but you can just skip those bits if you are setting up a “green field” dehydrated deployment.

Getting dehydrated

First, get dehydrated downloaded, copy the sample config file, and start a blank domains.txt file (this file tracks the domains you have/want certs for – more on this later):

git clone https://github.com/dehydrated-io/dehydrated dehydrated
cd dehydrated
cp docs/examples/config config
touch domains.txt

Edit config with your preferred editor; if, like me, you want to use the dns-01 challenge mechanism, make sure you set:

CHALLENGETYPE=”dns-01″

and

HOOK=”$BASEDIR/myhookscript.php”

(or wherever your hook script/binary will be)

You’ll probably also want to set CONTACT_EMAIL to something sensible if you’re setting up a new install and want to be able to recover your Let’s Encrypt ‘account’ if something goes wrong somehow (presumably, if you lose your private key). If you’re importing an existing Let’s Encrypt install and key, the contact email, if you set one, was determined when you created your first certificate with that installation.

Everything else can be left commented out unless you know that you need something specific for your environment.

Migrating from the Let’s Encrypt Python client

Moving from the official client to dehydrated is pretty straightforward with a few migration scripts (source). I offer a copy of the scripts here for expediency and as a backup record in case something happens to the originals, but if the versions here don’t work for you, check the source link to see if there’s an updated version available.

Import existing keys/certs/etc

First, let’s import your existing private keys, Let’s Encrypt issued certs, etc – create a script called import.sh in your dehydrated directory with the following contents:

#!/usr/bin/env bash

set -e
set -u
set -o pipefail

umask 077 # paranoid umask, we're creating private keys

SCRIPTDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
BASEDIR="${SCRIPTDIR}"
LETSENCRYPT="/etc/letsencrypt"

eval "$("${SCRIPTDIR}/dehydrated" --env)"

if [[ ! -e "${LETSENCRYPT}" ]]; then
  echo "No existing letsencrypt files found."
  exit 1
fi

if [[ -e "${BASEDIR}/domains.txt" ]]; then
  DOMAINS_TXT="${BASEDIR}/domains.txt"
elif [[ -e "${SCRIPTDIR}/domains.txt" ]]; then
  DOMAINS_TXT="${SCRIPTDIR}/domains.txt"
else
  echo "You have to create a domains.txt file listing the domains you want certificates for. Have a look at domains.txt.example."
  echo "For the purpose of this import script the file can be empty, but it has to exist."
  exit 1
fi

for certdir in "${LETSENCRYPT}/live/"*; do
  domain="$(basename "${certdir}")"
  echo "Processing ${domain}"

  # Check if we already have a certificate for the same (main) domain
  if [ -e "${BASEDIR}/certs/${domain}" ]; then
    echo " + Skipping: Found existing certificate directory, don't want to delete anything."
    continue
  fi

  # Check if private-key, certificate and fullchain exist
  if [[ ! -e "${certdir}/privkey.pem" ]]; then
    echo " + Skipping: Private key is missing."
    continue
  fi
  if [[ ! -e "${certdir}/cert.pem" ]]; then
    echo " + Skipping: Certificate is missing."
    continue
  fi
  if [[ ! -e "${certdir}/fullchain.pem" ]]; then
    echo " + Skipping: Chain is missing."
    continue
  fi

  # Check if certificate still valid
  if ! openssl x509 -checkend 0 -noout -in "${certdir}/cert.pem" >/dev/null 2>&1; then
    echo " + Skipping: Certificate is expired."
    continue
  fi

  # Import certificate
  timestamp="$(date +%s)"

  echo " + Adding list of domains to ${DOMAINS_TXT}"
  SAN="$(openssl x509 -in "${certdir}/cert.pem" -noout -text | grep -A1 "Subject Alternative Name" | grep "DNS")"
  SAN="${SAN//DNS:/}"
  SAN="${SAN//, / }"
  altnames="${domain}"
  for altname in ${SAN}; do
    if [[ ! "${altname}" = "${domain}" ]]; then
      altnames="${altnames} ${altname}"
    fi
  done
  echo "${altnames}" >> "${DOMAINS_TXT}"

  mkdir -p "${BASEDIR}/certs/${domain}"

  echo " + Importing private key"
  cat "${certdir}/privkey.pem" > "${BASEDIR}/certs/${domain}/privkey-${timestamp}.pem"
  ln -s "privkey-${timestamp}.pem" "${BASEDIR}/certs/${domain}/privkey.pem"

  echo " + Importing certificate"
  cat "${certdir}/cert.pem" > "${BASEDIR}/certs/${domain}/cert-${timestamp}.pem"
  ln -s "cert-${timestamp}.pem" "${BASEDIR}/certs/${domain}/cert.pem"

  echo " + Importing chain"
  cat "${certdir}/fullchain.pem" > "${BASEDIR}/certs/${domain}/fullchain-${timestamp}.pem"
  ln -s "fullchain-${timestamp}.pem" "${BASEDIR}/certs/${domain}/fullchain.pem"
done

and run it to import all your keys/certs/domains into your dehydrated environment.

bash import.sh

Certs, Keys and the like will end up in individual dehydrated/certs/<domain> directories so make sure you update any paths to the certs in your Apache/nginx/Exim/whatever configuration to reflect this.

Importing your Let’s Encrypt account private key

Then to import your private key – create a script called exportkey.pl next to your private_key.json file in your existing Let’s Encrypt installation – by default, you’ll find this in a unique-to-you subdirectory under /etc/letsencrypt/accounts/acme-v01.api.letsencrypt.org/directory/ (eg: /etc/letsencrypt/accounts/acme-v01.api.letsencrypt.org/directory/3ca73eb3d1e1c3ea9123a3ecb278d43e)

NB: You’ll need the perl JSON library for this script to work. ( Debian users can just apt-get install libjson-perl )

#!/usr/bin/env perl

use strict;

use Crypt::OpenSSL::RSA;
use Crypt::OpenSSL::Bignum;
use JSON;
use File::Slurp;
use MIME::Base64;

my $json_file = "private_key.json";
my $json_content = read_file($json_file);
$json_content =~ tr/-/+/;
$json_content =~ tr/_/\//;

my $json = decode_json($json_content);

my $n = Crypt::OpenSSL::Bignum->new_from_bin(decode_base64($json->{n}));
my $e = Crypt::OpenSSL::Bignum->new_from_bin(decode_base64($json->{e}));
my $d = Crypt::OpenSSL::Bignum->new_from_bin(decode_base64($json->{d}));
my $p = Crypt::OpenSSL::Bignum->new_from_bin(decode_base64($json->{p}));
my $q = Crypt::OpenSSL::Bignum->new_from_bin(decode_base64($json->{q}));

my $rsa = Crypt::OpenSSL::RSA->new_key_from_parameters($n, $e, $d, $p, $q);

print($rsa->get_private_key_string());

and execute it:

perl exportkey.pl

This will spit out your private key in the standard PEM format. Cut and paste the output into a new file – private_key.pem – in your dehydrated directory.

Setting up your DNS challenge hook script

With the DNS-01 challenge mechanism, you need some way of creating/deleting DNS records on the fly. If you use a DNS service that provides an API or uses a database back-end, this should be nice and easy for you to implement. If you don’t, well, you’re going to need to figure out that stuff on your own.

If you use one of the DNS services covered by the sample scripts I linked to earlier, you can just download, tweak, and away you go.

If you’re writing your own hook script, your script needs to take arguments on the command line:

$1 = operation
$2 = hostname
$3 = challenge token (not used in DNS-01)
$4 = challenge response

$1 has three possible values indicating the required action from your script:

  • deploy_challenge – create the DNS record
  • clean_challenge – delete the DNS record (cleaning up after the challenge attempt)
  • deploy_cert – deploy the new certificate (you may or may not need to do anything here depending on your environment; such as reload daemons to load the new cert)

The DNS record you need to create/delete is then, effectively:

_acme-challenge.$2. IN TXT “$4”

How best you do this is left as an exercise to you, the reader. If you want to “try out” the DNS challenge mechanism, you can create a manual hook script that simply outputs the record to create/delete and waits for you to press enter to continue – that way you can run dehydrated, the hook script will spit out the DNS record, you can create it, reload your nameserver if necessary, and then press enter in the hookscript when you’re ready to continue:

#!/bin/bash

echo "Method: $1"
echo "_acme-challenge.$2.  IN  TXT \"$4\""

read -s -r -e < /dev/tty

Using dehydrated

NB: If you haven’t imported an existing account key (and perhaps even if you have?) you might need to run the following to accept LEs terms and conditions (read them first!)

./dehydrated --register --accept-terms

Run dehydrated -c once to see what happens (it should check your existing certs and renew any less than 30 days old)

./dehydrated -c

Run a cron, say, once a month, that executes dehydrated -c to refresh any certs that have less than 30 days left on them.

Getting a cert for a new domain

Add your domain to the end of domains.txt followed by any SANs you want, such as the below:

mydomain.com wiki.mydomain.com support.mydomain.com

Remember that combining hostnames into one cert means using the same private key and cert (duh) for all of those hostnames – consider if this is an acceptable risk to you!

Then run

dehydrated -c

to create the request, do the challenge/response – done!

Postscript:

I’ve just realised that on my test systems when migrating, private_key.pem was created world readable. Given this key is used to identify you to Let’s Encrypt, you should definitely consider changing the permissions to only be readable by the user you run dehydrated as:

chmod go-r private_key.pem

Internal IPMI error / Stopping kipmi0

As previously documented on this site, I use Nagios extensively. I’ve used the check_ipmi_sensor plugin for a while now, but have had problems on a Centos 6.6 Supermicro box that I had installed it on.

I’d regularly get hit with failures caused by the IPMI failing to return full payloads, and the freeipmi tools frequently dumping out halfway through execution stating that there was an “Internal IPMI Error” – here’s an example running check_ipmi_sensor from the command line:

# ./check_ipmi_sensor -H localhost -fc 5 -v
ID   | Name            | Type              | State    | Reading    | Units | Event
4    | CPU1 Temp       | Temperature       | Nominal  | 38.00      | C     | 'OK'
71   | CPU2 Temp       | Temperature       | Nominal  | 40.00      | C     | 'OK'
138  | PCH Temp        | Temperature       | Nominal  | 31.00      | C     | 'OK'
205  | System Temp     | Temperature       | Nominal  | 26.00      | C     | 'OK'
272  | Peripheral Temp | Temperature       | Nominal  | 41.00      | C     | 'OK'
339  | Vcpu1VRM Temp   | Temperature       | Nominal  | 33.00      | C     | 'OK'
406  | Vcpu2VRM Temp   | Temperature       | Nominal  | 39.00      | C     | 'OK'
473  | VmemABVRM Temp  | Temperature       | Nominal  | 29.00      | C     | 'OK'
540  | VmemCDVRM Temp  | Temperature       | Nominal  | 26.00      | C     | 'OK'
607  | VmemEFVRM Temp  | Temperature       | Nominal  | 38.00      | C     | 'OK'
674  | VmemGHVRM Temp  | Temperature       | Nominal  | 32.00      | C     | 'OK'
741  | P1-DIMMA1 Temp  | Temperature       | Nominal  | 27.00      | C     | 'OK'
808  | P1-DIMMB1 Temp  | Temperature       | Nominal  | 27.00      | C     | 'OK'
875  | P1-DIMMC1 Temp  | Temperature       | Nominal  | 27.00      | C     | 'OK'
942  | P1-DIMMD1 Temp  | Temperature       | Nominal  | 26.00      | C     | 'OK'
1009 | P2-DIMME1 Temp  | Temperature       | Nominal  | 28.00      | C     | 'OK'
1076 | P2-DIMMF1 Temp  | Temperature       | Nominal  | 29.00      | C     | 'OK'
1143 | P2-DIMMG1 Temp  | Temperature       | Nominal  | 28.00      | C     | 'OK'
1210 | P2-DIMMH1 Temp  | Temperature       | Nominal  | 29.00      | C     | 'OK'
1411 | FAN3            | Fan               | Nominal  | 6500.00    | RPM   | 'OK'
1478 | FAN4            | Fan               | Nominal  | 6400.00    | RPM   | 'OK'
ipmi_sensor_read: internal IPMI error

-> Execution of /usr/sbin/ipmi-sensors failed with return code 1.
-> /usr/sbin/ipmi-sensors was executed with the following parameters:
   sudo /usr/sbin/ipmi-sensors --quiet-cache --sdr-cache-recreate --interpret-oem-data --output-sensor-state --ignore-not-available-sensors

This obviously isn’t especially ideal – and I was seeing this every 10-30 minutes when Nagios was running its regular checks. First glances, it looked like it could be caused by defective hardware, but fortunately, that wasn’t the case – just my checks colliding with kipmid.

Centos 6 has built ipmi_si into the kernel by default – and kipmid starts on boot and starts polling any detected IPMI devices (you’ll probably see [kipmi0] running). Given I’m configuring Nagios for monitoring, I’ve no interest in the kernel helper polling my IPMI and tying it up, and if you’re reading this and you have a kernel ipmid thread, chances are, neither do you. Alternatively, it’s possible you’re reading this because your kipmi0 process is claiming to use 100% of a CPU core, perhaps because you did a bmc cold reboot or similar – in which case, this is probably useful for you too, even if you want to keep kipmid running.

You can stop kipmid in its tracks by hot-removing the IPMI device from it’s list of detected devices; first, get the parameters for the detected device like this:

# cat /proc/ipmi/0/params
kcs,i/o,0xca2,rsp=1,rsi=1,rsh=0,irq=0,ipmb=0

Then, take the output from the above, prefix it with “remove,” and use /sys/module/ipmi_si/parameters/hotmod to remove the device:

# echo "remove,kcs,i/o,0xca2,rsp=1,rsi=1,rsh=0,irq=0,ipmb=0" > /sys/module/ipmi_si/parameters/hotmod

The kipmid thread will be cleaned up immediately without requiring a reboot. freeipmi’s various utilities do not use ipmi_si/kipmid and will continue to work just fine.

If all you wanted to do was restart kipmid for some reason, you could then re-add the device by instead prefixing with “add,”:

# echo "add,kcs,i/o,0xca2,rsp=1,rsi=1,rsh=0,irq=0,ipmb=0" > /sys/module/ipmi_si/parameters/hotmod

Whereupon kipmi0 will resume.

If stopping kipmid fixes your issue (it provided immediate relief in my case), make sure it stays gone by adding the following to your kernel options in /boot/grub/menu.lst (or, wherever your bootloader configuration is, if you’re not using the default grub environment)

ipmi_si.force_kipmid=0

Weirdly, kipmid doesn’t seem to cause problems with Ubuntu 14.04 (on a Dell R200) or Debian 8 (on a different Supermicro board), so perhaps this is a centos specific issue.

Tips for Configuring Nagios3 Efficiently – part 1

Back when I started using Nagios (I think ~1.2 or earlier) I don’t remember many options for being all that efficient in terms of “lines of config written” – certainly, any options for being efficient that there may have been ended up being overlooked in the rush to get it up and running, and I’ve been largely been using the same configuration files (and style) ever since – though I did start using host and service templates as soon as I became aware of them some time back in the 2.x branch days.

In the spirit of self-improvement, I’ve been revisiting the Nagios configuration syntax as part of rolling out a fresh monitoring host based on Nagios3, and have significantly reduced the number of lines of config my Nagios installation depends on as a result.

Continue reading Tips for Configuring Nagios3 Efficiently – part 1