Compiling and using mk_livestatus on Nagios4 on Debian 10/Buster

Prerequisites:

# apt install rrdtool-dev librrd-dev librrd8 libboost-dev libboost-system-dev

Get latest source from https://checkmk.com/download-source.php, at time of writing, https://checkmk.com/support/1.5.0p23/mk-livestatus-1.5.0p23.tar.gz and unpack

# wget https://checkmk.com/support/1.5.0p23/mk-livestatus-1.5.0p23.tar.gz
# tar -zxvf mk-livestatus-1.5.0p23.tar.gz
# cd mk-livestatus-1.5.0p23

Configure for nagios4, compile and install

# ./configure --with-nagios4 --prefix=/usr/local/nagios && make install

Enable the broker module in Nagios4 – add this to, eg, your nagios.cfg – first make sure that this is set to send all events to the broker:

event_broker_options=-1

Then configure the broker_module – here, telling it to create the socket for livestatus at /var/lib/nagios4/rw/livestatus

broker_module=/usr/local/lib/mk-livestatus/livestatus.o /var/lib/nagios4/rw/livestatus

Now you can restart Nagios4 and test that the livestatus socket is working

# systemctl restart nagios4

# echo "GET status" | /usr/local/bin/unixcat /var/lib/nagios4/rw/livestatus

And you should get something like this:

accept_passive_host_checks;accept_passive_service_checks;cached_log_messages;check_external_commands;check_host_freshness;check_service_freshness;connections;connections_rate;enable_event_handlers;enable_flap_detection;enable_notifications;execute_host_checks;execute_service_checks;external_command_buffer_max;external_command_buffer_slots;external_command_buffer_usage;external_commands;external_commands_rate;forks;forks_rate;host_checks;host_checks_rate;interval_length;last_command_check;last_log_rotation;livecheck_overflows;livecheck_overflows_rate;livechecks;livechecks_rate;livestatus_active_connections;livestatus_queued_connections;livestatus_threads;livestatus_version;log_messages;log_messages_rate;mk_inventory_last;nagios_pid;neb_callbacks;neb_callbacks_rate;num_hosts;num_services;obsess_over_hosts;obsess_over_services;process_performance_data;program_start;program_version;requests;requests_rate;service_checks;service_checks_rate
1;1;0;1;0;1;1;0;1;1;1;1;1;0;0;0;0;0;0;0;57;0.416507;60;0;0;0;0;0;0;1;0;10;1.5.0p23;45;0.0199066;0;4310;1651;11.9169;83;467;0;0;0;1581514195;4.3.4;1;0;348;3.15754

Installing (and Booting) Linux on/FROM Intel vROC NVMe

Just remember to disable Secure Boot (at least, Supermicro’s guide to vROC says that vROC is not compatible with Secure Boot), and ensure that you boot your O/S installer in (U)EFI mode, and make sure you boot in (U)EFI mode afterwards.

Otherwise, expect problems like the CentOS 7 installer complaining that something went wrong as the installer GUI starts (this seems to mostly stem from not seeing the vROC RAID device, but still seeing the member NVMe devices but being confused by the mdraid-esque nature of vROC RAID sets.)

Once you boot the CentOS installer in EFI mode, you’ll be able to see and install to your “BIOS RAID” device. The same will apply to standalone NVMe drives – which on most boards will only work if everything is done in EFI mode.

Nextcloud “Could not load at least one of your enabled two-factor auth methods” after upgrade

Seems that upgrades in Nextcloud have a propensity to break 2FA provider apps as this is apparently something that bit people going to NC15 but in our case got us after we upgraded to NC16

Far as I can tell, what happens is that you have upgraded to the latest NextCloud before a compatible version of your 2FA providers “apps” is available (why you would release without 2FA is beyond me), and so the provider apps get disabled.

When you try to log in, all you’ll see is this:

To fix this, run the following in the nextcloud web root – these sudo commands have to be run as the same UID as the owner of the config file, so if in your environment you aren’t running nextcloud under www-data then you’ll have to adjust the sudo commands as necessary to specify the correct user.

NB: that I’m running these sudo commands as root, so I don’t need any sudo pre-configuration as such to allow me to run these as www-data.

First, identify what 2FA provides your affected user has configured – so, for a user called “adminusername”:

# sudo -u www-data php occ twofactorauth:state adminusername
Two-factor authentication is enabled for user adminusername

Enabled providers:
- totp
- u2f
Disabled providers:
- backup_codes

So, what we see here is this user had both TOTP and U2F (but, tsk, no backup codes – in our experience, twofactor_backup_codes was still working, so a user with backup codes would be able to still log in – you’d still have to understand what to do to fix your install though!)

Check to see if your modules are missing:

# sudo -u www-data php occ app:list | grep twofactor
  - twofactor_backupcodes: 1.5.0

Uh-oh, no twofactor_totp OR twofactor_u2f.

Make sure your nextcloud apps are up to date:

# sudo -u www-data php occ app:update --all

Then re-enable your twofactor provider apps, so for “totp” and “u2f”, you want:

# sudo -u www-data php occ app:enable twofactor_totp
twofactor_totp enabled

# sudo -u www-data php occ app:enable twofactor_u2f
twofactor_u2f enabled

Now you should be able to log back in as normal with your 2FA.

Configuring TACACS+ authentication and accounting on IOS 15

Just the bare minimum:

! you probably have this already, if you don't; you should read up on it first
aaa new-model

! use local users, and then all tacacs+ servers, to authenticate logins 
aaa authentication login default local group tacacs+ 

! give enable to tacacs+ users 
aaa authentication enable default group tacacs+ 

! send accounting records for when logins ('exec mode') begin and end 
aaa accounting exec default start-stop group tacacs+
 
! send accounting records for config commands 
aaa accounting commands 15 default stop-only group tacacs+ 

! send accounting records for outgoing connections made to other systems 
aaa accounting connection default start-stop group tacacs+ 

! send system event account records (reloads etc) 
aaa accounting system default start-stop group tacacs+ 

! OPTIONAL: On a router with multiple interfaces that could be chosen to
! reach the TACACS server it is best to specify one; we use Loopback addresses
! for iBGP peering, so it makes sense to use them here too
ip tacacs source-interface Loopback0 

! define at least one tacacs server with some friendly $SERVERNAME 
tacacs server $SERVERNAME
   ! Set the TACACS+ server's ipv4 $ADDRESS (or ipv6, adjust accordingly)
   address ipv4 $ADDRESS
   ! Set the encryption $KEY to match the key configured on the TACACS+ server for this device
   key $KEY
!

Now: BEFORE you log off, try to log in again and make sure you can still log in with your original local credentials.

If you can no longer login after making the above changes, you’ll need to fix that first before you disconnect to prevent you locking yourself out.

vSphere Client 5.1 plugins & search: Could not create SSL\TLS secure channel

If you can’t download vSphere Client 5.1 Plugins (eg vShield), and can’t use the search in the client because of:

An unknown connection error occured. (The request failed due to an SSL Error. (The request was aborted. Could not create SSL\TLS secure channel.)

And https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2114357 is of no help (you already allow all SSL.Versions), and your SSL certs don’t appear to be broken or expired, then you’ve probably been bitten by some recent changes in a windows update that’s evidently changed some defaults around the minimum DH key size.

Create the following key; everything will start working immediately (you’ll need to re-enable any disabled vSphere Client plugins) as you will start permitting 512bit DH keys.

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\KeyExchangeAlgorithms\Diffie-Hellman]
"ClientMinKeyBitLength"=dword:00000200

You should consider upgrading to newer versions of vSphere, but then if we all sat around doing things as complicated as that, we’d not have time for any actual work, would we.

“letsencrypt.sh isn’t renewing my certs!”

There’s been a change at some point to the JSON format that Let’s Encrypt returns challenges in.

If you have an “old” installation of letsencrypt.sh that pre-dates v0.2.0, letsencrypt.sh is probably doing this:

$ ./letsencrypt.sh -c
# INFO: Using main config file /home/letsencrypt/letsencrypt.sh/config.sh
Processing somedomain.netcalibre.net
 + Checking domain name(s) of existing cert... unchanged.
 + Checking expire date of existing cert...
 + Valid till Jun 22 16:24:00 2016 GMT (Less than 30 days). Renewing!
 + Signing domains...
 + Generating signing request...
 + Requesting challenge for somedomain.netcalibre.net...
$

and then it silently exits.

Update it from git, move your config file to the new location:

git pull
mv config.sh config
./letsencrypt.sh -c

Internal IPMI error / Stopping kipmi0

As previously documented on this site, I use Nagios extensively. I’ve used the check_ipmi_sensor plugin for a while now, but have had problems on a Centos 6.6 Supermicro box that I had installed it on.

I’d regularly get hit with failures caused by the IPMI failing to return full payloads, and the freeipmi tools frequently dumping out halfway through execution stating that there was an “Internal IPMI Error” – here’s an example running check_ipmi_sensor from the command line:

# ./check_ipmi_sensor -H localhost -fc 5 -v
ID   | Name            | Type              | State    | Reading    | Units | Event
4    | CPU1 Temp       | Temperature       | Nominal  | 38.00      | C     | 'OK'
71   | CPU2 Temp       | Temperature       | Nominal  | 40.00      | C     | 'OK'
138  | PCH Temp        | Temperature       | Nominal  | 31.00      | C     | 'OK'
205  | System Temp     | Temperature       | Nominal  | 26.00      | C     | 'OK'
272  | Peripheral Temp | Temperature       | Nominal  | 41.00      | C     | 'OK'
339  | Vcpu1VRM Temp   | Temperature       | Nominal  | 33.00      | C     | 'OK'
406  | Vcpu2VRM Temp   | Temperature       | Nominal  | 39.00      | C     | 'OK'
473  | VmemABVRM Temp  | Temperature       | Nominal  | 29.00      | C     | 'OK'
540  | VmemCDVRM Temp  | Temperature       | Nominal  | 26.00      | C     | 'OK'
607  | VmemEFVRM Temp  | Temperature       | Nominal  | 38.00      | C     | 'OK'
674  | VmemGHVRM Temp  | Temperature       | Nominal  | 32.00      | C     | 'OK'
741  | P1-DIMMA1 Temp  | Temperature       | Nominal  | 27.00      | C     | 'OK'
808  | P1-DIMMB1 Temp  | Temperature       | Nominal  | 27.00      | C     | 'OK'
875  | P1-DIMMC1 Temp  | Temperature       | Nominal  | 27.00      | C     | 'OK'
942  | P1-DIMMD1 Temp  | Temperature       | Nominal  | 26.00      | C     | 'OK'
1009 | P2-DIMME1 Temp  | Temperature       | Nominal  | 28.00      | C     | 'OK'
1076 | P2-DIMMF1 Temp  | Temperature       | Nominal  | 29.00      | C     | 'OK'
1143 | P2-DIMMG1 Temp  | Temperature       | Nominal  | 28.00      | C     | 'OK'
1210 | P2-DIMMH1 Temp  | Temperature       | Nominal  | 29.00      | C     | 'OK'
1411 | FAN3            | Fan               | Nominal  | 6500.00    | RPM   | 'OK'
1478 | FAN4            | Fan               | Nominal  | 6400.00    | RPM   | 'OK'
ipmi_sensor_read: internal IPMI error

-> Execution of /usr/sbin/ipmi-sensors failed with return code 1.
-> /usr/sbin/ipmi-sensors was executed with the following parameters:
   sudo /usr/sbin/ipmi-sensors --quiet-cache --sdr-cache-recreate --interpret-oem-data --output-sensor-state --ignore-not-available-sensors

This obviously isn’t especially ideal – and I was seeing this every 10-30 minutes when Nagios was running its regular checks. First glances, it looked like it could be caused by defective hardware, but fortunately, that wasn’t the case – just my checks colliding with kipmid.

Centos 6 has built ipmi_si into the kernel by default – and kipmid starts on boot and starts polling any detected IPMI devices (you’ll probably see [kipmi0] running). Given I’m configuring Nagios for monitoring, I’ve no interest in the kernel helper polling my IPMI and tying it up, and if you’re reading this and you have a kernel ipmid thread, chances are, neither do you. Alternatively, it’s possible you’re reading this because your kipmi0 process is claiming to use 100% of a CPU core, perhaps because you did a bmc cold reboot or similar – in which case, this is probably useful for you too, even if you want to keep kipmid running.

You can stop kipmid in its tracks by hot-removing the IPMI device from it’s list of detected devices; first, get the parameters for the detected device like this:

# cat /proc/ipmi/0/params
kcs,i/o,0xca2,rsp=1,rsi=1,rsh=0,irq=0,ipmb=0

Then, take the output from the above, prefix it with “remove,” and use /sys/module/ipmi_si/parameters/hotmod to remove the device:

# echo "remove,kcs,i/o,0xca2,rsp=1,rsi=1,rsh=0,irq=0,ipmb=0" > /sys/module/ipmi_si/parameters/hotmod

The kipmid thread will be cleaned up immediately without requiring a reboot. freeipmi’s various utilities do not use ipmi_si/kipmid and will continue to work just fine.

If all you wanted to do was restart kipmid for some reason, you could then re-add the device by instead prefixing with “add,”:

# echo "add,kcs,i/o,0xca2,rsp=1,rsi=1,rsh=0,irq=0,ipmb=0" > /sys/module/ipmi_si/parameters/hotmod

Whereupon kipmi0 will resume.

If stopping kipmid fixes your issue (it provided immediate relief in my case), make sure it stays gone by adding the following to your kernel options in /boot/grub/menu.lst (or, wherever your bootloader configuration is, if you’re not using the default grub environment)

ipmi_si.force_kipmid=0

Weirdly, kipmid doesn’t seem to cause problems with Ubuntu 14.04 (on a Dell R200) or Debian 8 (on a different Supermicro board), so perhaps this is a centos specific issue.

Tips for Configuring Nagios3 Efficiently – part 1

Back when I started using Nagios (I think ~1.2 or earlier) I don’t remember many options for being all that efficient in terms of “lines of config written” – certainly, any options for being efficient that there may have been ended up being overlooked in the rush to get it up and running, and I’ve been largely been using the same configuration files (and style) ever since – though I did start using host and service templates as soon as I became aware of them some time back in the 2.x branch days.

In the spirit of self-improvement, I’ve been revisiting the Nagios configuration syntax as part of rolling out a fresh monitoring host based on Nagios3, and have significantly reduced the number of lines of config my Nagios installation depends on as a result.

Continue reading Tips for Configuring Nagios3 Efficiently – part 1

The LCHost Debian package mirror

As part of giving back to the community LCHost runs a Debian package mirror (including backports) (for i386 and amd64) which was recently added to the official mirrors list.

I spent a little time making it automatically detect changes at our source mirror within a couple of minutes and pull down, so at most it’ll never be more than 5 minutes behind the top-level mirrors.

To use our debian mirror, simply replace your regular mirror definition in /etc/apt/sources.list with:

deb http://mirror.lchost.net/debian/ stable main contrib
deb-src http://mirror.lchost.net/debian/ stable main contrib

(You may choose to use the release name in place of “stable” to ensure you never accidentally go between major releases – so for wheezy, simply replace the word stable with wheezy)

If you use backports, you can get those packages from us too, using something like:

deb http://mirror.lchost.net/debian/ wheezy-backports main

Installing Nagios3 on Debian Wheezy

It’s pretty straightforward to install Nagios on a Debian system but if you want to be able to use the web interface to control the nagios process a little more work is required.

Starting with a blank slate (apt/dpkg will ensure any required prerequisites will be installed):

# apt-get install nagios3 apache2-suexec

You’ll be asked to set a password for the nagiosadmin user for the web interface.

Enable check_external_commands in Nagios to enable the ability to mute alarms, make comments, restart the nagios process etc from the web interface (pretty much invaluable, but be aware of the inherent risks in enabling the ability to influence the process from “outside”)

# sed -i -e 's/check_external_commands=0/check_external_commands=1/' /etc/nagios3/nagios.cfg
# /etc/init.d/nagios3 restart

Edit the nagios3 apache2 config include to make the web interface scripts run as the nagios user so that the web interface can write to the nagios command pipe; inserting the following at the top of /etc/nagios3/apache2.conf:

User nagios
Group nagios

Restart apache..

# /etc/init.d/apache2 restart

And you’re pretty much done! You can go to http://YOUR_HOST_NAME/nagios3/ and log in with your nagiosadmin password you set up when prompted at the start of this process.

Now, you can get started with creating host and service configuration files in /etc/nagios3/conf.d/ to monitor your servers/network/etc