Monitoring

This page documents our monitoring and alerting scripts.

Munin

Munin does no alerting; it periodically pulls system data and displays it in RRDtool graphs. Munin comes in two pieces: munin and munin-node. The munin-node part is a daemon that gathers the data; the munin part runs from cron and aggregates the data from the munin-node daemons running on various systems.

Installing Munin (both parts) requires a few other libraries; we install it like this:

sudo apt-get install -y munin munin-node rrdtool munin-plugins-extra ethtool

We then configure munin-node to listen only on the loopback interface, since we have only the one system and won't be polling it from any other Munin system:

sudo sed -i /etc/munin/munin-node.conf -e 's/^host /#host /'
sudo sed -i /etc/munin/munin-node.conf -e 's/^# host 127.0.0.1/host 127.0.0.1/'
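To preview what these two substitutions do before touching the live file, the same expressions can be run against a scratch copy. This is only a sketch: the sample contents below are an assumption loosely modeled on a stock munin-node.conf, not copied from one.

```shell
#!/bin/sh
# Scratch file standing in for /etc/munin/munin-node.conf (contents are illustrative).
conf=$(mktemp)
cat > "$conf" <<'EOF'
# Which address to bind to;
host *
# host 127.0.0.1
port 4949
EOF

# Same two substitutions as above, but without -i, so the result goes to
# stdout and the file itself is left unchanged.
preview=$(sed -e 's/^host /#host /' -e 's/^# host 127.0.0.1/host 127.0.0.1/' "$conf")
printf '%s\n' "$preview"

rm -f "$conf"
```

If the preview shows the wildcard host line commented out and the 127.0.0.1 line active, the in-place commands should behave the same way on the real file.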

We then restart munin-node with the new settings:

sudo /etc/init.d/munin-node restart

Next, we configure the munin collection piece. Since use_node_name doesn't seem to work, we'll have to tell it the true FQDN of the local host:

HOST_NAME=`hostname -f`
sudo sed -i /etc/munin/munin.conf -e "s/localhost.localdomain/$HOST_NAME/"

Then we can manually run what would normally get run from cron:

sudo -u munin /usr/bin/munin-cron

Finally, we link the Munin web output into our admin web site. Note that we've configured Apache to allow this.

sudo ln -s /var/www/munin /var/www/admin.boochtek.com/public/munin

TODO

If we run munin-node on a system that we'll pull data from remotely, we'll need to edit the munin-node.conf file accordingly, and also open up TCP port 4949 via Shorewall.

If we pull data from any systems across the Internet, we should enable TLS and certificates.

Schedule Regular Updates

It would be nice to have updates install automatically, but to prevent problems it's best to have a system administrator apply them manually and fix anything that crops up. So instead, we alert the system administrators when updates are available.

We've adapted code found elsewhere to check for new Debian updates. Save the following code to /etc/cron.daily/check-debian-updates:

#!/bin/sh

HOSTNAME=`hostname`
MAILTO="craig@boochtek.com"
MAILFROM="Debian update checker <root@boochtek.com>"

apt-get update >/dev/null 2>&1

NEWPACKAGES=`apt-get --print-uris -qq -y upgrade 2>/dev/null |awk '{print $2}'`

if [ ! -z "$NEWPACKAGES" ]
then
 mail -a "From: $MAILFROM" -s "New Packages for $HOSTNAME" $MAILTO <<EOF
There are new Packages available for $HOSTNAME:

$NEWPACKAGES

please run:
 apt-get upgrade
as root on $HOSTNAME.

If a package is listed as "held back", then also run:
 apt-get dist-upgrade
EOF
fi

exit 0;

Change the permissions on the script to make it executable:

chmod 755 /etc/cron.daily/check-debian-updates

Adding this script to the /etc/cron.daily directory will cause it to be run every day. By default, the daily cron scripts run at 6:25 AM. One nice thing about running them daily and sending them to a mailing list is that it's easy to see if the updates have or have not been applied by the next day. The more times the message is sent, the more likely someone will be to log in and run the updates.
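The awk step in the script keeps only the second field of each line that apt-get --print-uris emits, which is the package file name. A small sketch of that extraction on made-up sample lines (the URLs, file names, sizes, and checksums are invented for illustration):

```shell
#!/bin/sh
# Hypothetical lines in the shape printed by `apt-get --print-uris -qq -y upgrade`:
#   'URI' filename size checksum
sample_output="'http://example.invalid/pool/main/f/foo/foo_1.2-3_amd64.deb' foo_1.2-3_amd64.deb 123456 MD5Sum:00000000000000000000000000000000
'http://example.invalid/pool/main/b/bar/bar_4.5-6_all.deb' bar_4.5-6_all.deb 7890 MD5Sum:11111111111111111111111111111111"

# Same extraction as in the script: print the second whitespace-separated field.
NEWPACKAGES=$(printf '%s\n' "$sample_output" | awk '{print $2}')

# An empty result means nothing to upgrade, so no mail would be sent.
if [ -n "$NEWPACKAGES" ]; then
    printf 'Would report:\n%s\n' "$NEWPACKAGES"
fi
```

Running the real apt-get command by hand and comparing its output to this shape is a quick way to confirm that the second field is still the one you want.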

Note that there are some packages out there that do this same task – cron-apt and apticron are 2 that I've come across.

Apticron

sudo apt-get install apticron

Alert on Low Disk Space

This script works much like the previous script, sending an email only if any partition is over 90% full. Save the following code to /etc/cron.daily/check-disk-space:

#!/bin/sh


HOSTNAME=`hostname`
MAILTO="craig@boochtek.com"
MAILFROM="Drive space checker <root@boochtek.com>"

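# Compare the Use% column numerically; a string comparison would rank "100%" below "90%".
# The -P flag keeps each filesystem on a single line, even with long device names.
DF_OUTPUT=`df -P | grep '^/' | sort -r -n -k5 | awk '$5+0 > 90 {print "  " $6 " is " $5 " full"}'`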

if [ ! -z "$DF_OUTPUT" ]
then
 mail -a "From: $MAILFROM" -s "Drive space report for $HOSTNAME" $MAILTO <<EOF
Drive space on $HOSTNAME is critical:

$DF_OUTPUT

Please clear up some space on the listed partitions.

EOF
fi

exit 0;

Change the permissions on the script to make it executable:

chmod 755 /etc/cron.daily/check-disk-space
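A detail worth checking in a filter like this is how awk compares the Use% column: against a quoted "90%" it compares strings, and "100%" then sorts below "90%". The sketch below contrasts string and numeric comparison on made-up df-style lines (the device names and sizes are invented).

```shell
#!/bin/sh
# Made-up df-style lines: filesystem size used avail use% mountpoint
sample_df="/dev/sda1 20G 19G 1G 95% /
/dev/sda2 50G 50G 0 100% /var
/dev/sda3 100G 9G 91G 9% /home"

# String comparison: this is what a quoted "90%" triggers in awk.
string_hits=$(printf '%s\n' "$sample_df" | awk '$5 > "90%" {print $6}')

# Numeric comparison: adding 0 makes awk use the leading digits of "95%" as a number.
numeric_hits=$(printf '%s\n' "$sample_df" | awk '$5+0 > 90 {print $6}')

echo "string comparison flags:"
printf '%s\n' "$string_hits"
echo "numeric comparison flags:"
printf '%s\n' "$numeric_hits"
```

With this sample data, only the numeric form reports the completely full /var partition.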

Alert on Low Swap Space

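This script works much like the previous one, sending an email only if less than 300000 KB of swap space remains free. The email also includes the output of top, to show what is using the memory. Because it lives in cron.hourly, it runs every hour. Save the following code to /etc/cron.hourly/check-memory: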

#!/bin/sh

HOSTNAME=`hostname`
MAILTO="craig@boochtek.com"
MAILFROM="Memory checker <root@boochtek.com>"

FREE_OUTPUT=`free | grep -i swap | awk '$4 < 300000 {print "  " $4 "KB of swap remaining" }'`
if [ ! -z "$FREE_OUTPUT" ]
then
  TOP_OUTPUT=`TERM=dumb /usr/bin/top -b -n 1`
  mail -a "From: $MAILFROM" -s "Swap space report for $HOSTNAME" $MAILTO <<EOF
Swap space on $HOSTNAME is critical:

$FREE_OUTPUT
$TOP_OUTPUT

EOF
fi

exit 0;

Change the permissions on the script to make it executable:

chmod 755 /etc/cron.hourly/check-memory
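To see what the swap filter matches without waiting for a low-memory condition, it can be fed canned input. The sketch below uses made-up numbers in the column layout that free prints; newer versions of free label some Mem columns differently, but the Swap line keeps total, used, and free in that order.

```shell
#!/bin/sh
# Made-up output in the layout printed by `free` (values in KB).
sample_free="             total       used       free     shared    buffers     cached
Mem:       2048000    1900000     148000          0      20000     300000
Swap:      1048572     900000     148572"

# Same filter as the script: on the Swap line, field 4 is the free amount.
FREE_OUTPUT=$(printf '%s\n' "$sample_free" | grep -i swap | awk '$4 < 300000 {print "  " $4 "KB of swap remaining" }')

# Non-empty output is what would trigger the email.
printf '%s\n' "$FREE_OUTPUT"
```

Note that a system with no swap configured at all reports 0 free, which this filter would treat as critical on every run.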

Root Password Change Reminders

Root passwords should be changed at least every 6 months. We decided to send out an email reminder to help ensure that we do that.

sudo sh -c 'cat > /etc/cron.monthly/root-password-reminder' <<'EOFILE'
#!/bin/sh
 
HOSTNAME=$(hostname)
MAILTO='craig@boochtek.com'
MAILFROM='Root password reminder <root@boochtek.com>'
MONTH=$(date +'%m')
 
# This checks to see if it is July or January. If so, send out the reminder.
# Since this script is in cron.monthly, it only runs on the 1st of the month.
if [ "$MONTH" = '07' -o "$MONTH" = '01' ]; then
  mail -a "From: $MAILFROM" -s "Change root password on $HOSTNAME" $MAILTO <<EOF
Please change the root password on $HOSTNAME.
 
Whoever changes the root password, please reply to this email to
let everyone know that you've changed it. Provide your phone number
so that the other admins can call you to get the new password.
 
This script is located in /etc/cron.monthly/root-password-reminder,
and sends emails out on July 1 and January 1.
EOF
fi
 
exit 0;
EOFILE
 
# Change the permissions on the script to make it executable:
sudo chmod 755 /etc/cron.monthly/root-password-reminder

Adding this script to the /etc/cron.monthly directory will cause it to be run on the 1st day of every month. The script itself checks to see if it's January or July, and only sends an email for those months. By default, the monthly cron scripts run at 6:52 AM.
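The month test is easy to get subtly wrong, because the comparison only works if the month string is zero-padded to two digits. As a sketch, the check can be pulled into a small helper (is_reminder_month is a hypothetical name, not part of the script above) that can be exercised with any month value:

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the given two-digit month is January or July.
is_reminder_month() {
    case "$1" in
        01|07) return 0 ;;
        *)     return 1 ;;
    esac
}

# date +%m prints a zero-padded, two-digit month (01 through 12).
if is_reminder_month "$(date +%m)"; then
    echo "reminder would be sent this month"
else
    echo "no reminder this month"
fi
```

Calling the helper with an unpadded value such as 7 returns failure, which is the failure mode to watch for if the date format string is changed.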

File Integrity Monitoring

We chose fcheck to monitor changes to system files. It's pretty simple – it just sends an email to root with a list of files that have changed since the last time it was run.

# Install the fcheck package.
sudo apt-get install fcheck
 
# Set the display timezone, so times are in our own timezone.
sudo sed -i -e "s|^TimeZone.*\$|TimeZone = $(cat /etc/timezone)|" /etc/fcheck/fcheck.cfg
 
# By default, fcheck runs from cron every 2 hours. We change it to run every 12 hours instead:
sudo sed -e 's|^30 \*/2|30 */12|' -i /etc/cron.d/fcheck
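The schedule change can be checked on a sample line before editing /etc/cron.d/fcheck. In this sketch the cron line is a simplified stand-in: the command part is a placeholder, and the real file may separate fields with tabs rather than spaces.

```shell
#!/bin/sh
# Simplified stand-in for the schedule line in /etc/cron.d/fcheck.
sample_line='30 */2 * * * root /usr/sbin/fcheck-placeholder'

# Same substitution as above, without -i, so nothing on disk is modified.
new_line=$(printf '%s\n' "$sample_line" | sed -e 's|^30 \*/2|30 */12|')

printf '%s\n' "$new_line"
```

If the real line does not begin with exactly "30 */2", the substitution silently does nothing, so it is worth inspecting the file after running the in-place command.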

TODO

If we're on a non-virtual system, we should also install lm-sensors, acpi, and smartmontools.

Determine if there's any reason to switch from fcheck to Tripwire or something else.

Consider some of the all-in-one host monitoring systems, such as Samhain (HIDS,

Credits

Got the motivation to install fcheck from the Debian Package of the Day article on fcheck.

build/monitoring.txt · Last modified: 2013/04/16 00:25 by Admin