This shows you the differences between two versions of the page.
|
build:monitoring [2010/02/15 16:44] 188.40.110.15 WMfSMmdKznJxbVVDj |
build:monitoring [2010/03/03 15:08] (current) 99.100.133.164 old revision restored |
||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | gf9Wct <a href="http://jsbjbjhpkmzv.com/">jsbjbjhpkmzv</a>, [url=http://bdklhqlbfurf.com/]bdklhqlbfurf[/url], [link=http://ctiyhobdunxn.com/]ctiyhobdunxn[/link], http://lbehhbtyoalh.com/ | + | ====== Monitoring ====== |
| + | |||
| + | This page documents our monitoring and alerting scripts. | ||
| + | |||
| + | ===== Munin ===== | ||
| + | |||
| + | Munin does not do any alerting; it periodically pulls system data and displays it in RRDtool graphs. Munin comes in 2 pieces: ''munin'' and ''munin-node''. The ''munin-node'' part is a daemon that gathers the data on each system, and the ''munin'' part runs via cron and aggregates the data from the ''munin-node'' daemons running on various systems. | ||
| + | |||
| + | Installing Munin (both parts) requires a few other libraries; we install it like this: | ||
| + | <code bash> | ||
| + | sudo apt-get install -y munin munin-node rrdtool munin-plugins-extra ethtool | ||
| + | </code> | ||
| + | |||
| + | We then configure ''munin-node'' to only listen on the loopback interface, since we only have the one system, and won't be polling it from any other Munin system: | ||
| + | <code bash> | ||
| + | sudo sed -i -e 's/^host /#host /' /etc/munin/munin-node.conf | ||
| + | sudo sed -i -e 's/^# host 127.0.0.1/host 127.0.0.1/' /etc/munin/munin-node.conf | ||
| + | </code> | ||
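To preview what the two ''sed'' edits do before touching the real file, they can be run against a small sample. This is only a sketch: the three sample lines are our assumption about what the relevant part of ''munin-node.conf'' looks like, and the variable names are made up for the example.

```shell
#!/bin/sh
# Assumed excerpt of munin-node.conf (illustrative, not copied from a real system).
SAMPLE_CONF='# host 127.0.0.1
host *
port 4949'

# Apply the same two substitutions as above, but to the sample text on stdin.
RESULT=$(printf '%s\n' "$SAMPLE_CONF" \
    | sed -e 's/^host /#host /' -e 's/^# host 127.0.0.1/host 127.0.0.1/')

# The wildcard listener should now be commented out and the loopback one active.
printf '%s\n' "$RESULT"
```

The ''port'' line is left alone, since neither expression matches it.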
| + | |||
| + | We then restart ''munin-node'' with the new settings: | ||
| + | <code bash> | ||
| + | sudo /etc/init.d/munin-node restart | ||
| + | </code> | ||
| + | |||
| + | Next, we configure the ''munin'' collection piece. Since ''use_node_name'' doesn't seem to work, we'll have to tell it the true FQDN of the local host: | ||
| + | <code bash> | ||
| + | HOST_NAME=$(hostname -f) | ||
| + | sudo sed -i -e "s/localhost.localdomain/$HOST_NAME/" /etc/munin/munin.conf | ||
| + | </code> | ||
| + | |||
| + | Then we can manually run what would normally get run from cron: | ||
| + | <code bash> | ||
| + | sudo -u munin /usr/bin/munin-cron | ||
| + | </code> | ||
| + | |||
| + | Finally, we link the Munin web output into our admin web site. Note that we've [[/build/apache#admin_site | configured Apache]] to allow this. | ||
| + | <code bash> | ||
| + | sudo ln -s /var/www/munin /var/www/admin.boochtek.com/public/munin | ||
| + | </code> | ||
| + | |||
| + | |||
| + | ==== TODO ==== | ||
| + | |||
| + | If we run ''munin-node'' on a system that we'll pull data from remotely, we'll need to edit the ''munin-node.conf'' file accordingly, and also open up TCP port 4949 via Shorewall. | ||
| + | |||
| + | If we pull data from any systems across the Internet, we should enable TLS and certificates. | ||
| + | |||
| + | ===== Schedule Regular Updates ===== | ||
| + | |||
| + | It would be nice to have updates install automatically, but to prevent problems it's best to have a system administrator apply them manually and fix anything that breaks. So instead, we alert the system administrators when updates are available. | ||
| + | |||
| + | We've adapted code from [[http://wiki.splitbrain.org/debiansnippets#send_mail_on_new_packages | here]] to check for new Debian updates. Save the following code to ''/etc/cron.daily/check-debian-updates'': | ||
| + | <file> | ||
| + | #!/bin/sh | ||
| + | |||
| + | HOSTNAME=`hostname` | ||
| + | MAILTO="craig@boochtek.com" | ||
| + | MAILFROM="Debian update checker <root@boochtek.com>" | ||
| + | |||
| + | apt-get update >/dev/null 2>&1 | ||
| + | |||
| + | NEWPACKAGES=`apt-get --print-uris -qq -y upgrade 2>/dev/null |awk '{print $2}'` | ||
| + | |||
| + | if [ ! -z "$NEWPACKAGES" ] | ||
| + | then | ||
| + | mail -a "From: $MAILFROM" -s "New Packages for $HOSTNAME" $MAILTO <<EOF | ||
| + | There are new Packages available for $HOSTNAME: | ||
| + | |||
| + | $NEWPACKAGES | ||
| + | |||
| + | please run: | ||
| + | apt-get upgrade | ||
| + | as root on $HOSTNAME. | ||
| + | |||
| + | If a package is listed as "held back", then also run: | ||
| + | apt-get dist-upgrade | ||
| + | EOF | ||
| + | fi | ||
| + | |||
| + | exit 0; | ||
| + | </file> | ||
| + | |||
| + | Change the permissions on the script to make it executable: | ||
| + | <code rootshell> | ||
| + | chmod 755 /etc/cron.daily/check-debian-updates | ||
| + | </code> | ||
| + | |||
| + | Adding this script to the ''/etc/cron.daily'' directory will cause it to be run every day. By default, the daily cron scripts run at 6:25 AM. One nice thing about running them daily and sending them to a mailing list is that it's easy to see if the updates have or have not been applied by the next day. The more times the message is sent, the more likely someone will be to log in and run the updates. | ||
| + | |||
| + | Note that there are some packages out there that do this same task -- [[http://packages.debian.org/etch/cron-apt | cron-apt]] and [[http://packages.debian.org/apticron | apticron]] are 2 that I've come across. | ||
| + | |||
| + | ===== Alert on Low Disk Space ===== | ||
| + | |||
| + | This script works much like the previous script, sending an email only if any partition is over 90% full. Save the following code to ''/etc/cron.daily/check-disk-space'': | ||
| + | <file> | ||
| + | #!/bin/sh | ||
| + | |||
| + | |||
| + | HOSTNAME=`hostname` | ||
| + | MAILTO="craig@boochtek.com" | ||
| + | MAILFROM="Drive space checker <root@boochtek.com>" | ||
| + | |||
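| + | # Compare the percentage numerically ($5 + 0), so that 100% is reported too. | ||
| + | # The -P option keeps each filesystem on a single line. | ||
| + | DF_OUTPUT=`df -hP | grep '^/' | sort -r -n -k5 | awk '$5 + 0 > 90 {print " " $6 " is " $5 " full"}'` | ||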
| + | |||
| + | if [ ! -z "$DF_OUTPUT" ] | ||
| + | then | ||
| + | mail -a "From: $MAILFROM" -s "Drive space report for $HOSTNAME" $MAILTO <<EOF | ||
| + | Drive space on $HOSTNAME is critical: | ||
| + | |||
| + | $DF_OUTPUT | ||
| + | |||
| + | Please clear up some space on the listed partitions. | ||
| + | |||
| + | EOF | ||
| + | fi | ||
| + | |||
| + | exit 0; | ||
| + | </file> | ||
| + | |||
| + | Change the permissions on the script to make it executable: | ||
| + | <code rootshell> | ||
| + | chmod 755 /etc/cron.daily/check-disk-space | ||
| + | </code> | ||
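A percentage filter like this is easy to get subtly wrong, because comparing ''"100%"'' with ''"90%"'' as strings sorts 100% below 90%. The sketch below checks a numeric version of the filter against canned input. The function name ''over_threshold'' and the sample lines are made up for the example and are not real ''df'' output.

```shell
#!/bin/sh
# Hypothetical helper: reads df-style lines on stdin and reports every
# filesystem whose Use% column is above the limit given as $1.
over_threshold() {
    # "$5 + 0" converts "95%" to the number 95, forcing a numeric comparison.
    awk -v limit="$1" '$5 + 0 > limit { print " " $6 " is " $5 " full" }'
}

# Made-up sample data in df's column layout.
SAMPLE='/dev/sda1 20G 19G 1G 95% /
/dev/sda2 50G 50G 0 100% /var
/dev/sda3 10G 1G 9G 10% /home'

REPORT=$(printf '%s\n' "$SAMPLE" | over_threshold 90)
printf '%s\n' "$REPORT"
```

With this sample, both ''/'' and ''/var'' are reported, and ''/home'' is not.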
| + | |||
| + | |||
| + | ===== Alert on Low Swap Space ===== | ||
| + | |||
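| + | This script works much like the previous script, but runs hourly and sends an email only if less than 300000 KB of swap space is free. The email also includes the output of ''top'', to show what is using the memory. Save the following code to ''/etc/cron.hourly/check-memory'': | ||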
| + | <file> | ||
| + | #!/bin/sh | ||
| + | |||
| + | HOSTNAME=`hostname` | ||
| + | MAILTO="craig@boochtek.com" | ||
| + | MAILFROM="Memory checker <root@boochtek.com>" | ||
| + | |||
| + | FREE_OUTPUT=`free | grep -i swap | awk '$4 < 300000 {print " " $4 "KB of swap remaining" }'` | ||
| + | if [ ! -z "$FREE_OUTPUT" ] | ||
| + | then | ||
| + | TOP_OUTPUT=`TERM=dumb /usr/bin/top -b -n 1` | ||
| + | mail -a "From: $MAILFROM" -s "Swap space report for $HOSTNAME" $MAILTO <<EOF | ||
| + | Swap space on $HOSTNAME is critical: | ||
| + | |||
| + | $FREE_OUTPUT | ||
| + | $TOP_OUTPUT | ||
| + | |||
| + | EOF | ||
| + | fi | ||
| + | |||
| + | exit 0; | ||
| + | </file> | ||
| + | |||
| + | Change the permissions on the script to make it executable: | ||
| + | <code rootshell> | ||
| + | chmod 755 /etc/cron.hourly/check-memory | ||
| + | </code> | ||
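The swap threshold can be tried out the same way, without waiting for the system to run low. This is a sketch only: ''low_swap'' is a hypothetical helper, and the sample text imitates the layout of ''free'' output with invented numbers.

```shell
#!/bin/sh
# Hypothetical helper: reads free-style output on stdin and prints a message
# when the "free" column of the Swap line is below the limit (in KB) given as $1.
low_swap() {
    awk -v limit="$1" \
        'tolower($1) ~ /^swap/ && $4 + 0 < limit { print " " $4 "KB of swap remaining" }'
}

# Invented sample in the layout printed by free.
SAMPLE_FREE='             total       used       free     shared    buffers     cached
Mem:       2075172    1980000      95172          0      10000     500000
Swap:      1048568     900000     148568'

ALERT=$(printf '%s\n' "$SAMPLE_FREE" | low_swap 300000)
printf '%s\n' "$ALERT"
```

Note that a system with no swap configured at all reports 0 KB free, so a check like this would alert every hour there.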
| + | |||
| + | |||
| + | ===== Root Password Change Reminders ===== | ||
| + | |||
| + | Root passwords should be changed at least every 6 months. | ||
| + | We decided to send out an email reminder to help ensure that we do that. | ||
| + | |||
| + | Save the following code to ''/etc/cron.monthly/root-password-reminder'': | ||
| + | <file> | ||
| + | #!/bin/sh | ||
| + | |||
| + | HOSTNAME=`hostname` | ||
| + | MAILTO="craig@boochtek.com" | ||
| + | MAILFROM="Root password reminder <root@boochtek.com>" | ||
| + | # %m gives the zero-padded month (01-12), matching the strings compared below. | ||
| + | MONTH=`date +%m` | ||
| + | |||
| + | # This checks to see if it is July or January. If so, send out the reminder. | ||
| + | # Since this script is in cron.monthly, it only runs on the 1st of the month. | ||
| + | if [ "$MONTH" = '07' ] || [ "$MONTH" = '01' ]; then | ||
| + | mail -a "From: $MAILFROM" -s "Change root password on $HOSTNAME" $MAILTO <<EOF | ||
| + | Please change the root password on $HOSTNAME. | ||
| + | |||
| + | Whoever changes the root password, please reply to this email to | ||
| + | let everyone know that you've changed it. Provide your phone number | ||
| + | so that the other admins can call you to get the new password. | ||
| + | |||
| + | This script is located in /etc/cron.monthly/root-password-reminder, | ||
| + | and sends emails out on July 1 and January 1. | ||
| + | EOF | ||
| + | fi | ||
| + | |||
| + | exit 0; | ||
| + | </file> | ||
| + | |||
| + | Change the permissions on the script to make it executable: | ||
| + | <code rootshell> | ||
| + | chmod 755 /etc/cron.monthly/root-password-reminder | ||
| + | </code> | ||
| + | |||
| + | Adding this script to the ''/etc/cron.monthly'' directory will cause it to be run on the 1st day of every month. The script itself checks to see if it's January or July, and only sends an email for those months. By default, the monthly cron scripts run at 6:52 AM. | ||
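The month test can be exercised without waiting for January or July by moving it into a small function. This is a sketch under our own naming: ''is_reminder_month'' is hypothetical and not part of the script above. It expects the zero-padded month that ''date +%m'' prints.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only for the two reminder months.
# The argument must be zero-padded ("01", "07"), as printed by date +%m.
is_reminder_month() {
    case "$1" in
        01|07) return 0 ;;
        *)     return 1 ;;
    esac
}

MONTH=$(date +%m)
if is_reminder_month "$MONTH"; then
    echo "A reminder would be sent this month ($MONTH)."
else
    echo "No reminder this month ($MONTH)."
fi
```

An unpadded value such as ''7'' does not match, which is why the format string passed to ''date'' matters.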
| + | |||
| + | |||
| + | ===== File Integrity Monitoring ===== | ||
| + | |||
| + | We chose fcheck to monitor changes to system files. It's pretty simple -- it just sends an email to root with a list of files that have changed since the last time it was run. | ||
| + | |||
| + | <code bash> | ||
| + | sudo apt-get install fcheck | ||
| + | sudo sed -i -e 's:^TimeZone.*$:TimeZone = America/Chicago:' /etc/fcheck/fcheck.cfg | ||
| + | </code> | ||
| + | |||
| + | By default, fcheck runs from cron every 2 hours. We change it to run every 6 hours instead: | ||
| + | <code bash> | ||
| + | sudo sed -e 's|^30 \*/2|30 */6|' -i /etc/cron.d/fcheck | ||
| + | </code> | ||
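The schedule edit can be previewed on a sample line first. The cron line below is an assumption made for illustration (including the placeholder command path); the real ''/etc/cron.d/fcheck'' entry may differ, in which case the pattern would need adjusting.

```shell
#!/bin/sh
# Assumed shape of the fcheck cron entry; the command path is a placeholder.
SAMPLE_CRON='30 */2 * * * root /path/to/fcheck-command'

# Same substitution as above, applied to the sample instead of the real file.
NEW_CRON=$(printf '%s\n' "$SAMPLE_CRON" | sed -e 's|^30 \*/2|30 */6|')
printf '%s\n' "$NEW_CRON"
```

If the output still shows ''*/2'', the pattern did not match and the real file would have been left unchanged as well.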
| + | |||
| + | ===== TODO ===== | ||
| + | |||
| + | If we're on a non-virtual system, we should also install ''lm-sensors'', ''acpi'', and ''smartmontools''. | ||
| + | |||
| + | Determine if there's any reason to switch from fcheck to Tripwire or something else. | ||
| + | |||
| + | Consider some of the all-in-one host monitoring systems, such as [[http://la-samhna.de/samhain/ | Samhain]] (HIDS, | ||
| + | |||
| + | |||
| + | ===== Credits ===== | ||
| + | |||
| + | Got the motivation to install ''fcheck'' from the [[http://debaday.debian.net | Debian Package of the Day]] [[http://debaday.debian.net/2009/08/23/fcheck-easy-to-use-file-integrity-checker/ | article on fcheck]]. | ||