Zabbix – Open Source Monitoring

Last spring, at a job interview, the interviewer told me that they were considering using Zabbix to monitor their environment. I was asked what I knew about Zabbix and, unfortunately, the answer was nothing. They told me to go learn it and come back in a week, and so I did!

The job never materialized, but nonetheless, I am grateful that I was introduced to this product, and want to share with you a little bit about how I am using it… At home!!

For those not familiar with the product, the headline on their web page describes it as “The Enterprise-class Monitoring Solution for Everyone”. Well, I don’t know about “everyone”, but for those of us who are used to Linux and other open source projects, I think it’s definitely worth considering.

To get it up and going, I fired up a CentOS 6.4 64-bit Linux virtual machine, installed the LAMP stack on top, and was off to the races. The getting started guide provided with the product is pretty good, so I won’t bother repeating it here. I will just give you a few examples of how I am using it in my home environment.

ALARMS

The first thing I would like a monitoring system to tell me is what the current issues with my servers are. Without doing any customization to the agents, I am able to see the current troubles…

[Image: issues]

GRAPHS

That is a real-time view of what is going on. But maybe this is normal? Wouldn’t it be nice if you could see how your box has been performing over time?

Here is a graph showing CPU load on the “alpha” virtual machine over the last hour while I was generating load.

[Image: alpha-cpu]

And a graph showing memory usage on alpha over the same period.

[Image: alpha-mem]

Or maybe you want to predict when your file system is going to fill up?

[Image: saturn-disk]
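
The graph does the eyeballing for you, but the arithmetic behind that kind of prediction is simple. Here is a rough sketch in shell, assuming the disk fills at a steady rate and that the two samples are more than zero days apart. The function name days_until_full and the sample numbers are made up for illustration, and this is not how Zabbix calculates anything.

```shell
#!/bin/sh
# Estimate the days left until a file system is full by linear extrapolation.
#   $1 = used GB at the first sample
#   $2 = used GB at the second sample
#   $3 = days between the two samples (must be greater than zero)
#   $4 = total size of the file system in GB
days_until_full() {
  awk -v u1="$1" -v u2="$2" -v days="$3" -v size="$4" 'BEGIN {
    rate = (u2 - u1) / days            # growth in GB per day
    if (rate <= 0) { print "not growing"; exit }
    printf "%.1f\n", (size - u2) / rate
  }'
}

# Example: 40 GB grew to 50 GB over 10 days on a 100 GB disk
# days_until_full 40 50 10 100
```

A straight line is a crude model (log rotation or a one-off copy will throw it off), which is why looking at the trend on a graph is still worth doing.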

Or maybe you want to know how much of the bandwidth you are paying for you are actually using?

[Image: saturn-network]

NO AGENT?

And what if you can’t install an agent on the device? Maybe it’s an appliance, or locked down by the vendor. A nice easy “ping” test will at least tell you it’s on the network, or more importantly, when it’s not!

[Image: issues2]
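
Outside of Zabbix, the same idea fits in a few lines of shell. This is only a sketch: it assumes a Linux-style ping that accepts -c, and the function name check_host and the PING_CMD variable are made up for this example, not anything Zabbix provides.

```shell
#!/bin/sh
# PING_CMD can be overridden, which makes it easy to dry-run the check
# or to swap in the ping syntax of another platform.
PING_CMD=${PING_CMD:-"ping -c 1"}

# Report whether a host answers a single ping.
# Returns 0 when it does and 1 when it does not.
check_host() {
  if ${PING_CMD} "$1" > /dev/null 2>&1
  then
    echo "$1 is up"
  else
    echo "$1 is DOWN"
    return 1
  fi
}

# Example, with hypothetical host names:
# for HOST in printer switch nas
# do
#   check_host ${HOST}
# done
```

What a monitoring system adds on top of this is the scheduling, the history, and the alerting when the answer changes.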

WEB MONITORING

Another test that I am doing: is my www.paixao.ca site available? I created a “scenario” that retrieves my home page to see if it is online, and also retrieves a 5 MB file so that I can judge how fast (or slow) my hosting provider is.

[Image: paixaoca-speed]

[Image: paixaca-response]
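
To get a feel for what such a scenario measures, here is a hand-rolled approximation using curl. It is a sketch, not how Zabbix implements web scenarios: the function names fetch_stats and judge, the 60 second timeout, and the 2 second threshold in the example are all assumptions made for illustration.

```shell
#!/bin/sh
# Print "<http_code> <seconds> <bytes_per_second>" for one URL.
fetch_stats() {
  curl -s -o /dev/null --max-time 60 \
       -w '%{http_code} %{time_total} %{speed_download}\n' "$1"
}

# Turn one line of fetch_stats output into a verdict.
#   $1 = longest acceptable response time, in seconds
judge() {
  awk -v max="$1" '{
    if ($1 != 200)      print "DOWN (HTTP " $1 ")"
    else if ($2 > max)  print "SLOW (" $2 "s)"
    else                print "OK (" $2 "s)"
  }'
}

# Example:
# fetch_stats http://www.paixao.ca/ | judge 2
```

Keeping the fetch and the verdict in separate functions means the threshold logic can be tried out on canned numbers without touching the network.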

NOTIFICATIONS

When a “problem” does show up – what can it do?  It can send you an email. Or if you have a modem attached, and are willing to pay, there is an SMS interface.

[Image: notifications]

MAPS

My favourite feature is the maps. Although it takes a little bit of effort to set up, you can create a map of your environment that reacts to alarms in real time, giving you a graphical view of what is going on.

[Image: map]

You can quickly see what elements are in trouble, and even the relationship between elements. 

OTHER FEATURES

There is only so much I can do in my home lab. There are other features that, in time, I would like to get the opportunity to play with.

  • SNMP Traps coming from devices into Zabbix.

  • IPMI Interface into the hardware. (The OS doesn’t necessarily know there is a hardware fault – IPMI gets you a view into the hardware.)

YOUR TURN

Are you using Zabbix? Doing anything cool with it? 

ZFS – It’s that Easy

I did another article for Train Signal…

In the Solaris world, we have had access to the ZFS file system for quite a few years. It’s incredibly simple to use and incredibly powerful and flexible. It replaced the need for Solaris DiskSuite and Veritas Volume Manager, and even the UFS and VxFS file systems. Let’s get started with ZFS!

Continue reading on the Train Signal site…

Update
Train Signal was bought out by Pluralsight.
The article is now on the Pluralsight website.

http://blog.pluralsight.com/zfs-it%E2%80%99s-that-simple

SSH Jumpbox

I wrote an article for Train Signal showing how to build an SSH Jumpbox to facilitate your job as a sysadmin.

If you are a UNIX sysadmin for any number of servers, you need to build yourself a Linux secure shell (SSH) jumpbox. Do it now! Having a centralized location that you can use to quickly “jump” to any box saves a whole bunch of time. Not only that, it opens opportunities for speeding up repetitive chores, and even automating tasks.
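
As a taste of the kind of chore a jumpbox makes easy, here is a minimal sketch of running one command on several servers. It assumes key-based SSH logins are already set up from the jumpbox; the function name run_everywhere, the SSH_CMD variable, and the host names are hypothetical.

```shell
#!/bin/sh
# SSH_CMD can be overridden (for example with "echo") to dry-run the loop.
SSH_CMD=${SSH_CMD:-ssh}

# Run one command on each of the listed hosts.
#   $1 = command to run, remaining arguments = host names
run_everywhere() {
  CMD=$1
  shift
  for HOST in "$@"
  do
    echo "== ${HOST} =="
    ${SSH_CMD} "${HOST}" "${CMD}"
  done
}

# Example, with hypothetical host names:
# run_everywhere uptime alpha saturn
```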

Continue reading at Train Signal Website…

Update
Train Signal was bought out by Pluralsight.
The article is now on the Pluralsight website.

http://blog.pluralsight.com/linux-ssh-jumpbox

Who’s listening on that port? (Linux vs Solaris)

You install a new application… You try to start it up…

But it fails to bind to the port it needs…

Now – what in the world has tied up that port?

In Linux, netstat provides you that info…

# netstat -lnp | egrep "Local Address|^tcp|^udp"
Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name
tcp        0      0 0.0.0.0:22                  0.0.0.0:*                   LISTEN      2983/sshd
tcp        0      0 127.0.0.1:6010              0.0.0.0:*                   LISTEN      3266/0
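
If you only care about one port, a small awk filter narrows that output down. This is a sketch that assumes the column layout shown above (local address in the fourth column, PID/program name last); port_owner is a made-up helper name.

```shell
#!/bin/sh
# Read "netstat -lnp" style output on stdin and print the PID/program
# of whatever is bound to the given port.
#   $1 = port number
port_owner() {
  awk -v port="$1" '$4 ~ (":" port "$") { print $NF }'
}

# Example (run as root so the PID/Program column is filled in):
# netstat -lnp | port_owner 22
```

Anchoring the match on the colon and the end of the field keeps port 22 from also matching 6022 or 2200.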

But in Solaris, it’s not quite so easy… You’re going to have to work for it…

The following script gives you more or less the same thing…

# printf " PORT | INTERFACE       |   PID | FILE\n"
printf "------+-----------------+-------+-------------------------------\n"
for PID in `ps -ef | grep -v PID | awk '{print $2}' |  grep -v "^0$"`
do
  pfiles $PID 2> /dev/null | nawk '
  NR==1 {sub(/:/,"",$1); PID=$1; PROC=$2}
  /sockname:.*port:.*[0-9][0-9]/  { P2=$NF; IF2=$3; getline;
  if (!/peername/) {PORT=P2;IF=(IF2=="::"?"ALL":IF2)}}
  END { if (PORT != 0) printf ("%5d | %-15s | %5d | %s\n",PORT,IF,PID,PROC) }'
done | sort -n
 PORT | INTERFACE       |   PID | FILE
------+-----------------+-------+-------------------------------
   22 | 172.16.200.2    |   329 | /usr/lib/ssh/sshd
   80 | ALL             |   604 | /usr/local/apache2/bin/httpd
   80 | ALL             |   608 | /usr/local/apache2/bin/httpd
   80 | ALL             |   609 | /usr/local/apache2/bin/httpd
   80 | ALL             |   610 | /usr/local/apache2/bin/httpd
   80 | ALL             |   611 | /usr/local/apache2/bin/httpd
   80 | ALL             |   612 | /usr/local/apache2/bin/httpd
 6010 | 127.0.0.1       |   625 | /usr/lib/ssh/sshd

Looping in the shell… “for” is your friend.

Why repeat the same command 10 times when you can put it in a loop??

Put “for” to use… Here are a few examples:

* Kill off all processes for a specific application…

for PID in  `ps -ef | grep app_name | grep -v grep | awk '{print $2}'`
do
  kill -9 ${PID}
done

* Lock out all the users in a specific group…

GROUP=`grep group_id /etc/group | awk -F":" '{print $3}'`
for USER in `grep ":${GROUP}:" /etc/passwd | awk -F":" '{print $1}'`
do
  passwd -l ${USER}
done

* Untar a bunch of files…

gunzip *.tar.gz
for TARBALL in *.tar
do
  tar xf "${TARBALL}"
done
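
* Back up a set of config files before you touch them…

One more in the same spirit, offered as a sketch rather than a recipe: the function name backup_confs, the .conf pattern, and the date-stamp suffix are just choices made for this example.

```shell
#!/bin/sh
# Copy every .conf file in a directory to <name>.conf.<YYYYMMDD>.
#   $1 = directory holding the config files
backup_confs() {
  DIR=$1
  STAMP=`date +%Y%m%d`
  for CONF in "${DIR}"/*.conf
  do
    # When nothing matches, the pattern is left as-is; skip it.
    [ -f "${CONF}" ] || continue
    cp -p "${CONF}" "${CONF}.${STAMP}"
  done
}

# Example, with a hypothetical directory:
# backup_confs /usr/local/apache2/conf
```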

I could go on…