Monday, June 11, 2012

[Part I - Architecture and configuration] Using Opsview Core to check RPO through DPM monitoring

I have been using Opsview (an open source monitoring application) to monitor all kinds of things, from datacentre temperature to server health, switch interfaces, UPS load, etc. This monitoring application is really worthwhile.

This is the problem: I use Microsoft DPM 2010 (System Center Data Protection Manager) to make my short-term (to disk) and long-term (to tape) backups. This is a great application, but it keeps sending me dozens of emails about problems and their recovery. I really don't care, especially at 3:00 am, if a single recovery point did not succeed. What I want to know is whether my RPO (recovery point objective) is being met.

So, here is how I solved it.

1. Architecture

Architecture of "RPO monitoring with Opsview Core and DPM"
The solution relies on a check_nrpe plugin service check (on the monitoring server side) that invokes (on the client agent side) a custom-made PowerShell script through the "external scripts" functionality of the Opsview Core agent.
Sounds confusing? It really is simple.
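
To make the idea concrete, here is a minimal Python sketch of the kind of decision the agent-side script has to make. This is not the PowerShell script used in this series: the function name, the thresholds and the timestamps are hypothetical and only there for illustration. The one thing it takes from the real world is the usual Nagios plugin convention of exit codes 0 (OK), 1 (WARNING) and 2 (CRITICAL).

```python
from datetime import datetime, timedelta

# Usual Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def check_rpo(last_recovery_point, now, warn_hours, crit_hours):
    """Return (exit_code, message) based on the age of the latest recovery point.

    All parameters are hypothetical: a real check would read the timestamp
    of the latest recovery point from the backup server.
    """
    age_hours = (now - last_recovery_point).total_seconds() / 3600
    if age_hours >= crit_hours:
        return CRITICAL, f"CRITICAL - last recovery point is {age_hours:.1f}h old"
    if age_hours >= warn_hours:
        return WARNING, f"WARNING - last recovery point is {age_hours:.1f}h old"
    return OK, f"OK - last recovery point is {age_hours:.1f}h old"


if __name__ == "__main__":
    # Made-up example: the latest recovery point is 30 hours old.
    now = datetime(2012, 6, 11, 9, 0)
    code, message = check_rpo(now - timedelta(hours=30), now,
                              warn_hours=24, crit_hours=48)
    print(code, message)
```

The point is that the monitoring server only sees one status per check, instead of one email per failed job.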

Let's assume that you already have the Opsview Core Monitoring server up and running. If you don't, you can check out the Installation Guide - Ubuntu Linux.
If I managed to install the whole system without any special Linux knowledge, so can you.

For this example I assume the following:
  • The Opsview Monitoring Server IP: 192.168.1.1
  • The DPM Server (Opsview Agent) IP: 192.168.1.20 

2. Step-by-step instructions

2.1. Install and configure Opsview agent (initial configuration)

Install the Opsview Core agent for Windows.
Yes, I have tried a couple of other options, installing more recent versions of NSClient (the monitoring daemon that the Opsview Windows Agent relies on). But, if you want my advice, stick to the Opsview Core agent available for download: it'll suit your needs and work just fine!
In my case I installed the 64-bit version of the Windows agent, which is straightforward (like the 32-bit one... :)). After finishing the installation, go to the C:\Program Files\Opsview Agent folder, open the NSC.ini file and:
  • edit the "allowed_hosts" variable value, in the "Settings" section, to the IP address of the Opsview Core Monitoring server. This adds a little extra security by preventing the agent from answering requests from unauthorized sources. Restart the NSClientpp service to apply the changes.

[Settings]
allowed_hosts=192.168.1.1
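
If you want to double-check the setting without opening the file by hand, something like the following Python sketch can do it. It is only an illustration: the sample file content and the function name are made up, and it assumes that "allowed_hosts" holds a comma-separated list of addresses. Lines without a value (such as module names) are tolerated through allow_no_value.

```python
import configparser

# Illustrative content only; a real NSC.ini has many more sections and keys.
SAMPLE_NSC_INI = """\
[modules]
NRPEListener.dll
CheckExternalScripts.dll

[Settings]
; only this host may query the agent
allowed_hosts=192.168.1.1
"""


def allowed_hosts(ini_text):
    """Return the hosts listed in [Settings] allowed_hosts (empty list if absent)."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # accept bare lines such as module names
        strict=False,          # tolerate duplicated keys or sections
        interpolation=None,    # treat '%' literally
        delimiters=("=",),     # do not split keys on ':'
    )
    parser.optionxform = str   # keep the original key case
    parser.read_string(ini_text)
    raw = parser.get("Settings", "allowed_hosts", fallback="") or ""
    return [host.strip() for host in raw.split(",") if host.strip()]


if __name__ == "__main__":
    hosts = allowed_hosts(SAMPLE_NSC_INI)
    print("monitoring server allowed:", "192.168.1.1" in hosts)
```

To use it against the real file, read C:\Program Files\Opsview Agent\NSC.ini into a string and pass it to the function.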

2.2. Run Opsview agent in test mode

While you're doing the initial configuration and testing, it's better to run the agent in test mode from a command prompt. This lets you see, in real time, all the requests, responses and errors of the agent. To do this you should:

  • run the services snap-in and stop the NSClientpp service
  • launch a command prompt window and go to the C:\Program Files\Opsview Agent folder.
  • run the agent in test mode by entering the following command:
nsclient++ -test

2.3. Test connectivity between Opsview Monitoring Server and Opsview agent

Before going into more complex scenarios, verify that you can make a request to the agent and that it returns the proper response.

  • open an SSH session (e.g. with PuTTY) and connect to the Opsview Monitoring Server.
  • go to the /usr/local/nagios/libexec folder
  • execute a simple cpu check by running the following command line:
./check_nrpe -H 192.168.1.20 -c CheckCPU -a warn=80 crit=90 time=20m time=10s time=4

  • on the client agent side, if you have the agent running in test mode, you should see the incoming request and the agent's result
  • on the server side, you should now see the response returned.
OK. The communication between server and agent is in place and working.
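
If you end up repeating this test for several hosts, the same call can be wrapped in a few lines of Python. Take this as a rough sketch and not as part of Opsview: the function names are made up, the plugin path is the one used above, and the status names follow the usual Nagios plugin exit codes (0 to 3).

```python
import shlex
import subprocess

# Usual Nagios plugin exit codes and their meaning.
STATUS_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def build_check_nrpe_command(host, command, args=(),
                             plugin_dir="/usr/local/nagios/libexec"):
    """Build the check_nrpe command line as a list of arguments."""
    cmd = [f"{plugin_dir}/check_nrpe", "-H", host, "-c", command]
    if args:
        cmd += ["-a", *args]
    return cmd


def run_check(cmd, timeout=30):
    """Run the command and return (status_name, first part of the plugin output)."""
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    status = STATUS_NAMES.get(result.returncode, "UNKNOWN")
    return status, result.stdout.strip()


if __name__ == "__main__":
    cmd = build_check_nrpe_command(
        "192.168.1.20", "CheckCPU",
        ["warn=80", "crit=90", "time=20m", "time=10s", "time=4"],
    )
    # Only print the command here; call run_check(cmd) on the monitoring server.
    print(shlex.join(cmd))
```

Calling run_check(cmd) on the monitoring server returns the status name together with whatever text the agent sent back.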


Don't miss the second part of this article. I hope I can publish it later this week.
Feel free to leave any questions or suggestions.

