Create custom Nagios probes
There are two kinds of probes:
- active: executed by the Nagios server directly against the monitored host;
- passive: "embedded" in an active probe; used to run commands on a WN through an active probe that submits a grid job to a CE.
Example of writing an active probe: eu.wenmr.lcg-tags_probe
eu.wenmr.lcg-tags_probe gets the software tags of a CE.
Steps:
1) Define the probe in the file grid-monitor03:/usr/lib/perl5/vendor_perl/5.8.5/NCG/LocalMetrics/Hash.pm:
# lcg-tags probe
$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{native} = "Nagios";
$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{probe} ='wenmr/lcg-tags-probe';
$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{metricset} = "wenmr";
$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{config}->{path} = $NCG::NCG_PROBES_PATH_GRIDMON;
$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{config}->{interval} = 5;
$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{config}->{timeout} = 30;
$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{config}->{retryInterval} = 3;
$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{config}->{maxCheckAttempts} = 3;
#$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{flags}->{NOHOSTNAME} = 1;
$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{flags}->{PNP} = 1;
$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{flags}->{VO} = 1;
$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{flags}->{OBSESS} = 1;
# $WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{docurl} = "https://twiki.cern.ch/twiki/bin/view/LCG/SAMProbesMetrics#CE";
#$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{dependency}->{"hr.srce.GRAM-Command"} = 1;
#$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{dependency}->{"hr.srce.GridProxy-Valid"} = 0;
$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{attribute}->{VONAME} = "--vo";
$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{attribute}->{X509_USER_PROXY} = "-x";
#$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{attribute}->{VO_FQAN} = "--vo-fqan";
#$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{flags}->{NOLBNODE} = 1;
#$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{flags}->{NRPE} = 1;
where wenmr/lcg-tags-probe is the name of the script containing the commands to be executed to test the monitored host.
See https://tomtools.cern.ch/confluence/display/SAM/NCG#NCG-Flags for details on the meaning of the flags.
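Because every setting is a separate assignment, a misspelled metric name in one line raises no error: Perl simply autovivifies a second, incomplete entry. The following is a minimal sketch of a sanity check, assuming a hash shaped like the definitions above. The helper find_incomplete_metrics and the sample metric names are hypothetical and not part of NCG.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of NCG): return the metric names that have
# neither a 'probe' script nor the PASSIVE flag. Such an entry is usually
# the by-product of a misspelled key.
sub find_incomplete_metrics {
    my ($services) = @_;
    my @incomplete = grep {
        my $def = $services->{$_};
        !defined($def->{probe})
            && !($def->{flags} && $def->{flags}->{PASSIVE});
    } sort keys %$services;
    return @incomplete;
}

# Sample data for illustration only.
my $WLCG_SERVICE = {};
$WLCG_SERVICE->{'eu.example.good_probe'}->{probe}           = 'example/good-probe';
$WLCG_SERVICE->{'eu.example.good_probe'}->{flags}->{VO}     = 1;
$WLCG_SERVICE->{'eu.example.good_prob'}->{flags}->{OBSESS}  = 1;   # misspelled name

print "Incomplete metric: $_\n" for find_incomplete_metrics($WLCG_SERVICE);
```

Run against the sample data, the sketch reports only the misspelled entry; a passive metric is accepted because it carries the PASSIVE flag instead of a probe script.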
2) Associate the probe with the services it will test (in this example CREAM-CE and lcg-CE); this is also defined in grid-monitor03:/usr/lib/perl5/vendor_perl/5.8.5/NCG/LocalMetrics/Hash.pm:
$WLCG_NODETYPE->{vo}->{'CREAM-CE'} = [
'org.sam.CREAMCE-JobState',
'org.sam.CREAMCE-JobSubmit',
...
'eu.wenmr.lcg-tags_probe',
...
];
$WLCG_NODETYPE->{vo}->{CE} = [
'hr.srce.GRAM-Auth',
'org.sam.CE-JobState',
'org.sam.CE-JobSubmit',
...
'eu.wenmr.lcg-tags_probe',
...
];
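A metric name listed under a node type but never defined in $WLCG_SERVICE is easy to overlook. As a rough illustration, the lists above could be cross-checked as follows. The helper undefined_metrics is hypothetical (not part of NCG), the sample data is trimmed, and in Hash.pm the first argument would presumably be $WLCG_NODETYPE->{vo}.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of NCG): for each node type, collect the
# listed metrics that have no definition in the service hash.
sub undefined_metrics {
    my ($nodetypes, $services) = @_;
    my %missing;
    for my $type (sort keys %$nodetypes) {
        my @gone = grep { !exists $services->{$_} } @{ $nodetypes->{$type} };
        $missing{$type} = \@gone if @gone;
    }
    return \%missing;
}

# Trimmed sample data; the real lists in Hash.pm are longer.
my $services = {
    'org.sam.CE-JobState'     => {},
    'eu.wenmr.lcg-tags_probe' => {},
};
my $nodetypes = {
    'CE'       => [ 'org.sam.CE-JobState', 'eu.wenmr.lcg-tags_probe' ],
    'CREAM-CE' => [ 'eu.wenmr.lcg-tags-probe' ],   # hyphen instead of underscore
};

my $missing = undefined_metrics($nodetypes, $services);
for my $type (sort keys %$missing) {
    print "$type: undefined metric $_\n" for @{ $missing->{$type} };
}
```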
3) Write the commands that test the monitored host in the Perl script /usr/libexec/grid-monitoring/probes/wenmr/lcg-tags-probe (/usr/libexec/grid-monitoring/probes/ is the default directory for Nagios active probes, while wenmr/lcg-tags-probe is the name defined in Hash.pm in step 1).
Here's the Perl script for this example
The command executed is:
`X509_USER_PROXY=$proxy; /opt/glite/bin/lcg-tags --ce $ce --vo $vo --list`
A Nagios plugin structure ($plugin) is used for input:
my $ce=$plugin->opts->get('H');
my $vo=$plugin->opts->get('vo');
my $proxy=$plugin->opts->get('x');
where:
- $proxy is the proxy used by Nagios ($WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{attribute}->{X509_USER_PROXY} = "-x";)
- $ce is the node to be tested, passed with -H because the NOHOSTNAME flag is left commented out (#$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{flags}->{NOHOSTNAME} = 1;)
- $vo is the VO used by Nagios ($WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{attribute}->{VONAME} = "--vo";)
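To make the mapping between the Hash.pm attributes and the probe's command line concrete, here is a minimal sketch that parses the same three options. It is a stand-in that uses the core Getopt::Long module rather than the Nagios plugin structure of the real probe; the helper parse_probe_args and the host, VO and proxy values are made up for illustration.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Stand-in for the plugin's option handling (assumption: the probe is
# called with -H <host>, --vo <vo> and -x <proxy file>).
sub parse_probe_args {
    my (@argv) = @_;
    my ($ce, $vo, $proxy);
    GetOptionsFromArray(
        \@argv,
        'H=s'  => \$ce,
        'vo=s' => \$vo,
        'x=s'  => \$proxy,
    ) or die "invalid options\n";
    return { ce => $ce, vo => $vo, proxy => $proxy };
}

# Hypothetical command line with made-up values.
my $opt = parse_probe_args(qw(-H ce.example.org --vo example.vo -x /tmp/x509up_u500));
printf "ce=%s vo=%s proxy=%s\n", $opt->{ce}, $opt->{vo}, $opt->{proxy};
```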
The same $plugin structure is also used for the output shown in the Nagios web pages:
$plugin->nagios_exit(OK,"$msgOut");
$plugin->nagios_exit(WARNING, "No tags found.");
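The decision between the two exit calls above can be sketched as a small function. This is an illustration only: classify_tags is a hypothetical helper, the one-tag-per-line output format is an assumption about lcg-tags, and the UNKNOWN branch is an addition that does not appear in the original script. The numeric values are the standard Nagios plugin exit codes.

```perl
use strict;
use warnings;

# Standard Nagios plugin exit codes.
use constant { OK => 0, WARNING => 1, CRITICAL => 2, UNKNOWN => 3 };

# Hypothetical helper: turn the text printed by the tag-listing command
# into a (status, message) pair. Assumes one tag per non-empty line.
sub classify_tags {
    my ($output) = @_;
    return (UNKNOWN, 'No output from command.') unless defined $output;
    my @tags = grep { /\S/ } split /\n/, $output;
    s/^\s+|\s+$//g for @tags;            # trim surrounding whitespace
    return (WARNING, 'No tags found.') unless @tags;
    return (OK, scalar(@tags) . ' tags found: ' . join(', ', @tags));
}

# Made-up tag names for illustration.
my ($status, $message) = classify_tags("VO-example-tag-1\nVO-example-tag-2\n");
print "$status $message\n";
```

In a real probe the resulting pair would presumably be handed to $plugin->nagios_exit instead of being printed.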
Example of writing a passive probe: eu.wenmr.WN-csRosetta
eu.wenmr.WN-csRosetta runs the csRosetta software on a WN.
Steps:
1) Define the probe in the file grid-monitor03:/usr/lib/perl5/vendor_perl/5.8.5/NCG/LocalMetrics/Hash.pm:
$WLCG_SERVICE->{'eu.wenmr.WN-csRosetta'}->{flags}->{PASSIVE} = 1;
$WLCG_SERVICE->{'eu.wenmr.WN-csRosetta'}->{parent} = "org.sam.CE-JobState";
$WLCG_SERVICE->{'eu.wenmr.WN-csRosetta'}->{flags}->{VO} = 1;
$WLCG_SERVICE->{'eu.wenmr.WN-csRosetta'}->{flags}->{OBSESS} = 1;
$WLCG_SERVICE->{'eu.wenmr.WN-csRosetta'}->{metricset} = 'org.sam.WN';
$WLCG_SERVICE->{'eu.wenmr.WN-csRosetta'}->{docurl} = "https://twiki.cern.ch/twiki/bin/view/LCG/SAMProbesMetrics#WN";
where org.sam.CE-JobState is the active probe that submits a grid job to a CE.
2) Associate the probe with the services it will test (in this example CREAM-CE and lcg-CE); this is also defined in grid-monitor03:/usr/lib/perl5/vendor_perl/5.8.5/NCG/LocalMetrics/Hash.pm:
$WLCG_NODETYPE->{vo}->{'CREAM-CE'} = [
'org.sam.CREAMCE-JobState',
'org.sam.CREAMCE-JobSubmit',
...
'eu.wenmr.WN-csRosetta',
...
];
$WLCG_NODETYPE->{vo}->{CE} = [
'hr.srce.GRAM-Auth',
'org.sam.CE-JobState',
'org.sam.CE-JobSubmit',
...
'eu.wenmr.WN-csRosetta',
...
];
3) Edit grid-monitor03:/usr/lib/perl5/vendor_perl/5.8.5/NCG/LocalMetrics/Hash.pm, adding the path where the probe is located to the definition of the active probe that submits the grid job (org.sam.CE-JobState for lcg-CE, org.sam.CREAMCE-JobState for CREAM-CE).
In this example the probe is located in /usr/libexec/grid-monitoring/probes/wenmr/wnjob/
For CREAM-CE:
## CREAM CE
# org.sam.CREAMCE-JobState : [active+passive] submits grid job to CE, holds a status of the grid job
$WLCG_SERVICE->{'org.sam.CREAMCE-JobState'}->{native} = "Nagios";
...
$WLCG_SERVICE->{'org.sam.CREAMCE-JobState'}->{parameter}->{"--add-wntar-nag"} = '/usr/libexec/grid-monitoring/probes/cadist/wnjob/,/var/lib/gridprobes-cadist/,/usr/libexec/grid-monitoring/probes/wenmr/wnjob/';
For lcg-CE:
# org.sam.CE-JobState : [active+passive] submits grid job to CE, holds a status of the grid job
$WLCG_SERVICE->{'org.sam.CE-JobState'}->{native} = "Nagios";
...
$WLCG_SERVICE->{'org.sam.CE-JobState'}->{parameter}->{"--add-wntar-nag"} = '/usr/libexec/grid-monitoring/probes/cadist/wnjob/,/var/lib/gridprobes-cadist/,/usr/libexec/grid-monitoring/probes/wenmr/wnjob/';
Use a custom directory rather than one of the Nagios default ones (e.g. /usr/libexec/grid-monitoring/probes/org.sam), so that custom probes are not lost when the org.sam package is updated.
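The value of --add-wntar-nag is a comma-separated list of directories, so adding a custom directory amounts to appending one more element. A minimal sketch, assuming exact string comparison (a trailing slash matters); add_wntar_dir is a hypothetical helper, not an NCG function.

```perl
use strict;
use warnings;

# Hypothetical helper: append a directory to a comma-separated list such
# as the value of --add-wntar-nag, unless it is already present.
sub add_wntar_dir {
    my ($list, $dir) = @_;
    my @dirs = grep { length } split /,/, (defined $list ? $list : '');
    push @dirs, $dir unless grep { $_ eq $dir } @dirs;
    return join ',', @dirs;
}

my $value = '/usr/libexec/grid-monitoring/probes/cadist/wnjob/,/var/lib/gridprobes-cadist/';
$value = add_wntar_dir($value, '/usr/libexec/grid-monitoring/probes/wenmr/wnjob/');
print "$value\n";
```

Calling the helper a second time with the same directory leaves the list unchanged.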
4) Create the structure requested by Nagios (as described in https://tomtools.cern.ch/confluence/display/SAM/CE#CE-IntegrationofthirdpartyWNchecks) under the directory used in step 3.
In this example:
/usr/libexec/grid-monitoring/probes/wenmr/wnjob/
|-- etc
|   `-- wn.d
|       `-- wenmr
|           |-- commands.cfg
|           `-- services.cfg
`-- probes
    `-- wenmr
        `-- csRosetta
where commands.cfg and services.cfg are configuration files:
# cat etc/wn.d/wenmr/services.cfg
...
define service{
    use                  sam-generic-wn-active
    service_description  eu.wenmr.WN-csRosetta-<VOMS>
    check_command        csRosetta
}
...
# cat etc/wn.d/wenmr/commands.cfg
...
define command {
    command_name  csRosetta
    command_line  $USER3$/wenmr/csRosetta
}
...
probes/wenmr/csRosetta, in turn, is the Perl script containing the commands that are actually executed on the WN.
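The directory skeleton described above can be created with a few lines of Perl. This is a sketch under stated assumptions: make_wnjob_skeleton is a hypothetical helper, it creates only the two leaf directories (the configuration files and the probe script still have to be written by hand), and it uses make_path from File::Path, which very old Perl installations may only offer under its legacy name mkpath.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: create etc/wn.d/<namespace> and probes/<namespace>
# under $base and return the two directory paths.
sub make_wnjob_skeleton {
    my ($base, $namespace) = @_;
    my @dirs = (
        File::Spec->catdir($base, 'etc', 'wn.d', $namespace),
        File::Spec->catdir($base, 'probes', $namespace),
    );
    make_path(@dirs);
    return @dirs;
}

# Try it in a throwaway directory rather than under /usr/libexec.
my $base = tempdir(CLEANUP => 1);
print "$_\n" for make_wnjob_skeleton($base, 'wenmr');
```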
As with active probes, this script must also use a Nagios structure for its output messages.
Here's the perl script
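Separately from the author's script, the following is a minimal sketch of how a WN check might map the result of an external command to a Nagios status. The helper run_check is hypothetical, the command used here is only a stand-in (the Perl interpreter exiting with code 0), and treating every non-zero exit code as CRITICAL is an assumption rather than the behaviour of the csRosetta probe.

```perl
use strict;
use warnings;

# Standard Nagios plugin exit codes and their names.
use constant { OK => 0, WARNING => 1, CRITICAL => 2, UNKNOWN => 3 };
my @STATUS_NAME = qw(OK WARNING CRITICAL UNKNOWN);

# Hypothetical helper: run a command and map its result to a
# (status, message) pair.
sub run_check {
    my (@cmd) = @_;
    my $rc = system(@cmd);
    return (UNKNOWN,  "cannot run $cmd[0]: $!")           if $rc == -1;
    return (CRITICAL, 'killed by signal ' . ($rc & 127))  if $rc & 127;
    my $exit = $rc >> 8;
    return $exit == 0
        ? (OK, 'command succeeded')
        : (CRITICAL, "command exited with code $exit");
}

# Stand-in command: the running Perl interpreter, exiting with code 0.
my ($status, $message) = run_check($^X, '-e', 'exit 0');
print "$STATUS_NAME[$status]: $message\n";
# A real probe would finish with: exit $status;
```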
--
MarcoVerlato - 2012-02-20