Create custom Nagios probes

There are two kinds of probes:

  1. active: executed directly by the Nagios server against the monitored host;
  2. passive: "embedded" in an active probe; used to run commands on a worker node (WN) through an active probe that submits a grid job to a CE.


Example of writing an active probe: eu.wenmr.lcg-tags_probe

eu.wenmr.lcg-tags_probe retrieves the software tags published by a CE.

Steps:

1) Define the probe in the file grid-monitor03:/usr/lib/perl5/vendor_perl/5.8.5/NCG/LocalMetrics/Hash.pm:

# lcg-tags probe
$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{native} = "Nagios";
$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{probe} ='wenmr/lcg-tags-probe';
$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{metricset} = "wenmr";
$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{config}->{path} = $NCG::NCG_PROBES_PATH_GRIDMON;
$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{config}->{interval} = 5;
$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{config}->{timeout} = 30;
$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{config}->{retryInterval} = 3;
$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{config}->{maxCheckAttempts} = 3;
#$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{flags}->{NOHOSTNAME} = 1;
$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{flags}->{PNP} = 1;
$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{flags}->{VO} = 1;
$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{flags}->{OBSESS} = 1;
# $WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{docurl} = "https://twiki.cern.ch/twiki/bin/view/LCG/SAMProbesMetrics#CE";
#$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{dependency}->{"hr.srce.GRAM-Command"} = 1;
#$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{dependency}->{"hr.srce.GridProxy-Valid"} = 0;
$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{attribute}->{VONAME} = "--vo";
$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{attribute}->{X509_USER_PROXY} = "-x";
#$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{attribute}->{VO_FQAN} = "--vo-fqan";
#$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{flags}->{NOLBNODE} = 1;
#$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{flags}->{NRPE} = 1;

where wenmr/lcg-tags-probe is the path, relative to the probes directory, of the script containing the commands used to test the monitored host. See https://tomtools.cern.ch/confluence/display/SAM/NCG#NCG-Flags for details on the meaning of the flags.
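Because every line of the definition repeats the metric name, a single misspelled key silently creates a separate entry instead of setting the value on the intended metric. A minimal sketch, assuming Hash.pm is ordinary Perl where a variable and a short loop are acceptable, keeps the name in one place. `$WLCG_SERVICE` below is a local stand-in, and the `config->{path}` line is left out because it needs the NCG module.

```perl
use strict;
use warnings;

# Local stand-in for NCG's $WLCG_SERVICE hash reference (assumption: in
# Hash.pm it is a plain hash of hashes, as the assignments above suggest).
my $WLCG_SERVICE = {};

# Keep the metric name in one variable so every key is spelled identically.
my $metric = 'eu.wenmr.lcg-tags_probe';

$WLCG_SERVICE->{$metric}->{native}    = 'Nagios';
$WLCG_SERVICE->{$metric}->{probe}     = 'wenmr/lcg-tags-probe';
$WLCG_SERVICE->{$metric}->{metricset} = 'wenmr';

# Timing parameters, same values as in the definition above.
my %config = (interval => 5, timeout => 30, retryInterval => 3, maxCheckAttempts => 3);
$WLCG_SERVICE->{$metric}->{config}->{$_} = $config{$_} for keys %config;

# Flags set to 1; NOHOSTNAME stays unset, as in the original where it is commented out.
$WLCG_SERVICE->{$metric}->{flags}->{$_} = 1 for qw(PNP VO OBSESS);

# Attributes map an NCG attribute to the probe's command-line option.
$WLCG_SERVICE->{$metric}->{attribute}->{VONAME}          = '--vo';
$WLCG_SERVICE->{$metric}->{attribute}->{X509_USER_PROXY} = '-x';
```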

2) Associate the probe with the services it will test (in this example CREAM-CE and lcg-CE, i.e. the node types 'CREAM-CE' and CE below); this is defined in the same file, grid-monitor03:/usr/lib/perl5/vendor_perl/5.8.5/NCG/LocalMetrics/Hash.pm:

$WLCG_NODETYPE->{vo}->{'CREAM-CE'} = [
    'org.sam.CREAMCE-JobState',
    'org.sam.CREAMCE-JobSubmit',
    ...
    'eu.wenmr.lcg-tags_probe',
    ...
];
$WLCG_NODETYPE->{vo}->{CE} = [
    'hr.srce.GRAM-Auth',
    'org.sam.CE-JobState',
    'org.sam.CE-JobSubmit',
    ...
    'eu.wenmr.lcg-tags_probe',
    ...
];
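Instead of retyping the whole node-type lists, the metric can be appended to lists that already exist. A minimal sketch with a hypothetical helper `add_metric_to_nodetypes` and a local stand-in for `$WLCG_NODETYPE` (the real lists are longer, elided with "..." above); whether such code runs after the lists it extends are defined is an assumption to check against your NCG setup. The same call applies to eu.wenmr.WN-csRosetta in the passive example.

```perl
use strict;
use warnings;

# Local stand-in for NCG's $WLCG_NODETYPE (only the entries shown above).
my $WLCG_NODETYPE = {
    vo => {
        'CREAM-CE' => ['org.sam.CREAMCE-JobState', 'org.sam.CREAMCE-JobSubmit'],
        'CE'       => ['hr.srce.GRAM-Auth', 'org.sam.CE-JobState', 'org.sam.CE-JobSubmit'],
    },
};

# Hypothetical helper: add a metric to the vo-level list of each node type,
# creating the list if needed and skipping types that already contain it.
sub add_metric_to_nodetypes {
    my ($nodetype, $metric, @types) = @_;
    for my $type (@types) {
        my $list = $nodetype->{vo}->{$type} ||= [];
        push @$list, $metric unless grep { $_ eq $metric } @$list;
    }
}

add_metric_to_nodetypes($WLCG_NODETYPE, 'eu.wenmr.lcg-tags_probe', 'CREAM-CE', 'CE');
```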

3) Write the commands used to test the monitored host in the Perl script /usr/libexec/grid-monitoring/probes/wenmr/lcg-tags-probe. /usr/libexec/grid-monitoring/probes/ is the default directory for Nagios active probes, and wenmr/lcg-tags-probe is the probe path defined in Hash.pm as described above.

The Perl code of this example is attached to this topic (lcg-tags.probe.txt).

The command it executes is:

`X509_USER_PROXY=$proxy; /opt/glite/bin/lcg-tags --ce $ce --vo $vo --list`
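A minimal sketch of running the same command from Perl without a shell, using the list form of piped `open`; `lcg_tags_command` and `run_lcg_tags` are hypothetical names. Setting `$ENV{X509_USER_PROXY}` guarantees that lcg-tags inherits the proxy path, whereas in the shell form `X509_USER_PROXY=$proxy; cmd` the variable reaches `cmd` only if it is already exported.

```perl
use strict;
use warnings;

# Hypothetical helper: the lcg-tags command as a list, so no shell quoting
# is involved. The binary path is the one used by the probe.
sub lcg_tags_command {
    my ($ce, $vo) = @_;
    return ('/opt/glite/bin/lcg-tags', '--ce', $ce, '--vo', $vo, '--list');
}

# Run lcg-tags with X509_USER_PROXY exported to the child process only for
# the duration of this call. Returns (array ref of output lines, exit status),
# or ([], -1) if the command could not be started.
sub run_lcg_tags {
    my ($ce, $vo, $proxy) = @_;
    local $ENV{X509_USER_PROXY} = $proxy;
    open(my $fh, '-|', lcg_tags_command($ce, $vo))
        or return ([], -1);
    chomp(my @lines = <$fh>);
    close($fh);
    return (\@lines, $? >> 8);
}
```

In the probe this would be called as `my ($lines, $status) = run_lcg_tags($ce, $vo, $proxy);` with the values read from `$plugin->opts`.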

The probe reads its input through the Nagios plugin object $plugin:

my $ce=$plugin->opts->get('H');
my $vo=$plugin->opts->get('vo');
my $proxy=$plugin->opts->get('x');

where:

  • $proxy is the path of the proxy file used by Nagios, passed with -x ($WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{attribute}->{X509_USER_PROXY} = "-x";)
  • $ce is the node to be tested, passed with -H because the NOHOSTNAME flag is left commented out (#$WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{flags}->{NOHOSTNAME} = 1;)
  • $vo is the VO used by Nagios, passed with --vo ($WLCG_SERVICE->{'eu.wenmr.lcg-tags_probe'}->{attribute}->{VONAME} = "--vo";)

The same object produces the status and message shown in the Nagios web pages:

  $plugin->nagios_exit(OK,"$msgOut");
  $plugin->nagios_exit(WARNING, "No tags found.");
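The `opts`/`nagios_exit` calls above match the Nagios::Plugin Perl module. For testing the logic outside Nagios, a self-contained sketch of the same input/output contract using only core Getopt::Long: `parse_probe_args` mimics the -H/--vo/-x options, and `tags_result` returns a (code, message) pair with the standard Nagios return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). Both names are hypothetical; only the "No tags found." warning comes from the probe, and the CRITICAL/UNKNOWN branches are assumptions.

```perl
use strict;
use warnings;
use Getopt::Long ();

# Standard Nagios plugin return codes. Nagios::Plugin exports the same names,
# so define these only when that module is not loaded.
use constant { OK => 0, WARNING => 1, CRITICAL => 2, UNKNOWN => 3 };

# Input: -H <host>, --vo <VO> and -x <proxy file>; dies if any is missing.
sub parse_probe_args {
    local @ARGV = @_;
    my ($host, $vo, $proxy);
    Getopt::Long::GetOptions('H=s' => \$host, 'vo=s' => \$vo, 'x=s' => \$proxy)
        or die "invalid arguments\n";
    my %opt = (H => $host, vo => $vo, x => $proxy);
    for my $required (qw(H vo x)) {
        die "missing option $required\n" unless defined $opt{$required};
    }
    return \%opt;
}

# Output: map the lcg-tags exit status and output lines to (code, message).
sub tags_result {
    my ($status, @lines) = @_;
    return (UNKNOWN, 'lcg-tags could not be run.') if $status < 0;
    return (CRITICAL, "lcg-tags exited with status $status.") if $status != 0;
    my @tags = grep { /\S/ } @lines;    # ignore blank lines
    return (WARNING, 'No tags found.') unless @tags;
    return (OK, 'Tags found: ' . join(', ', @tags));
}
```

In the real probe the pair can be handed directly to `$plugin->nagios_exit(...)`, which takes a return code and a message.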


Example of writing a passive probe: eu.wenmr.WN-csRosetta

eu.wenmr.WN-csRosetta runs the csRosetta software on a WN.

Steps:

1) Define the probe in the file grid-monitor03:/usr/lib/perl5/vendor_perl/5.8.5/NCG/LocalMetrics/Hash.pm:

$WLCG_SERVICE->{'eu.wenmr.WN-csRosetta'}->{flags}->{PASSIVE} = 1;
$WLCG_SERVICE->{'eu.wenmr.WN-csRosetta'}->{parent} = "org.sam.CE-JobState";
$WLCG_SERVICE->{'eu.wenmr.WN-csRosetta'}->{flags}->{VO} = 1;
$WLCG_SERVICE->{'eu.wenmr.WN-csRosetta'}->{flags}->{OBSESS} = 1;
$WLCG_SERVICE->{'eu.wenmr.WN-csRosetta'}->{metricset} = 'org.sam.WN';
$WLCG_SERVICE->{'eu.wenmr.WN-csRosetta'}->{docurl} = "https://twiki.cern.ch/twiki/bin/view/LCG/SAMProbesMetrics#WN";

where org.sam.CE-JobState is the active probe that submits a grid job to a CE.
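Because a passive metric gets its results only through its parent active probe, a misspelled or undefined parent leaves it without data. A sketch of a consistency check over a local stand-in for `$WLCG_SERVICE`; `check_passive_metrics` is a hypothetical helper, not part of NCG, and the stand-in's parent entry is reduced to a single key.

```perl
use strict;
use warnings;

# Local stand-in for part of $WLCG_SERVICE; the parent's real definition
# comes from the SAM configuration and is much longer.
my $WLCG_SERVICE = {
    'org.sam.CE-JobState'   => { native => 'Nagios' },
    'eu.wenmr.WN-csRosetta' => {
        flags     => { PASSIVE => 1, VO => 1, OBSESS => 1 },
        parent    => 'org.sam.CE-JobState',
        metricset => 'org.sam.WN',
    },
};

# Hypothetical check: every metric flagged PASSIVE must name a parent that
# is itself defined. Returns a list of problem descriptions (empty if fine).
sub check_passive_metrics {
    my ($services) = @_;
    my @problems;
    for my $name (sort keys %$services) {
        my $def = $services->{$name};
        next unless $def->{flags} && $def->{flags}->{PASSIVE};
        my $parent = $def->{parent};
        if (!defined $parent) {
            push @problems, "$name: PASSIVE metric without parent";
        } elsif (!exists $services->{$parent}) {
            push @problems, "$name: parent $parent is not defined";
        }
    }
    return @problems;
}
```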

2) Associate the probe with the services it will test (in this example CREAM-CE and lcg-CE, i.e. the node types 'CREAM-CE' and CE below); this is defined in the same file, grid-monitor03:/usr/lib/perl5/vendor_perl/5.8.5/NCG/LocalMetrics/Hash.pm:

$WLCG_NODETYPE->{vo}->{'CREAM-CE'} = [
    'org.sam.CREAMCE-JobState',
    'org.sam.CREAMCE-JobSubmit',
    ...
    'eu.wenmr.WN-csRosetta',
    ...
];
$WLCG_NODETYPE->{vo}->{CE} = [
    'hr.srce.GRAM-Auth',
    'org.sam.CE-JobState',
    'org.sam.CE-JobSubmit',
    ...
    'eu.wenmr.WN-csRosetta',
    ...
];

3) Edit grid-monitor03:/usr/lib/perl5/vendor_perl/5.8.5/NCG/LocalMetrics/Hash.pm, adding the directory that contains the probe to the definition of the active probe org.sam.CE-JobState (and of org.sam.CREAMCE-JobState for CREAM-CE).

In this example the probe is located in /usr/libexec/grid-monitoring/probes/wenmr/wnjob/

For CREAM-CE:

## CREAM CE
# org.sam.CREAMCE-JobState : [active+passive] submits grid job to CE, holds a status of the grid job
$WLCG_SERVICE->{'org.sam.CREAMCE-JobState'}->{native} = "Nagios";
...
$WLCG_SERVICE->{'org.sam.CREAMCE-JobState'}->{parameter}->{"--add-wntar-nag"} = '/usr/libexec/grid-monitoring/probes/cadist/wnjob/,/var/lib/gridprobes-cadist/,/usr/libexec/grid-monitoring/probes/wenmr/wnjob/';

For lcg-CE:

# org.sam.CE-JobState : [active+passive] submits grid job to CE, holds a status of the grid job
$WLCG_SERVICE->{'org.sam.CE-JobState'}->{native} = "Nagios";
...
$WLCG_SERVICE->{'org.sam.CE-JobState'}->{parameter}->{"--add-wntar-nag"} = '/usr/libexec/grid-monitoring/probes/cadist/wnjob/,/var/lib/gridprobes-cadist/,/usr/libexec/grid-monitoring/probes/wenmr/wnjob/';

Use a custom directory, not one of the Nagios default ones (e.g. /usr/libexec/grid-monitoring/probes/org.sam), to avoid losing custom probes when the org.sam package is updated.
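The `--add-wntar-nag` value is a comma-separated list of directories, so adding a new probe directory amounts to appending it once to each JobState definition. A sketch with a hypothetical helper `add_wntar_dir`; the starting values in the stand-in hash (the two cadist entries) are assumed from the lines above.

```perl
use strict;
use warnings;

# Hypothetical helper: append a directory to a comma-separated list unless
# it is already present; empty items are dropped.
sub add_wntar_dir {
    my ($value, $dir) = @_;
    my @dirs = grep { length } split /,/, (defined $value ? $value : '');
    push @dirs, $dir unless grep { $_ eq $dir } @dirs;
    return join ',', @dirs;
}

# Local stand-in for the two JobState definitions before the change.
my $base = '/usr/libexec/grid-monitoring/probes/cadist/wnjob/,/var/lib/gridprobes-cadist/';
my $WLCG_SERVICE = {
    'org.sam.CE-JobState'      => { parameter => { '--add-wntar-nag' => $base } },
    'org.sam.CREAMCE-JobState' => { parameter => { '--add-wntar-nag' => $base } },
};

for my $parent ('org.sam.CE-JobState', 'org.sam.CREAMCE-JobState') {
    my $param = \$WLCG_SERVICE->{$parent}->{parameter}->{'--add-wntar-nag'};
    $$param = add_wntar_dir($$param, '/usr/libexec/grid-monitoring/probes/wenmr/wnjob/');
}
```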

4) Create the structure required by Nagios (as described in https://tomtools.cern.ch/confluence/display/SAM/CE#CE-IntegrationofthirdpartyWNchecks) under the directory added in step 3.

In this example:

/usr/libexec/grid-monitoring/probes/wenmr/wnjob/
  |-- etc
  |   `-- wn.d
  |       `-- wenmr
  |           |-- commands.cfg
  |           `-- services.cfg
  `-- probes
      `-- wenmr
          `-- csRosetta

where commands.cfg and services.cfg are configuration files:

# cat etc/wn.d/wenmr/services.cfg 
...
define service{
        use                     sam-generic-wn-active
        service_description     eu.wenmr.WN-csRosetta-<VOMS>
        check_command           csRosetta
}
...

# cat etc/wn.d/wenmr/commands.cfg 
...
define command {
        command_name    csRosetta
        command_line    $USER3$/wenmr/csRosetta
}
...

while probes/wenmr/csRosetta is the Perl script containing the commands actually executed on the WN. As with active probes, this script uses the Nagios plugin structure for its status and output messages.

The Perl code of this probe is attached to this topic (csRosetta.txt).
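For a new VO or site prefix, the layout of step 4 can be generated rather than created by hand. A minimal sketch, assuming the services.cfg and commands.cfg contents shown above are all that is needed (the real files elide other entries with "...", and this code overwrites both files); `make_wnjob_tree` and its helpers are hypothetical, and the probe script itself still has to be written separately. It uses the legacy `mkpath` interface of core File::Path, which also exists on Perl 5.8.

```perl
use strict;
use warnings;
use File::Path qw(mkpath);
use File::Spec;

# services.cfg entry; template name and <VOMS> placeholder copied from above.
sub services_cfg {
    my ($metric, $command) = @_;
    return "define service{\n"
         . "        use                     sam-generic-wn-active\n"
         . "        service_description     $metric-<VOMS>\n"
         . "        check_command           $command\n"
         . "}\n";
}

# commands.cfg entry; single quotes keep $USER3$ literal for Nagios.
sub commands_cfg {
    my ($command, $script) = @_;
    return "define command {\n"
         . "        command_name    $command\n"
         . '        command_line    $USER3$/' . "$script\n"
         . "}\n";
}

sub write_file {
    my ($path, $content) = @_;
    open(my $fh, '>', $path) or die "cannot write $path: $!\n";
    print {$fh} $content;
    close($fh) or die "cannot close $path: $!\n";
}

# Create <base>/etc/wn.d/<name>/ with both cfg files and <base>/probes/<name>/.
sub make_wnjob_tree {
    my ($base, $name, $metric, $check) = @_;
    my $etc    = File::Spec->catdir($base, 'etc', 'wn.d', $name);
    my $probes = File::Spec->catdir($base, 'probes', $name);
    mkpath([$etc, $probes], 0, 0755);
    write_file(File::Spec->catfile($etc, 'services.cfg'), services_cfg($metric, $check));
    write_file(File::Spec->catfile($etc, 'commands.cfg'), commands_cfg($check, "$name/$check"));
    return ($etc, $probes);
}
```

For this example the call would be `make_wnjob_tree('/usr/libexec/grid-monitoring/probes/wenmr/wnjob', 'wenmr', 'eu.wenmr.WN-csRosetta', 'csRosetta')`, after which probes/wenmr/csRosetta is added by hand.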


-- MarcoVerlato - 2012-02-20

Topic attachments

  • csRosetta.txt (2.5 K, 2012-02-20 15:26, MarcoVerlato): csRosetta Nagios probe perl code
  • lcg-tags.probe.txt (2.3 K, 2012-02-20 15:22, MarcoVerlato): lcg-tags Nagios probe perl code
Topic revision: r2 - 2012-03-08 - MarcoVerlato
 