Line: 1 to 1 | ||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CREAM-CE metrics and WN probes | ||||||||||||||||||||||||||||
Line: 431 to 431 | ||||||||||||||||||||||||||||
gridjob.out wnlogs.tgz | ||||||||||||||||||||||||||||
Changed: | ||||||||||||||||||||||||||||
< < | In gridjob.out you should find the job's output; if all goes well, at the end of teh file there should be some lines like these: | |||||||||||||||||||||||||||
> > | In gridjob.out you should find the job's output; if all goes well, at the end of the file there should be some lines like these: | |||||||||||||||||||||||||||
>>>>>>>>>>>>>>>>>> Wed Nov 9 18:11:35 CET 2011 T |S |c |U |O |W |C |A |P | | ||||||||||||||||||||||||||||
Changed: | ||||||||||||||||||||||||||||
< < | 3 |3 |3 |0 |3 |0 |0 |0 |0 | | |||||||||||||||||||||||||||
> > | 3 |3 |3 |0 |3 |0 |0 |3 |0 | | |||||||||||||||||||||||||||
Services Total 3 Checked: 3 All services were checked. Killing Nagios. | ||||||||||||||||||||||||||||
Added: | ||||||||||||||||||||||||||||
> > | These lines are returned by nagiostats with this meaning:
| |||||||||||||||||||||||||||
wnlogs.tgz also contains the output mail messages from the individual worker node metrics: | ||||||||||||||||||||||||||||
Line: 529 to 541 | ||||||||||||||||||||||||||||
detailsData: Checking if BrokerInfo works\nBrokerInfo file: /home/dteam017/home_cre19_460125504/CREAM460125504/.BrokerInfo\n+ ls -l /home/dteam017/home_cre19_460125504/CREAM460125504/.BrokerInfo\n-rw-r--r-- 1 dteam017 dteam 367 Nov 9 18:11 /home/dteam017/home_cre19_460125504/CREAM460125504/.BrokerInfo\n+ set +x\nCheck if we can get the name of CE using glite-brokerinfo command\n+ glite-brokerinfo -v getCE\nBrokerInfo::getBIFileName(): /home/dteam017/home_cre19_460125504/CREAM460125504/.BrokerInfo\nBrokerInfo::getCE(): \n -> cream-19.pd.infn.it:8443/cream-lsf-creamcert2\n -> BI_SUCCESS\n+ result=0\n+ set +x\n EOT | ||||||||||||||||||||||||||||
Added: | ||||||||||||||||||||||||||||
> > |
State + Monit with notification

To also test the message transfer mechanism you need to install a Message Broker. Then you can "submit" a job using this command:

/usr/libexec/grid-monitoring/probes/emi.cream/CREAMCE-probe --vo <vo> -x <path of the proxy> -H <CREAM-ce hostname> -m emi.cream.CREAMCE-JobState --wms <WMS hostname> --mb-uri <Message Broker URI> --mb-destination <Message Broker destination>

[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.cream/CREAMCE-probe --vo dteam -x /tmp/x509up_u501 -H cream-19.pd.infn.it -m emi.cream.CREAMCE-JobState --wms cream-45.pd.infn.it --mb-uri stomp://cream-12.pd.infn.it:61613 --mb-destination /tmp/msg
OK: [Submitted]
OK: [Submitted]
Connecting to the service https://cream-45.pd.infn.it:7443/glite_wms_wmproxy_server
====================== glite-wms-job-submit Success ======================
The job has been successfully submitted to the WMProxy
Your job identifier is:
https://cream-45.pd.infn.it:9000/3sqXvSstpobzaQhTzWNH4Q
The job identifier has been saved in the following file:
/var/lib/gridprobes/dteam/emi.cream/CREAMCE/cream-19.pd.infn.it/jobID
==========================================================================

Again you can, as usual, monitor the job until it terminates:

/usr/libexec/grid-monitoring/probes/emi.cream/CREAMCE-probe --vo <vo> -x <path of the proxy> -H <CREAM-ce hostname> -m emi.cream.CREAMCE-JobMonit --pass-check-dest active

At the end you can do the same checks as in the previous test, and you can also check the log of the Message Broker server to see whether it receives the messages, as in this example:

2011-11-10 11:17:36,856 [Thread-2] coilmq.server.socketserver.StompRequestHandler - DEBUG - Processing frame: SEND
content-length:1020
ROC:UNDEFINED
sitename:INFN-EMITESTBED
destination:/tmp/msg
persistent:true
nagios_host:localhost.localdomain
role:site
serviceURI: cream-19.pd.infn.it:8443/cream-lsf-creamcert2
hostName: localhost.localdomain
serviceFlavour: CE
siteName: INFN-EMITESTBED
metricStatus: OK
metricName: emi.wn.WN-Bi
summaryData: OK: getCE: cream-19.pd.infn.it:8443/cream-lsf-creamcert2
gatheredAt: cream-wn-007.pn.pd.infn.it
timestamp: 2011-11-10T10:17:36Z
nagiosName: emi.wn.WN-Bi-dteam
role: site
voName: dteam
serviceType: emi.wn.WN
detailsData: Checking if BrokerInfo works\nBrokerInfo file: /home/dteam017/home_cre19_378000412/CREAM378000412/.BrokerInfo\n+ ls -l /home/dteam017/home_cre19_378000412/CREAM378000412/.BrokerInfo\n-rw-r--r-- 1 dteam017 dteam 2282 Nov 10 11:17 /home/dteam017/home_cre19_378000412/CREAM378000412/.BrokerInfo\n+ set +x\nCheck if we can get the name of CE using glite-brokerinfo command\n+ glite-brokerinfo -v getCE\nBrokerInfo::getBIFileName(): /home/dteam017/home_cre19_378000412/CREAM378000412/.BrokerInfo\nBrokerInfo::getCE(): \n -> cream-19.pd.infn.it:8443/cream-lsf-creamcert2\n -> BI_SUCCESS\n+ result=0\n+ set +x\n
EOT

2011-11-10 11:17:37,854 [Thread-2] coilmq.server.socketserver.StompRequestHandler - DEBUG - Processing frame: SEND
content-length:397
ROC:UNDEFINED
sitename:INFN-EMITESTBED
destination:/tmp/msg
persistent:true
nagios_host:localhost.localdomain
role:site
serviceURI: cream-19.pd.infn.it:8443/cream-lsf-creamcert2
hostName: localhost.localdomain
serviceFlavour: CE
siteName: INFN-EMITESTBED
metricStatus: OK
metricName: emi.wn.WN-Csh
summaryData: OK
gatheredAt: cream-wn-007.pn.pd.infn.it
timestamp: 2011-11-10T10:17:37Z
nagiosName: emi.wn.WN-Csh-dteam
role: site
voName: dteam
serviceType: emi.wn.WN
detailsData: Checking if CSH works\nTest: OK.\n
EOT

2011-11-10 11:17:38,854 [Thread-2] coilmq.server.socketserver.StompRequestHandler - DEBUG - Processing frame: SEND
content-length:659
ROC:UNDEFINED
sitename:INFN-EMITESTBED
destination:/tmp/msg
persistent:true
nagios_host:localhost.localdomain
role:site
serviceURI: cream-19.pd.infn.it:8443/cream-lsf-creamcert2
hostName: localhost.localdomain
serviceFlavour: CE
siteName: INFN-EMITESTBED
metricStatus: OK
metricName: emi.wn.WN-SoftVer
summaryData: OK: gLite 3.1.0
gatheredAt: cream-wn-007.pn.pd.infn.it
timestamp: 2011-11-10T10:17:38Z
nagiosName: emi.wn.WN-SoftVer-dteam
role: site
voName: dteam
serviceType: emi.wn.WN
detailsData: Installed software version\n+ type=unknow\n+ mwver=error\n+ type -f glite-version\nglite-version is /opt/glite/bin/glite-version\n+ type=gLite\n++ glite-version\n+ mwver=3.1.0\n+ set +x\nVersion pattern: ^2\\.[456789]OR^3\\.OR^1\\.\nDeducted middleware version: gLite 3.1.0\n
EOT
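When inspecting the broker log it can help to turn a metric message into a dictionary. The following is only a minimal sketch, assuming that the message body carries one "name: value" field per line and ends with a line containing EOT; the function name parse_metric_message is hypothetical and is not part of the probes or of the MTA.

```python
def parse_metric_message(text):
    """Parse 'name: value' lines into a dict, stopping at the EOT marker.

    Assumption: one field per line; the value is everything after the
    first colon, so values such as host:port or timestamps stay intact.
    """
    fields = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line == "EOT":
            break
        if not line:
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'name: value' line, skip it
        fields[name.strip()] = value.strip()
    return fields


# Sample body taken from the WN-Csh message shown in the log above.
sample = """\
serviceURI: cream-19.pd.infn.it:8443/cream-lsf-creamcert2
metricStatus: OK
metricName: emi.wn.WN-Csh
summaryData: OK
timestamp: 2011-11-10T10:17:37Z
detailsData: Checking if CSH works\\nTest: OK.\\n
EOT
"""

message = parse_metric_message(sample)
print(message["metricName"], message["metricStatus"])
```

Splitting on the first colon only is a deliberate choice here: serviceURI and timestamp values contain further colons.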
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
CREAM-CE metrics and WN probes | ||||||||
Line: 10 to 10 | ||||||||
| ||||||||
Changed: | ||||||||
< < |
| |||||||
> > |
| |||||||
emi.wms.CREAMCE-JobState | ||||||||
Line: 250 to 250 | ||||||||
serviceType: emi.wn.WN detailsData: Checking if BrokerInfo works\nBrokerInfo file: /home/dteam009/home_cre30_262167654/CREAM262167654/.BrokerInfo\n+ ls -l /home/dteam009/home_cre30_262167654/CREAM262167654/.BrokerInfo\n-rw-r--r-- 1 dteam009 dteam 367 Sep 22 17:25 /home/dteam009/home_cre30_262167654/CREAM262167654/.BrokerInfo\n+ set +x\nCheck if we can get the name of CE using glite-brokerinfo command\n+ glite-brokerinfo -v getCE\nBrokerInfo::getBIFileName(): /home/dteam009/home_cre30_262167654/CREAM262167654/.BrokerInfo\nBrokerInfo::getCE(): \n -> cream-30.pd.infn.it:8443/cream-pbs-creamtest2\n -> BI_SUCCESS\n+ result=0\n+ set +x\n | ||||||||
Added: | ||||||||
> > |
Test

To test the probe you have to create a valid proxy.

State + Monit + Cancel

First you have to "submit" a jdl:

/usr/libexec/grid-monitoring/probes/emi.cream/CREAMCE-probe --vo <vo> -x <path of the proxy> -H <CREAM-ce hostname> -m emi.cream.CREAMCE-JobState --wms <WMS hostname> --no-mb

[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.cream/CREAMCE-probe --vo dteam -x /tmp/x509up_u501 -H cream-19.pd.infn.it -m emi.cream.CREAMCE-JobState --wms cream-45.pd.infn.it --no-mb
OK: [Submitted]
OK: [Submitted]
Connecting to the service https://cream-45.pd.infn.it:7443/glite_wms_wmproxy_server
====================== glite-wms-job-submit Success ======================
The job has been successfully submitted to the WMProxy
Your job identifier is:
https://cream-45.pd.infn.it:9000/fvNWcPJ6nVXAQJqhuU29-g
The job identifier has been saved in the following file:
/var/lib/gridprobes/dteam/emi.cream/CREAMCE/cream-19.pd.infn.it/jobID
==========================================================================

You can "monitor" the job:

/usr/libexec/grid-monitoring/probes/emi.cream/CREAMCE-probe --vo <vo> -x <path of the proxy> -H <CREAM-ce hostname> -m emi.cream.CREAMCE-JobMonit --pass-check-dest active

[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.cream/CREAMCE-probe --vo dteam -x /tmp/x509up_u501 -H cream-19.pd.infn.it -m emi.cream.CREAMCE-JobMonit --pass-check-dest active
metric results >>>
<cream-19.pd.infn.it,emi.cream.CREAMCE-JobState-dteam>
OK: [Scheduled] https://cream-45.pd.infn.it:9000/fvNWcPJ6nVXAQJqhuU29-g
OK: [Scheduled] https://cream-45.pd.infn.it:9000/fvNWcPJ6nVXAQJqhuU29-g
Testing from: cream-48.pd.infn.it
DN: /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle/CN=proxy
VOMS FQANs: /dteam/Role=NULL/Capability=NULL, /dteam/NGI_IT/Role=NULL/Capability=NULL
glite-wms-job-status https://cream-45.pd.infn.it:9000/fvNWcPJ6nVXAQJqhuU29-g
======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:
Status info for the Job : https://cream-45.pd.infn.it:9000/fvNWcPJ6nVXAQJqhuU29-g
Current Status: Scheduled
Status Reason: unavailable
Destination: cream-19.pd.infn.it:8443/cream-lsf-creamcert2
Submitted: Wed Nov 9 16:43:31 2011 CET
==========================================================================
OK: Jobs processed - 1
OK: Jobs processed - 1
[Scheduled] : 1|jobs_processed=1;; DONE=0;; RUNNING=0;; SCHEDULED=1;; SUBMITTED=0;; READY=0;; WAITING=0;; WAITING-CANCELLED=0;; WAITING-CANCEL=0;; ABORTED=0;; CANCELLED=0;; CLEARED=0;; MISSED=0;; UNDETERMINED=0;; unknown=0;1;2

Before it finishes you can "cancel" it:

/usr/libexec/grid-monitoring/probes/emi.cream/CREAMCE-probe --vo <vo> -x <path of the proxy> -H <CREAM-ce hostname> -m emi.cream.CREAMCE-JobCancel

[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.cream/CREAMCE-probe --vo dteam -x /tmp/x509up_u501 -H cream-19.pd.infn.it -m emi.cream.CREAMCE-JobCancel
OK: job cancelled
OK: job cancelled
Testing from: cream-48.pd.infn.it
DN: /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle/CN=proxy
VOMS FQANs: /dteam/Role=NULL/Capability=NULL, /dteam/NGI_IT/Role=NULL/Capability=NULL
Job cancellation request sent:
glite-wms-job-cancel --noint -i /var/lib/gridprobes/dteam/emi.cream/CREAMCE/cream-19.pd.infn.it/jobID
Job bookkeeping files deleted.

You can verify that the cancellation worked by checking the status of the job.

[ale@cream-48 ~]$ glite-wms-job-status https://cream-45.pd.infn.it:9000/fvNWcPJ6nVXAQJqhuU29-g
======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:
Status info for the Job : https://cream-45.pd.infn.it:9000/fvNWcPJ6nVXAQJqhuU29-g
Current Status: Cancelled
Destination: cream-19.pd.infn.it:8443/cream-lsf-creamcert2
Submitted: Wed Nov 9 16:43:31 2011 CET
==========================================================================
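The JobMonit output above ends with Nagios performance data after the "|" character (label=value;warn;crit items separated by spaces). As a rough illustration only, the per-state job counters could be extracted as follows; parse_perfdata is a hypothetical helper, not something shipped with the probes.

```python
def parse_perfdata(plugin_output):
    """Return {label: int} from the performance data after the first '|'.

    Each item has the form label=value;warn;crit; only the value is kept.
    Assumption: every value is an integer counter, as in the output above.
    """
    _, sep, perfdata = plugin_output.partition("|")
    counters = {}
    if not sep:
        return counters  # no performance data present
    for item in perfdata.split():
        label, eq, rest = item.partition("=")
        if not eq:
            continue
        counters[label] = int(rest.split(";")[0])
    return counters


# Shortened copy of the monitoring output shown above.
line = ("[Scheduled] : 1|jobs_processed=1;; DONE=0;; RUNNING=0;; "
        "SCHEDULED=1;; WAITING-CANCELLED=0;; unknown=0;1;2")
counters = parse_perfdata(line)
print(counters["SCHEDULED"], counters["unknown"])
```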
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
CREAM-CE metrics and WN probes | ||||||||
Line: 38 to 38 | ||||||||
ShallowRetryCount = | ||||||||
Added: | ||||||||
> > | The message transfer agent (MTA) | |||||||
To manage checks on the worker node, the executable nagrun.sh is used (see below). Its arguments are dynamically composed by the metrics, translating the ones given to the probes. In particular, results from the worker nodes are sent via a message transfer agent (MTA) to Message Brokers. The code for the MTA, mta-simple, is located under <WN_codebase>/bin/ and its implementation in <WN_codebase>/lib/python2.3/site-packages/mig.
The MTA: | ||||||||
Line: 55 to 57 | ||||||||
Note that the last option, --no-mb, disables message transfer; in that case the result e-mail messages can be found in the job's output file wnlogs.tgz. | ||||||||
Added: | ||||||||
> > | Output files
The default JDL's output sandbox defines two files that will be retrieved from the WN:
OutputSandbox = {"gridjob.out","wnlogs.tgz"};
Third-party WN checks | |||||||
To describe which checks must be executed on the worker node, the following parameters should be used:
|
Line: 1 to 1 | |||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| |||||||||||||||||||||||||||||||||||||||||||||||||||
Changed: | |||||||||||||||||||||||||||||||||||||||||||||||||||
< < | CREAM-CE metrics
These metrics are used to test worker nodes by submitting ad-hoc JDLs which run some grid checks. | ||||||||||||||||||||||||||||||||||||||||||||||||||
> > | CREAM-CE metrics and WN probes | ||||||||||||||||||||||||||||||||||||||||||||||||||
Changed: | |||||||||||||||||||||||||||||||||||||||||||||||||||
< < |
| ||||||||||||||||||||||||||||||||||||||||||||||||||
> > | CREAM-CE metrics
These metrics are used to test worker nodes by submitting ad-hoc JDLs which run some grid checks.
| ||||||||||||||||||||||||||||||||||||||||||||||||||
Changed: | |||||||||||||||||||||||||||||||||||||||||||||||||||
< < | emi.wms.CREAMCE-JobState | ||||||||||||||||||||||||||||||||||||||||||||||||||
> > | emi.wms.CREAMCE-JobState | ||||||||||||||||||||||||||||||||||||||||||||||||||
Submit a grid job to a given CREAM-CE through a WMS. These are the generic parameters: | |||||||||||||||||||||||||||||||||||||||||||||||||||
Line: 35 to 38 | |||||||||||||||||||||||||||||||||||||||||||||||||||
ShallowRetryCount = | |||||||||||||||||||||||||||||||||||||||||||||||||||
Changed: | |||||||||||||||||||||||||||||||||||||||||||||||||||
< < | To manage checks on the worker node, the executable nagrun.sh is used (see below). Its arguments are dynamically composed by the metrics, translating the ones given to the probes. | ||||||||||||||||||||||||||||||||||||||||||||||||||
> > | To manage checks on the worker node, the executable nagrun.sh is used (see below). Its arguments are dynamically composed by the metrics, translating the ones given to the probes. In particular, results from the worker nodes are sent via a message transfer agent (MTA) to Message Brokers. The code for the MTA, mta-simple, is located under <WN_codebase>/bin/ and its implementation in <WN_codebase>/lib/python2.3/site-packages/mig. | ||||||||||||||||||||||||||||||||||||||||||||||||||
Added: | |||||||||||||||||||||||||||||||||||||||||||||||||||
> > | The MTA: | ||||||||||||||||||||||||||||||||||||||||||||||||||
Added: | |||||||||||||||||||||||||||||||||||||||||||||||||||
> > |
handle_service_check invoked by Nagios after execution of each check. The parameters to manage resource broker are: | ||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||
Line: 46 to 53 | |||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||
Changed: | |||||||||||||||||||||||||||||||||||||||||||||||||||
< < |
--timeout-wnjob-global <sec> Global timeout for a job on WN. (Default: 600)
--add-wntar-nag <d1,d2,..> Comma-separated list of top level directories with Nagios compliant directories structure to be added to tarball to be sent to WN.
--add-wntar-nag-nosam Instructs the metric not to include standard SAM WN probes and their Nagios config to WN tarball. (Default: WN probes are included)
--add-wntar-nag-nosamcfg Instructs the metric not to include Nagios configuration for SAM WN probes to WN tarball. The probes themselves and respective Python packages, however, will be included.
--wnjob-location <dir> Full path to directory containing WN scheduler. (Default: <emi.cream.ProbesLocation>/wnjob)
--wn-verb <0-3> Metrics verbosity level on WN. [-v <VERBOSITY>] (Default: 0)
--wn-verb-fw <0-3> Framework verbosity level on WN (Default: 1) | ||||||||||||||||||||||||||||||||||||||||||||||||||
> > | Note that the last option, --no-mb, disables message transfer; in that case the result e-mail messages can be found in the job's output file wnlogs.tgz.
To describe which checks must be executed on the worker node, the following parameters should be used: | ||||||||||||||||||||||||||||||||||||||||||||||||||
Changed: | |||||||||||||||||||||||||||||||||||||||||||||||||||
< < | |||||||||||||||||||||||||||||||||||||||||||||||||||
> > |
|-- etc
|   `-- wn.d
|       `-- org.my
|           |-- commands.cfg
|           `-- services.cfg
`-- probes
    `-- org.my
        |-- check_A
        |-- check_B
        `-- checks_lib.sh

define command{
    command_name check_A2
    command_line $USER3$/org.my/check_A -w <wnjobWorkDir>/.mygridprobes
}

For this particular part of Nagios objects configuration and macros please see the Nagios documentation for resources configuration. With these last parameters you can manage some timeouts in WNs:
nagrun.sh

On WN (as specified in JDL with Executable = "<jdlExecutable>") nagrun.sh script (on Nagios UI located in <WN_codebase>) is used to

usage: nagrun.sh -v <vo> -d <dest> [-b <broker_uri>] [-n <broker_network>] [-t <timeout>] [-w <fw_verb>] [-z <metric_verb>] [-f <fqan>] [-i <host:port,..>] -B -R -N -h -m
-v and -d (if not -m) are mandatory paramters.
Defaults:
<broker_network> - PROD
<timeout> - 600 sec
<metrics_verb> - 0
<fw_verb> - 1 (2 - messages, 3 - Nagios config/stats/debug)
-f <fqan> - VOMS FQAN
-B - don't do broker discovery
-R - take MB randomly; by default sort by min response time
-N - don't run WN tests
-m - don't use mta service to transfer messages

In most cases the parameters are translations of the corresponding ones given to the emi.cream.CREAMCE-JobState metric.
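To illustrate how probe-side settings could be translated into a nagrun.sh argument list, here is a small sketch. It uses only the flags listed in the usage text (-v, -d, -b, -t, -w, -z, -m); the function build_nagrun_args and its parameter names are hypothetical, and the real metric may compose the arguments differently.

```python
import shlex


def build_nagrun_args(vo, dest=None, broker_uri=None, timeout=None,
                      fw_verb=None, metric_verb=None, use_mta=True):
    """Compose a nagrun.sh argument list from the documented flags.

    -v is always required; -d is required unless -m (no MTA) is given.
    """
    args = ["-v", vo]
    if use_mta:
        if dest is None:
            raise ValueError("-d <dest> is mandatory unless -m is used")
        args += ["-d", dest]
    else:
        args.append("-m")
    # Optional flags are added only when a value was supplied.
    for flag, value in (("-b", broker_uri), ("-t", timeout),
                        ("-w", fw_verb), ("-z", metric_verb)):
        if value is not None:
            args += [flag, str(value)]
    return args


args = build_nagrun_args("dteam", dest="/tmp/msg",
                         broker_uri="stomp://cream-12.pd.infn.it:61613",
                         timeout=600)
# Quote each argument so the string is safe to place in a JDL Arguments line.
print(" ".join(shlex.quote(a) for a in args))
```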
emi.wms.CREAMCE-JobMonit

Monitors the status of all submitted jobs (as defined in activejob.map files) and updates the states of the emi.cream.CREAMCE-JobState and emi.wms.CREAMCE-JobMonit metrics. Acts as a babysitter for all grid jobs submitted by emi.cream.CREAMCE-JobState. emi.cream.CREAMCE-JobState and emi.cream.CREAMCE-JobMonit are updated (as passive checks) either via the Nagios command file or NSCA. It accepts these parameters:

--timeout-job-global <sec> Global timeout for jobs. Job will be cancelled and dropped if it is not in terminal state by that time. (Default: 3300)
--timeout-job-waiting <sec> Time allowed for a job to stay in Waiting with 'no compatible resources'. (Default: 2700)
--timeout-job-discard <sec> Discard job after the timeout. (Default: 21600)
--timeout-job-schedrun <sec> Scheduled/Running states timeout. (Default: 19800)
--hosts <h1,h2,..> Comma-separated list of CE hostnames to run the monitor on.

WN Probes

Using the default wntag directory emi.wn, these probes are performed on the worker node using the wrapper samtest-run:
WN-csh

Checks whether CSH works by running the command:
/bin/csh -c "env|sort" > env-csh.txt
and then checking whether the variable PATH is defined. It accepts only the parameter: debug.
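The logic of this check can be sketched in a few lines. This is not the actual WN-csh probe: the function names are hypothetical, and mapping every failure to the Nagios CRITICAL code (2) is an assumption.

```python
import subprocess


def path_is_defined(env_dump):
    """Return True if the 'env' output contains a PATH= entry."""
    return any(line.startswith("PATH=") for line in env_dump.splitlines())


def check_csh(csh="/bin/csh"):
    """Run 'env|sort' under csh and return (nagios_code, summary)."""
    try:
        result = subprocess.run([csh, "-c", "env|sort"],
                                capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return 2, "CRITICAL: cannot run %s: %s" % (csh, exc)
    if result.returncode != 0:
        return 2, "CRITICAL: csh exited with code %d" % result.returncode
    if not path_is_defined(result.stdout):
        return 2, "CRITICAL: PATH is not defined"
    return 0, "OK"


if __name__ == "__main__":
    code, summary = check_csh()
    print(summary)
```

Keeping the PATH test in its own function makes it possible to check the parsing without having csh installed.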
Example of a message sent as output:
serviceURI: cream-30.pd.infn.it:8443/cream-pbs-creamtest2
hostName: localhost.localdomain
serviceFlavour: CE
siteName: FAKE-SITE
metricStatus: OK
metricName: emi.wn.WN-Csh
summaryData: OK
gatheredAt: cream-wn-030.pn.pd.infn.it
timestamp: 2011-09-22T15:25:36Z
nagiosName: emi.wn.WN-Csh-dteam
role: site
voName: dteam
serviceType: emi.wn.WN
detailsData: Checking if CSH works\nTest: OK.\n

WN-softver

Detects the version of the software that is actually installed on the WN. To detect the version, the lcg-version and glite-version commands and a cat of the /etc/emi-version file are tried; if none of them is available, the script exits with an error.
Example of a message sent as output:
serviceURI: cream-30.pd.infn.it:8443/cream-pbs-creamtest2
hostName: localhost.localdomain
serviceFlavour: CE
siteName: FAKE-SITE
metricStatus: OK
metricName: emi.wn.WN-SoftVer
summaryData: OK: EMI 1.2.0-1
gatheredAt: cream-wn-030.pn.pd.infn.it
timestamp: 2011-09-22T15:25:37Z
nagiosName: emi.wn.WN-SoftVer-dteam
role: site
voName: dteam
serviceType: emi.wn.WN
detailsData: Installed software version\n+ type=unknow\n+ mwver=error\n+ type -f glite-version\n/home/dteam009/home_cre30_262167654/CREAM262167654/nagios/probes/emi.wn/sam/WN-softver: line 31: type: glite-version: not found\n+ '[' -f /etc/emi-version ']'\n+ type=EMI\n++ cat /etc/emi-version\n+ mwver=1.2.0-1\n+ set +x\nVersion pattern: ^2\\.[456789]OR^3\\.OR^1\\.\nDeducted middleware version: EMI 1.2.0-1\n

WN-brokerinfo

Checks whether BrokerInfo works. The procedure is the following:
serviceURI: cream-30.pd.infn.it:8443/cream-pbs-creamtest2
hostName: localhost.localdomain
serviceFlavour: CE
siteName: FAKE-SITE
metricStatus: OK
metricName: emi.wn.WN-Bi
summaryData: OK: getCE: cream-30.pd.infn.it:8443/cream-pbs-creamtest2
gatheredAt: cream-wn-030.pn.pd.infn.it
timestamp: 2011-09-22T15:25:35Z
nagiosName: emi.wn.WN-Bi-dteam
role: site
voName: dteam
serviceType: emi.wn.WN
detailsData: Checking if BrokerInfo works\nBrokerInfo file: /home/dteam009/home_cre30_262167654/CREAM262167654/.BrokerInfo\n+ ls -l /home/dteam009/home_cre30_262167654/CREAM262167654/.BrokerInfo\n-rw-r--r-- 1 dteam009 dteam 367 Sep 22 17:25 /home/dteam009/home_cre30_262167654/CREAM262167654/.BrokerInfo\n+ set +x\nCheck if we can get the name of CE using glite-brokerinfo command\n+ glite-brokerinfo -v getCE\nBrokerInfo::getBIFileName(): /home/dteam009/home_cre30_262167654/CREAM262167654/.BrokerInfo\nBrokerInfo::getCE(): \n -> cream-30.pd.infn.it:8443/cream-pbs-creamtest2\n -> BI_SUCCESS\n+ result=0\n+ set +x\n
Line: 1 to 1 | |||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Added: | |||||||||||||||||||||||||||
> > |
CREAM-CE metrics

These metrics are used to test worker nodes by submitting ad-hoc JDLs which run some grid checks.

emi.wms.CREAMCE-JobState

Submit a grid job to a given CREAM-CE through a WMS. These are the generic parameters:
Type="Job";
JobType="Normal";
Executable = "<jdlExecutable>";
StdError = "gridjob.out";
StdOutput = "gridjob.out";
Arguments = "<jdlArguments>";
InputSandbox = {"<jdlInputSandboxExecutable>", "<jdlInputSandboxTarball>"};
OutputSandbox = {"gridjob.out","wnlogs.tgz"};
RetryCount = <jdlRetryCount>;
ShallowRetryCount = <jdlShallowRetryCount>;
Requirements = other.GlueCEInfoHostName == "<jdlReqCEInfoHostName>";

To manage checks on the worker node, the executable nagrun.sh is used (see below). Its arguments are dynamically composed by the metrics, translating the ones given to the probes.
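The JDL template above is filled in by the metric at submission time. As a rough sketch of that substitution (the function build_jdl, its parameter names and the sample values are hypothetical; only the attribute names come from the template), it could look like this:

```python
def build_jdl(executable, arguments, sandbox_executable, sandbox_tarball,
              retry_count, shallow_retry_count, ce_host):
    """Fill the JDL template with concrete values and return it as text."""
    lines = [
        'Type="Job";',
        'JobType="Normal";',
        f'Executable = "{executable}";',
        'StdError = "gridjob.out";',
        'StdOutput = "gridjob.out";',
        f'Arguments = "{arguments}";',
        # Doubled braces produce the literal { } of the JDL list syntax.
        f'InputSandbox = {{"{sandbox_executable}", "{sandbox_tarball}"}};',
        'OutputSandbox = {"gridjob.out","wnlogs.tgz"};',
        f'RetryCount = {retry_count};',
        f'ShallowRetryCount = {shallow_retry_count};',
        f'Requirements = other.GlueCEInfoHostName == "{ce_host}";',
    ]
    return "\n".join(lines) + "\n"


# Illustrative values only; the paths and retry counts are made up.
jdl = build_jdl(
    executable="nagrun.sh",
    arguments="-v dteam -m",
    sandbox_executable="/tmp/example/nagrun.sh",
    sandbox_tarball="/tmp/example/gridjob.tgz",
    retry_count=0,
    shallow_retry_count=1,
    ce_host="cream-19.pd.infn.it",
)
print(jdl)
```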
--timeout-wnjob-global <sec> Global timeout for a job on WN. (Default: 600)
--add-wntar-nag <d1,d2,..> Comma-separated list of top level directories with Nagios compliant directories structure to be added to tarball to be sent to WN.
--add-wntar-nag-nosam Instructs the metric not to include standard SAM WN probes and their Nagios config to WN tarball. (Default: WN probes are included)
--add-wntar-nag-nosamcfg Instructs the metric not to include Nagios configuration for SAM WN probes to WN tarball. The probes themselves and respective Python packages, however, will be included.
--wnjob-location <dir> Full path to directory containing WN scheduler. (Default: <emi.cream.ProbesLocation>/wnjob)
--wn-verb <0-3> Metrics verbosity level on WN. [-v <VERBOSITY>] (Default: 0)
--wn-verb-fw <0-3> Framework verbosity level on WN (Default: 1)