Difference: CreamProbe (4 vs. 5)

Revision 5 2011-11-09 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="NagiosProbes"

CREAM-CE metrics and WN probes

Line: 344 to 344
 ======================================================================

State + Monit without notification

 
As before, submit a job with message transfer disabled (option --no-mb):
 
/usr/libexec/grid-monitoring/probes/emi.cream/CREAMCE-probe --vo <vo> -x <path of the proxy> -H <CREAM-ce hostname> -m emi.cream.CREAMCE-JobState --wms <WMS hostname> --no-mb

Then you can monitor the job until it ends:

/usr/libexec/grid-monitoring/probes/emi.cream/CREAMCE-probe --vo <vo> -x <path of the proxy> -H <CREAM-ce hostname> -m emi.cream.CREAMCE-JobMonit --pass-check-dest active

Once the job has finished, running the emi.cream.CREAMCE-JobMonit metric should also trigger a get_output:

[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.cream/CREAMCE-probe --vo dteam -x /tmp/x509up_u501 -H cream-19.pd.infn.it -m emi.cream.CREAMCE-JobMonit   --pass-check-dest active
metric results >>> <cream-19.pd.infn.it,emi.cream.CREAMCE-JobSubmit-dteam>
OK: success.
OK: success.

Testing from: cream-48.pd.infn.it
DN: /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle/CN=proxy
VOMS FQANs: /dteam/Role=NULL/Capability=NULL, /dteam/NGI_IT/Role=NULL/Capability=NULL
glite-wms-job-status https://cream-45.pd.infn.it:9000/qAAznOmFalzWi5eTYrjAlQ


======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://cream-45.pd.infn.it:9000/qAAznOmFalzWi5eTYrjAlQ
Current Status:     Done (Success)
Exit code:          0
Status Reason:      Job Terminated Successfully
Destination:        cream-19.pd.infn.it:8443/cream-lsf-creamcert2
Submitted:          Wed Nov  9 18:11:03 2011 CET
==========================================================================
Getting job output: OK.

metric results >>> <cream-19.pd.infn.it,emi.cream.CREAMCE-JobState-dteam>
OK: success.
OK: success.

Testing from: cream-48.pd.infn.it
DN: /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle/CN=proxy
VOMS FQANs: /dteam/Role=NULL/Capability=NULL, /dteam/NGI_IT/Role=NULL/Capability=NULL
glite-wms-job-status https://cream-45.pd.infn.it:9000/qAAznOmFalzWi5eTYrjAlQ


======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://cream-45.pd.infn.it:9000/qAAznOmFalzWi5eTYrjAlQ
Current Status:     Done (Success)
Exit code:          0
Status Reason:      Job Terminated Successfully
Destination:        cream-19.pd.infn.it:8443/cream-lsf-creamcert2
Submitted:          Wed Nov  9 18:11:03 2011 CET
==========================================================================
Getting job output: OK.

OK: Jobs processed - 1
OK: Jobs processed - 1
Done : 1|jobs_processed=1;; DONE=1;; RUNNING=0;; SCHEDULED=0;; SUBMITTED=0;; READY=0;; WAITING=0;; WAITING-CANCELLED=0;; WAITING-CANCEL=0;; ABORTED=0;; CANCELLED=0;; CLEARED=0;; MISSED=0;; UNDETERMINED=0;; unknown=0;1;2
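The last line of the probe output carries the job counters after the "|" separator. As a minimal sketch, assuming the line keeps the layout shown above, a small helper can pull out a single counter; the function name perf_value is hypothetical and not part of the probe:

```shell
# perf_value LABEL
# Hypothetical helper: reads a probe status line on stdin and prints the
# value of the LABEL counter found after the "|" separator.
perf_value() {
    sed -n 's/^[^|]*|//p' |
        tr ' ' '\n' |
        awk -F'[=;]' -v label="$1" '$1 == label { print $2 }'
}
```

For example, piping the line above into `perf_value DONE` prints the number of jobs in the DONE state.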

You can verify that it worked correctly by checking the status of the job, which should now be Cleared:

[ale@cream-48 ~]$ glite-wms-job-status https://cream-45.pd.infn.it:9000/qAAznOmFalzWi5eTYrjAlQ


======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://cream-45.pd.infn.it:9000/qAAznOmFalzWi5eTYrjAlQ
Current Status:     Cleared 
Status Reason:      user retrieved output sandbox
Destination:        cream-19.pd.infn.it:8443/cream-lsf-creamcert2
Submitted:          Wed Nov  9 18:11:03 2011 CET
==========================================================================
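If you want to script this check, the status can be extracted from the command output. The following is only a sketch, assuming the "Current Status:" line keeps the layout shown above; job_status is a hypothetical helper name:

```shell
# job_status
# Hypothetical helper: reads glite-wms-job-status output on stdin and
# prints the value of the "Current Status:" line without padding.
job_status() {
    sed -n 's/^Current Status:[[:space:]]*//p' | sed 's/[[:space:]]*$//'
}

# Possible usage (the job URL is the one reported by the probe):
#   glite-wms-job-status <job URL> | job_status
```

After a successful get_output the helper should print Cleared.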

To check that the metrics on the worker node ran correctly, you can inspect the output files in /var/lib/gridprobes/<VO or FQAN>/emi.cream/CREAMCE/<hostname>/jobOutput

[ale@cream-48 ~]$ ls /var/lib/gridprobes/dteam/emi.cream/CREAMCE/cream-19.pd.infn.it/jobOutput/ale_qAAznOmFalzWi5eTYrjAlQ/
gridjob.out wnlogs.tgz

In gridjob.out you should find the job's output; if all goes well, the end of the file should contain lines like these:

 
  >>>>>>>>>>>>>>>>>> Wed Nov  9 18:11:35 CET 2011
T |S |c |U |O |W |C |A |P |
3 |3 |3 |0 |3 |0 |0 |0 |0 |
Services Total 3 Checked: 3
All services were checked. Killing Nagios.
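The key line is "Services Total N Checked: M": every service was checked when the two numbers match. A minimal sketch of that comparison, assuming the summary line has exactly this wording (all_services_checked is a hypothetical name):

```shell
# all_services_checked
# Hypothetical helper: reads gridjob.out on stdin and succeeds only if a
# "Services Total N Checked: M" line is present and N equals M.
all_services_checked() {
    awk '/Services Total/ { total = $3; checked = $5; seen = 1 }
         END { if (seen && total == checked) exit 0; exit 1 }'
}

# Possible usage:
#   all_services_checked < gridjob.out && echo "all WN checks ran"
```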

wnlogs.tgz also contains the output messages of the individual worker node metrics:

[ale@cream-48 ~]$ tar ztvf /var/lib/gridprobes/dteam/emi.cream/CREAMCE/cream-19.pd.infn.it/jobOutput/ale_qAAznOmFalzWi5eTYrjAlQ/wnlogs.tgz 
drwxr-xr-x dteam017/dteam    0 2011-11-09 18:11:35 home/dteam017/home_cre19_460125504/CREAM460125504/nagios/var/
drwxr-xr-x dteam017/dteam    0 2011-11-09 18:11:02 home/dteam017/home_cre19_460125504/CREAM460125504/nagios/var/archives/
-rw-r--r-- dteam017/dteam 5804 2011-11-09 18:11:26 home/dteam017/home_cre19_460125504/CREAM460125504/nagios/var/objects.cache
-rw-r--r-- dteam017/dteam 69232 2011-11-09 18:11:35 home/dteam017/home_cre19_460125504/CREAM460125504/nagios/var/nagios.debug
-rw-r--r-- dteam017/dteam   920 2011-11-09 18:11:35 home/dteam017/home_cre19_460125504/CREAM460125504/nagios/var/nagios.log
drwxr-xr-x dteam017/dteam     0 2011-11-09 18:11:02 home/dteam017/home_cre19_460125504/CREAM460125504/nagios/var/rw/
drwxr-xr-x dteam017/dteam     0 2011-11-09 18:11:02 home/dteam017/home_cre19_460125504/CREAM460125504/nagios/var/spool/
drwxr-xr-x dteam017/dteam     0 2011-11-09 18:11:32 home/dteam017/home_cre19_460125504/CREAM460125504/nagios/var/spool/checkresults/
drwxr-xr-x dteam017/dteam     0 2011-11-09 18:11:02 home/dteam017/home_cre19_460125504/CREAM460125504/nagios/tmp/
drwxr-xr-x dteam017/dteam     0 2011-11-09 18:11:30 tmp/sam.26154.23590/msg-outgoing/
drwxrwxr-x dteam017/dteam     0 2011-11-09 18:11:32 tmp/sam.26154.23590/msg-outgoing/temporary/
drwxrwxr-x dteam017/dteam     0 2011-11-09 18:11:32 tmp/sam.26154.23590/msg-outgoing/00000000/
drwxrwxr-x dteam017/dteam     0 2011-11-09 18:11:30 tmp/sam.26154.23590/msg-outgoing/00000000/4ebab442089af6/
-rw-rw-r-- dteam017/dteam  1019 2011-11-09 18:11:30 tmp/sam.26154.23590/msg-outgoing/00000000/4ebab442089af6/text_body
-rw-rw-r-- dteam017/dteam    83 2011-11-09 18:11:30 tmp/sam.26154.23590/msg-outgoing/00000000/4ebab442089af6/header
drwxrwxr-x dteam017/dteam     0 2011-11-09 18:11:32 tmp/sam.26154.23590/msg-outgoing/00000000/4ebab44404d0fc/
-rw-rw-r-- dteam017/dteam   659 2011-11-09 18:11:32 tmp/sam.26154.23590/msg-outgoing/00000000/4ebab44404d0fc/text_body
-rw-rw-r-- dteam017/dteam    83 2011-11-09 18:11:32 tmp/sam.26154.23590/msg-outgoing/00000000/4ebab44404d0fc/header
drwxrwxr-x dteam017/dteam     0 2011-11-09 18:11:31 tmp/sam.26154.23590/msg-outgoing/00000000/4ebab44306bbab/
-rw-rw-r-- dteam017/dteam   397 2011-11-09 18:11:31 tmp/sam.26154.23590/msg-outgoing/00000000/4ebab44306bbab/text_body
-rw-rw-r-- dteam017/dteam    83 2011-11-09 18:11:31 tmp/sam.26154.23590/msg-outgoing/00000000/4ebab44306bbab/header
drwxrwxr-x dteam017/dteam     0 2011-11-09 18:11:30 tmp/sam.26154.23590/msg-outgoing/obsolete/
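The metric results are stored in the text_body members of the archive. As a sketch, assuming a tar that supports -z and -O (GNU tar does), they can be listed and printed without unpacking the whole archive; both function names are hypothetical:

```shell
# list_message_bodies ARCHIVE
# Hypothetical helper: prints the path of every text_body member.
list_message_bodies() {
    tar -tzf "$1" | grep '/text_body$'
}

# print_message_bodies ARCHIVE
# Hypothetical helper: prints each text_body to stdout, preceded by its path.
print_message_bodies() {
    list_message_bodies "$1" | while IFS= read -r member; do
        printf '== %s ==\n' "$member"
        tar -xzOf "$1" "$member"
    done
}
```

Calling `print_message_bodies` on wnlogs.tgz shows all the worker node messages in one pass.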
 
Looking inside the text_body files, you should find the output of the emi.wn.WN-Csh metric:
 
[ale@cream-48 tmp]$ cat tmp/sam.26154.23590/msg-outgoing/00000000/4ebab44306bbab/text_body
serviceURI: cream-19.pd.infn.it:8443/cream-lsf-creamcert2
hostName: localhost.localdomain
serviceFlavour: CE
siteName: INFN-EMITESTBED
metricStatus: OK
metricName: emi.wn.WN-Csh
summaryData: OK
gatheredAt: cream-wn-007.pn.pd.infn.it
timestamp: 2011-11-09T17:11:31Z
nagiosName: emi.wn.WN-Csh-dteam
role: site
voName: dteam
serviceType: emi.wn.WN
detailsData: Checking if CSH works\nTest: OK.\n
EOT
 
The output of the emi.wn.WN-SoftVer metric:

serviceURI: cream-19.pd.infn.it:8443/cream-lsf-creamcert2
hostName: localhost.localdomain
serviceFlavour: CE
siteName: INFN-EMITESTBED
metricStatus: OK
metricName: emi.wn.WN-SoftVer
summaryData: OK: gLite 3.1.0
gatheredAt: cream-wn-007.pn.pd.infn.it
timestamp: 2011-11-09T17:11:32Z
nagiosName: emi.wn.WN-SoftVer-dteam
role: site
voName: dteam
serviceType: emi.wn.WN
detailsData: Installed software version\n+ type=unknow\n+ mwver=error\n+ type -f glite-version\nglite-version is /opt/glite/bin/glite-version\n+ type=gLite\n++ glite-version\n+ mwver=3.1.0\n+ set +x\nVersion pattern: ^2\\.[456789]OR^3\\.OR^1\\.\nDeducted middleware version: gLite 3.1.0\n
EOT

The output of the emi.wn.WN-Bi metric:

serviceURI: cream-19.pd.infn.it:8443/cream-lsf-creamcert2
hostName: localhost.localdomain
serviceFlavour: CE
siteName: INFN-EMITESTBED
metricStatus: OK
metricName: emi.wn.WN-Bi
summaryData: OK: getCE: cream-19.pd.infn.it:8443/cream-lsf-creamcert2
gatheredAt: cream-wn-007.pn.pd.infn.it
timestamp: 2011-11-09T17:11:30Z
nagiosName: emi.wn.WN-Bi-dteam
role: site
voName: dteam
serviceType: emi.wn.WN
detailsData: Checking if BrokerInfo works\nBrokerInfo file: /home/dteam017/home_cre19_460125504/CREAM460125504/.BrokerInfo\n+ ls -l /home/dteam017/home_cre19_460125504/CREAM460125504/.BrokerInfo\n-rw-r--r--  1 dteam017 dteam 367 Nov  9 18:11 /home/dteam017/home_cre19_460125504/CREAM460125504/.BrokerInfo\n+ set +x\nCheck if we can get the name of CE using glite-brokerinfo command\n+ glite-brokerinfo -v getCE\nBrokerInfo::getBIFileName(): /home/dteam017/home_cre19_460125504/CREAM460125504/.BrokerInfo\nBrokerInfo::getCE(): \n -> cream-19.pd.infn.it:8443/cream-lsf-creamcert2\n -> BI_SUCCESS\n+ result=0\n+ set +x\n
EOT
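Each message is a list of "key: value" lines terminated by EOT, so single fields are easy to pick out. A minimal sketch, assuming that layout; msg_field is a hypothetical helper name:

```shell
# msg_field KEY
# Hypothetical helper: reads one text_body message on stdin and prints the
# value of the first line that starts with "KEY: ".
msg_field() {
    awk -v key="$1" 'index($0, key ": ") == 1 {
        print substr($0, length(key) + 3)
        exit
    }'
}

# Possible usage:
#   msg_field metricStatus < text_body
```

For the three messages above, `msg_field metricStatus` should print OK in each case.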
 