Difference: CreamProbe (5 vs. 6)

Revision 6 2011-11-10 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="NagiosProbes"

CREAM-CE metrics and WN probes

Line: 431 to 431
 gridjob.out wnlogs.tgz
Changed:
<
<
In gridjob.out you should found job's output; if all goes well at the end of teh file there should be some lines like these:
>
>
In gridjob.out you should find the job's output; if all goes well, at the end of the file there should be some lines like these:
 
  >>>>>>>>>>>>>>>>>> Wed Nov  9 18:11:35 CET 2011
T |S |c |U |O |W |C |A |P |
Changed:
<
<
3 |3 |3 |0 |3 |0 |0 |0 |0 |
>
>
3 |3 |3 |0 |3 |0 |0 |3 |0 |
 Services Total 3 Checked: 3 All services were checked. Killing Nagios.
Added:
>
>
These lines are produced by nagiostats; the columns have the following meaning:

T NUMSERVICES total number of services.
S NUMSVCSCHEDULED number of services that are currently scheduled to be checked.
c NUMSVCCHECKED number of services that have been checked since start.
U NUMSVCUNKN number of services UNKNOWN.
O NUMSVCOK number of services OK.
W NUMSVCWARN number of services WARNING.
C NUMSVCCRIT number of services CRITICAL.
A NUMACTSVCCHECKS1M number of total active service checks occurring in the last minute.
P NUMPSVSVCCHECKS1M number of passive service checks occurring in the last minute.
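
As an illustration of how the legend above can be used, the following minimal Python sketch turns the header line and the value line found in gridjob.out into a dictionary keyed by the nagiostats variable names. The function names parse_stats and all_checked_ok are hypothetical helpers introduced here and are not part of the probes; the "everything is fine" criterion is an assumption based on the example output, where all services are both checked and OK.

```python
# Column letters from the header line, mapped to the nagiostats
# variable names given in the legend.
COLUMNS = {
    "T": "NUMSERVICES",
    "S": "NUMSVCSCHEDULED",
    "c": "NUMSVCCHECKED",
    "U": "NUMSVCUNKN",
    "O": "NUMSVCOK",
    "W": "NUMSVCWARN",
    "C": "NUMSVCCRIT",
    "A": "NUMACTSVCCHECKS1M",
    "P": "NUMPSVSVCCHECKS1M",
}


def parse_stats(header, values):
    """Pair each column letter of the header with its integer value."""
    keys = [k.strip() for k in header.split("|") if k.strip()]
    nums = [int(v) for v in values.split("|") if v.strip()]
    if len(keys) != len(nums):
        raise ValueError("header and value lines have different lengths")
    return {COLUMNS[k]: n for k, n in zip(keys, nums)}


def all_checked_ok(stats):
    """Assumed success criterion: every service was checked and is OK."""
    total = stats["NUMSERVICES"]
    return stats["NUMSVCCHECKED"] == total and stats["NUMSVCOK"] == total


# The two lines as they appear in the example output.
stats = parse_stats("T |S |c |U |O |W |C |A |P |",
                    "3 |3 |3 |0 |3 |0 |0 |3 |0 |")
print(stats["NUMSVCOK"], all_checked_ok(stats))
```
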
 wnlogs.tgz also contains the output messages from the individual worker node metrics:
Line: 529 to 541
 detailsData: Checking if BrokerInfo works\nBrokerInfo file: /home/dteam017/home_cre19_460125504/CREAM460125504/.BrokerInfo\n+ ls -l /home/dteam017/home_cre19_460125504/CREAM460125504/.BrokerInfo\n-rw-r--r-- 1 dteam017 dteam 367 Nov 9 18:11 /home/dteam017/home_cre19_460125504/CREAM460125504/.BrokerInfo\n+ set +x\nCheck if we can get the name of CE using glite-brokerinfo command\n+ glite-brokerinfo -v getCE\nBrokerInfo::getBIFileName(): /home/dteam017/home_cre19_460125504/CREAM460125504/.BrokerInfo\nBrokerInfo::getCE(): \n -> cream-19.pd.infn.it:8443/cream-lsf-creamcert2\n -> BI_SUCCESS\n+ result=0\n+ set +x\n EOT
Added:
>
>

State + Monit with notification

To also test the message transfer mechanism, you need to install a Message Broker.

Then you can "submit" a job using this command:

/usr/libexec/grid-monitoring/probes/emi.cream/CREAMCE-probe --vo <vo> -x <path of the proxy> -H <CREAM-ce hostname> -m emi.cream.CREAMCE-JobState --wms <WMS hostname> --mb-uri <Message Broker URI> --mb-destination <Message Broker destination>

[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.cream/CREAMCE-probe --vo dteam -x /tmp/x509up_u501 -H cream-19.pd.infn.it -m emi.cream.CREAMCE-JobState --wms cream-45.pd.infn.it --mb-uri stomp://cream-12.pd.infn.it:61613 --mb-destination /tmp/msg
OK: [Submitted]
OK: [Submitted]

Connecting to the service https://cream-45.pd.infn.it:7443/glite_wms_wmproxy_server


====================== glite-wms-job-submit Success ======================

The job has been successfully submitted to the WMProxy
Your job identifier is:

https://cream-45.pd.infn.it:9000/3sqXvSstpobzaQhTzWNH4Q

The job identifier has been saved in the following file:
/var/lib/gridprobes/dteam/emi.cream/CREAMCE/cream-19.pd.infn.it/jobID

==========================================================================
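
If the probe output is captured by a script, the job identifier can be picked out of it. The sketch below is only an illustration: extract_job_id is a hypothetical helper, and it assumes the identifier is the first non-empty line after "Your job identifier is:", as in the output shown above.

```python
def extract_job_id(output):
    """Return the job identifier printed after the marker line, or None."""
    lines = output.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "Your job identifier is:":
            # Assumption: the identifier is the next non-empty line.
            for candidate in lines[i + 1:]:
                candidate = candidate.strip()
                if candidate:
                    if candidate.startswith("https://"):
                        return candidate
                    return None
    return None


# Excerpt of the submission output shown above.
sample = """The job has been successfully submitted to the WMProxy
Your job identifier is:

https://cream-45.pd.infn.it:9000/3sqXvSstpobzaQhTzWNH4Q

The job identifier has been saved in the following file:
/var/lib/gridprobes/dteam/emi.cream/CREAMCE/cream-19.pd.infn.it/jobID
"""

job_id = extract_job_id(sample)
print(job_id)
```
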

Again you can, as usual, monitor the job until it terminates:

/usr/libexec/grid-monitoring/probes/emi.cream/CREAMCE-probe --vo <vo> -x <path of the proxy> -H <CREAM-ce hostname> -m emi.cream.CREAMCE-JobMonit --pass-check-dest active

At the end you can do the same checks as in the previous test; in addition, you can check the log of the Message Broker server to verify that it received the messages, as in this example:

2011-11-10 11:17:36,856 [Thread-2] coilmq.server.socketserver.StompRequestHandler - DEBUG - Processing frame: SEND
content-length:1020
ROC:UNDEFINED
sitename:INFN-EMITESTBED
destination:/tmp/msg
persistent:true
nagios_host:localhost.localdomain
role:site

serviceURI: cream-19.pd.infn.it:8443/cream-lsf-creamcert2
hostName: localhost.localdomain
serviceFlavour: CE
siteName: INFN-EMITESTBED
metricStatus: OK
metricName: emi.wn.WN-Bi
summaryData: OK: getCE: cream-19.pd.infn.it:8443/cream-lsf-creamcert2
gatheredAt: cream-wn-007.pn.pd.infn.it
timestamp: 2011-11-10T10:17:36Z
nagiosName: emi.wn.WN-Bi-dteam
role: site
voName: dteam
serviceType: emi.wn.WN
detailsData: Checking if BrokerInfo works\nBrokerInfo file: /home/dteam017/home_cre19_378000412/CREAM378000412/.BrokerInfo\n+ ls -l /home/dteam017/home_cre19_378000412/CREAM378000412/.BrokerInfo\n-rw-r--r--  1 dteam017 dteam 2282 Nov 10 11:17 /home/dteam017/home_cre19_378000412/CREAM378000412/.BrokerInfo\n+ set +x\nCheck if we can get the name of CE using glite-brokerinfo command\n+ glite-brokerinfo -v getCE\nBrokerInfo::getBIFileName(): /home/dteam017/home_cre19_378000412/CREAM378000412/.BrokerInfo\nBrokerInfo::getCE(): \n -> cream-19.pd.infn.it:8443/cream-lsf-creamcert2\n -> BI_SUCCESS\n+ result=0\n+ set +x\n
EOT

2011-11-10 11:17:37,854 [Thread-2] coilmq.server.socketserver.StompRequestHandler - DEBUG - Processing frame: SEND
content-length:397
ROC:UNDEFINED
sitename:INFN-EMITESTBED
destination:/tmp/msg
persistent:true
nagios_host:localhost.localdomain
role:site

serviceURI: cream-19.pd.infn.it:8443/cream-lsf-creamcert2
hostName: localhost.localdomain
serviceFlavour: CE
siteName: INFN-EMITESTBED
metricStatus: OK
metricName: emi.wn.WN-Csh
summaryData: OK
gatheredAt: cream-wn-007.pn.pd.infn.it
timestamp: 2011-11-10T10:17:37Z
nagiosName: emi.wn.WN-Csh-dteam
role: site
voName: dteam
serviceType: emi.wn.WN
detailsData: Checking if CSH works\nTest: OK.\n
EOT

2011-11-10 11:17:38,854 [Thread-2] coilmq.server.socketserver.StompRequestHandler - DEBUG - Processing frame: SEND
content-length:659
ROC:UNDEFINED
sitename:INFN-EMITESTBED
destination:/tmp/msg
persistent:true
nagios_host:localhost.localdomain
role:site

serviceURI: cream-19.pd.infn.it:8443/cream-lsf-creamcert2
hostName: localhost.localdomain
serviceFlavour: CE
siteName: INFN-EMITESTBED
metricStatus: OK
metricName: emi.wn.WN-SoftVer
summaryData: OK: gLite 3.1.0
gatheredAt: cream-wn-007.pn.pd.infn.it
timestamp: 2011-11-10T10:17:38Z
nagiosName: emi.wn.WN-SoftVer-dteam
role: site
voName: dteam
serviceType: emi.wn.WN
detailsData: Installed software version\n+ type=unknow\n+ mwver=error\n+ type -f glite-version\nglite-version is /opt/glite/bin/glite-version\n+ type=gLite\n++ glite-version\n+ mwver=3.1.0\n+ set +x\nVersion pattern: ^2\\.[456789]OR^3\\.OR^1\\.\nDeducted middleware version: gLite 3.1.0\n
EOT
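
To check the broker log without reading it by eye, the frames can be split and their metricStatus inspected. The following Python sketch is a rough illustration only: parse_frames and failed_metrics are hypothetical helpers, and the parsing assumes the layout seen in the example above (a "Processing frame: SEND" line opens a frame, "key:value" lines follow, and a line containing only EOT closes it). The sample text is abridged from that example.

```python
def parse_frames(log_text):
    """Split a broker debug log into one dict of fields per SEND frame."""
    frames = []
    current = None
    for line in log_text.splitlines():
        if "Processing frame: SEND" in line:
            current = {}              # a new frame starts here
            continue
        if current is None:
            continue                  # ignore text outside a frame
        if line.strip() == "EOT":
            frames.append(current)    # frame is complete
            current = None
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    return frames


def failed_metrics(frames):
    """Names of the metrics whose reported status is not OK."""
    return [f.get("metricName") for f in frames
            if f.get("metricStatus") != "OK"]


# Abridged from the log example above.
sample_log = """2011-11-10 11:17:36,856 [Thread-2] coilmq.server.socketserver.StompRequestHandler - DEBUG - Processing frame: SEND
destination:/tmp/msg

serviceURI: cream-19.pd.infn.it:8443/cream-lsf-creamcert2
metricStatus: OK
metricName: emi.wn.WN-Bi
timestamp: 2011-11-10T10:17:36Z
EOT

2011-11-10 11:17:37,854 [Thread-2] coilmq.server.socketserver.StompRequestHandler - DEBUG - Processing frame: SEND
destination:/tmp/msg

metricStatus: OK
metricName: emi.wn.WN-Csh
summaryData: OK
EOT
"""

frames = parse_frames(sample_log)
print([f["metricName"] for f in frames], failed_metrics(frames))
```
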


 