Devel09 Work Log
WMS 3.1 patch 1251
2008-01-15 (Ale)
2007-11-13 (Ale)
- Synchronize WMS with patch 1251 using this repository
2007-08-29 (Ale)
2007-08-27 (Ale)
- Shut down for maintenance on the electrical distribution system.
2007-07-23 (Ale)
- Added this cron job to periodically clean the LB databases (you first need to fix a silly bug in the script $GLITE_LOCATION/sbin/glite-lb-export.sh):
[root@devel09 etc]# cat /etc/cron.d/lb-purger.cron
#! /bin/sh
GLITE_LB_EXPORT_BKSERVER="devel11.cnaf.infn.it"
# run every wednesday and sunday at 01:00
0 1 * * wed,sun glite . /etc/glite/profile.d/glite_setenv.sh ; $GLITE_LOCATION/sbin/glite-lb-export.sh >> /var/log/glite/lb_purger.log 2>&1
2007-07-02 (Ale)
- Cleaning LB databases
- Set MaxOutputSandboxSize = -1 on glite_wms.conf as workaround for bug #27215
- Reopen the WMS to test users.
2007-06-27 (Ale)
- Update rpms using patch #1203
- glite-ce-cream-client-api-c (1.7.13-0 => 1.7.14-0)
- glite-ce-monitor-client-api-c (1.7.11-0 => 1.7.12-0)
- glite-wms-common (3.1.14-1 => 3.1.17-1)
- glite-wms-helper (3.1.15-1 => 3.1.16-1)
- glite-wms-ice (3.1.15-1 => 3.1.16-1)
- glite-wms-manager (3.1.28-1 => 3.1.29-1)
- glite-wms-wmproxy (3.1.25-1 => 3.1.26-1)
- glite-lb-common (5.1.1-1 => 5.1.2-1)
- Bugs fixed:
- #27215: WM to set the maximum output sandbox size
- #27126: generation of unique filenames in jobdir is not reliable
- #27042: When ICE starts executes a lease update even if start_lease_updater is false
- #26952: WMProxy server does check mandatory attributes for collection after returning jobid to client
- #26968: There's a memory leak in a method that extract the proxy time left
2007-06-26 (Ale)
- Stop the services to debug a problem with the LBserver.
2007-06-13 (Ale)
- Update rpms using the latest available tags:
- glite-wms-manager_R_3_1_28_1 to fix bugs #25680 and #26857
- glite-wms-matchmaking_R_3_1_5_1 to fix bug #26913
- glite-wms-purger_R_3_1_8_1 to fix bug #26705
- glite-wms-configuration_R_3_1_7_1 to fix bug #26705
- glite-wms-ice_R_3_1_15_1 to fix bugs #27042 and #26537
- and also:
- glite-lb-client (2.3.3-1 => 2.3.4-1)
- glite-lb-client-interface (2.3.2-1 => 2.3.3-1)
- glite-lb-common (5.0.3-1 => 5.1.1-1)
- glite-lb-logger (1.4.2-1 => 1.4.3-1)
- glite-lb-proxy (1.4.1-3 => 1.4.1-4)
- glite-lb-ws-interface (2.3.0-2 => 2.3.0-3)
- glite-security-lcmaps-interface (1.3.9-1 => 1.3.14-1)
- glite-security-lcmaps-interface-without-gsi (1.3.9-1 => 1.3.14-1)
- lcg-CA (1.13-1 => 1.14-1)
- Create a new cron entry for the purge:
HOME=/
MAILTO=root@localhost
# Execute the 'purger' command every six hours (0 */6) on every day except Sunday,
# if and only if the percentage of allocated blocks is greater than 40%
0 */6 * * mon-sat glite . /etc/glite/profile.d/glite_setenv.sh ; $GLITE_LOCATION/sbin/glite-wms-purgeStorage.sh -l $GLITE_LOCATION_LOG/glite-wms-purgeStorage.log -p /var/glite/SandboxDir -t 604800 -a 40 > /dev/null
# Execute the 'purger' command at 12:00 midnight, 4:00 AM, 8:00 AM, 12:00 noon,
# 4:00 PM, and 8:00 PM (0 */4) on each Sunday (sun).
0 */4 * * sun glite . /etc/glite/profile.d/glite_setenv.sh ; $GLITE_LOCATION/sbin/glite-wms-purgeStorage.sh -l $GLITE_LOCATION_LOG/glite-wms-purgeStorage.log -p /var/glite/SandboxDir -t 604800 > /dev/null
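The hour fields in the two purger entries above can be expanded to check when the job actually fires. This is a minimal sketch, assuming standard cron step semantics (`*/N` in the hour field matches 0, N, 2N, ... up to 23); `hours_for_step` is a helper name made up for this note, not a gLite or cron command.

```shell
#!/bin/sh
# hours_for_step N: list the hours matched by the cron hour field "*/N".
# (Hypothetical helper, written only to illustrate the schedules above.)
hours_for_step() {
    awk -v s="$1" 'BEGIN {
        for (h = 0; h < 24; h += s)
            printf "%s%d", (h ? " " : ""), h
        print ""
    }'
}

echo "mon-sat (0 */6): $(hours_for_step 6)"
echo "sun     (0 */4): $(hours_for_step 4)"
```

Both schedules include hour 0, so the purger also runs at midnight.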
2007-06-04 (Ale)
- Preliminary tests are successful
- Reopen the WMS to test users.
2007-06-01 (Ale)
- Stop the services to update the rpms.
- Starting from the rpms in patch #1167, I updated these rpms:
- glite-wms-common_R_3_1_14_1 and glite-wms-configuration_R_3_1_6_1 to fix bug #26432
- glite-wms-manager_R_3_1_27_1 to fix a compilation problem
- glite-wms-wmproxy_R_3_1_25_1 to fix bugs #26586, #26737 and #26237
- glite-wms-ism_R_3_1_14_1 to fix bug #26654
- Update also glite-wms-ice_R_3_1_13_1, glite-ce-cream-client-api-c and glite-ce-monitor-client-api-c
- Create a cron job to rotate the wmproxy logs:
HOME=/
MAILTO=root@localhost
0 */2 * * * root /usr/sbin/logrotate -v /opt/glite/etc/wmproxy_logrotate.conf > /var/log/glite/logrotate.logs
- In glite_wms_wmproxy_httpd.conf, comment out these lines:
#CustomLog "|/usr/sbin/rotatelogs ${GLITE_LOCATION_LOG}/httpd-wmproxy-access_%Y-%m-%d-%H.log 50M" combined
#ErrorLog "|/usr/sbin/rotatelogs ${GLITE_LOCATION_LOG}/httpd-wmproxy-errors_%Y-%m-%d-%H:%M.log 100M"
- and uncomment these ones:
CustomLog ${GLITE_LOCATION_LOG}/httpd-wmproxy-access.log combined
ErrorLog ${GLITE_LOCATION_LOG}/httpd-wmproxy-errors.log
2007-05-29 (Ale)
- Reopen the WMS to test users.
2007-05-28 (Ale)
- Stop the services to update the rpms.
- Update rpms using the latest available tags plus glite-wms-ice-3.1.11-1
- A new BDII which contains a CREAM-CE (prod-ce-02.pd.infn.it:8443/cream-lsf-creamusr2) is now used: egee-bdii.cnaf.infn.it
2007-05-16 (Ale)
- Problems with rotatelogs: swap memory usage grows constantly
- Changed some parameters in the wmproxy conf file /opt/glite/etc/glite_wms_wmproxy_httpd.conf :
- LogLevel warn
- ErrorLog "|/usr/sbin/rotatelogs ${GLITE_LOCATION_LOG}/httpd-wmproxy-errors_%Y-%m-%d-%H:%M.log 100M"
- Restarted wmproxy service
2007-05-14 (Ale)
- Removed old jobs and restarted services from a clean state.
2007-05-10 (Ale)
- There are some problems with the WM (it stopped many times), probably due to bugs #26213 and #26157
- Because of the many WM restarts, a lot of jobs have been submitted twice, which in turn makes the LM stop working (see bug #26208)
2007-05-02 (Ale)
- Reopen the WMS to test users.
2007-04-24 (Ale)
- Update the latest rpms so that the WMS is now synchronized with patch #1140
2007-04-23 (Ale)
- Repeated the last test submitting 50 collections of 100 jobs each:
- The jobs at the end are distributed in this way (i.e. the last CE on which the job finished its run):
- 736 ( 14.7%) jobs submitted to the glite CE
- 2131 ( 42.6%) jobs submitted to the cream CE
- 2133 ( 42.7%) jobs submitted to the lcg CE
- The results are:
- 4979 (99.6%) Success
- 21 ( 0.4%) Aborted (11 on the cream-ce and 10 on the glite-ce)
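The percentages in this entry can be re-derived from the raw counts. A minimal sketch, assuming awk is available on the machine; `pct` is a helper name invented for this note and is not part of any gLite tool.

```shell
#!/bin/sh
# pct COUNT TOTAL: print COUNT as a percentage of TOTAL with one decimal place.
# (Hypothetical helper for checking the figures in the log.)
pct() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 * n / t }'
}

total=$((50 * 100))   # 50 collections of 100 jobs each

for count in 736 2131 2133; do
    echo "$count of $total jobs: $(pct "$count" "$total")%"
done
```

The three per-CE counts add up to the 5000 submitted jobs, as do the Success and Aborted counts.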
2007-04-20 (Ale)
- A new test has been done:
- submitted 31 collections of 100 simple jobs;
- set the requirements so that only 3 CEs could be selected: a CREAM-CE, a GLITE-CE and an LCG-CE;
- used the fuzzyrank and enabled the resubmissions.
- The jobs at the end are distributed in this way (i.e. the last CE on which the job finished its run):
- 846 ( 27%) jobs submitted to the glite CE
- 212 ( 7%) jobs submitted to the cream CE
- 2042 ( 66%) jobs submitted to the lcg CE
- The results are:
- 2894 (93%) Success
- 208 ( 7%) Aborted (149 on the cream-ce and 59 on the glite-ce)
- Update wmproxy rpm: glite-wms-wmproxy-3.1.20-1
2007-04-19 (Ale)
- The result of yesterday's test on submission to ice+cream is:
- Changed the vomses file (/opt/glite/etc/vomses) to add a new VOMS server and the cms VO for proxy renewal:
"ops" "lcg-voms.cern.ch" "15009" "/C=CH/O=CERN/OU=GRID/CN=host/lcg-voms.cern.ch" "ops"
"dteam" "lcg-voms.cern.ch" "15004" "/C=CH/O=CERN/OU=GRID/CN=host/lcg-voms.cern.ch" "dteam"
"atlas" "lcg-voms.cern.ch" "15001" "/C=CH/O=CERN/OU=GRID/CN=host/lcg-voms.cern.ch" "atlas"
"cms" "lcg-voms.cern.ch" "15002" "/C=CH/O=CERN/OU=GRID/CN=host/lcg-voms.cern.ch" "cms"
"dteam" "voms101.cern.ch" "15004" "/DC=ch/DC=cern/OU=computers/CN=voms.cern.ch" "dteam"
"atlas" "voms101.cern.ch" "15001" "/DC=ch/DC=cern/OU=computers/CN=voms.cern.ch" "atlas"
"ops" "voms101.cern.ch" "15009" "/DC=ch/DC=cern/OU=computers/CN=voms.cern.ch" "ops"
"cms" "voms101.cern.ch" "15002" "/DC=ch/DC=cern/OU=computers/CN=voms.cern.ch" "cms"
- Update these rpms:
- glite-wms-classad_plugin (3.1.5-1 => 3.1.6-1)
- glite-wms-helper (3.1.14-1 => 3.1.15-1)
- glite-wms-ism (3.1.12-1 => 3.1.13-1)
- glite-wms-wmproxy (3.1.16-1 => 3.1.19-1)
- Bugs fixed:
- #25653: The MatchMaking should handle 'DENY' prefix in the ACBR of CE/Views
- #25610: Error in the template.sh on the WMS 3.1 causes uberftp failure
- a problem with the dag nodes purging
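Each vomses line added in this entry has five double-quoted fields: alias, VOMS host, port, host certificate DN, and VO name. The sketch below lists which server and port serve each VO; `list_vomses` is a hypothetical helper, and the demo works on a temporary file holding two of the lines above rather than on /opt/glite/etc/vomses itself.

```shell
#!/bin/sh
# list_vomses FILE: print "vo host:port" for every vomses entry in FILE.
# Splitting on the double quote puts the five fields in $2, $4, $6, $8, $10.
# (Hypothetical helper, written for this note.)
list_vomses() {
    awk -F'"' 'NF >= 11 { print $10, $4 ":" $6 }' "$1"
}

# Demo on a temporary copy of two entries from the list above.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
"cms" "lcg-voms.cern.ch" "15002" "/C=CH/O=CERN/OU=GRID/CN=host/lcg-voms.cern.ch" "cms"
"cms" "voms101.cern.ch" "15002" "/DC=ch/DC=cern/OU=computers/CN=voms.cern.ch" "cms"
EOF
list_vomses "$tmp"
rm -f "$tmp"
```

On the real file, `list_vomses /opt/glite/etc/vomses | sort` would group the servers by VO.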
2007-04-18 (Ale)
- Change the BDII (egee-bdii.cnaf.infn.it) to also include the CREAM CE
- Update the ice log level: ice_log_level = 700; and max_logfile_size = 100*1024*1024;
- Start testing ice+cream, submitting simple jobs (40 collections of 100 jobs each)
2007-04-17 (Ale)
- Do a preliminary test using only LCG-CE (Requirements = RegExp("\/blah-",other.GlueCEUniqueID)) submitting simple jobs (60 collections of 100 jobs each):
- Success: 5863 (97.7%)
- Aborted: 137 (2.3%) (Proxy expired)
2007-04-16 (Ale)
2007-04-13 (Ale)
- Stop the services to update the rpms.
- Removing the old SandBox directories.
2007-04-03 (Ale)
- The problem with the IL has been understood... the fix is coming...
- Lowered the load limiter from 15 to 10, as requested by Simone.
- Removed the purger cron job, as requested by Simone.
- Restarted all the daemons.
2007-04-02 (Ale)
- Found a new problem with the LB Interlogger, it is under investigation by the developers.
- Change the frequency at which the purger cron job runs: from every hour to every 6 hours
2007-03-23 (Ale)
- Stop the services to update the rpms.
- The ca_* rpms were updated (1.12-1 => 1.13-1)
- Removed unnecessary rpms (glite-wms-broker, glite-wms-manager-ns-common, glite-wms-brokerinfo-access, glite-jp-server-common and glite-jp-ws-interface)
- Installed new tags:
- glite-wms_R_3_1_46_1
- glite-jdl_R_3_1_11_1
- glite-lb_R_1_3_8_1
- glite-security_R_3_1_35_1
- The bugs fixed with the first two tags are:
- #23937: WMProxy ignores DataRequirements
- #11969: minor problem when building org.glite.wms.classad_plugin
- #24954: refinement of the stochastic selector smooth function
- #25003: signal handling in WMProxy
- and the patch #1096: fixes in wms.client
- The LB tag fixed the problem with the Interlogger.
- Do a preliminary test using only LCG-CE (Requirements = RegExp("\/blah-",other.GlueCEUniqueID)) submitting 2000 simple jobs (20 collections):
- Reopen the WMS to test users.
2007-03-20
2007-03-19 (Ale)
- The LB Interlogger froze again. Salvet and František are debugging the problem on the machine.
- LB Interlogger has been restarted.
2007-03-14 (Ale)
- WMS is now up and running correctly. The problem with LB Interlogger is again under investigation.
- Started all LB services (glite-lb-proxy, glite-lb-logd and glite-lb-interlogd) in debug mode (i.e. add the "-d" option)
2007-03-13 (Ale)
- There is a problem with LB Interlogger, it is under investigation. At the moment no submissions should be possible.
2007-03-12 (Ale)
- Fixed the problem with the LB rpms by installing tag glite-lb_R_1_3_7_2 (see patch #1061):
apt-get install glite-lb-server-bones
- Install patch #1075 "list match works fine" (tag glite-wms_R_3_1_40_1):
apt-get install glite-wms-helper glite-wms-matchmaking
- Restarted the services
- Do a preliminary test using only LCG-CE (Requirements = RegExp("\/blah-",other.GlueCEUniqueID)) submitting 4800 simple jobs (48 collections):
- Success: 4795 (99.9%)
- Aborted: 5 (0.1%)
- Installed the tag which solves the "rank problem": glite-wms_R_3_1_41_1
apt-get install glite-wms-common glite-wms-manager
- Restarted WorkloadManager service
2007-03-07 (Ale)
- Change apt source list: rpm http://goldrake.cnaf.infn.it:8080/ibrido/archives/glite_branch_3_1_0/repository . i386 noarch
- Update LB rpms according to patch #1061 "Pull-in recent LB 3.0 bug fixes" with tag glite-lb_R_1_3_7_1:
apt-get install glite-lb-client glite-lb-logger glite-lb-server-bones glite-lb-ws-interface glite-lb-common glite-lb-server
- Restarted the services
- Do a preliminary test using only LCG-CE (Requirements = RegExp("\/blah-",other.GlueCEUniqueID)) submitting 3630 simple jobs (38 collections):
- Success: 3620 (99.7%)
- Aborted: 10 (0.3%)
2007-03-06
- Installed normal 3.0 WMS
- Removed c-ares. Installed c-ares from the cern cert apt repository
- Inserted in apt source lists the goldrake repository: rpm http://goldrake.cnaf.infn.it:8080/ibrido/archives/glite_branch_3_1_0_continuous/repository . i386 noarch
- apt-get remove glite-WMS; apt-get remove glite-wms-manager-ns-daemon; apt-get dist-upgrade
- replaced gacl and grid-mapfile
- cp /opt/c-ares/lib/* /lib
- Installed condor 6.8.4: rpm -Uvh condor-6.8.4-linux-x86-rhel3-dynamic-1.i386.rpm
- ln -s /opt/condor-6.8.4/ condor-c (and modify all condor conf files and setenv to use this link)
- rpm -Uvh google-perftools-*
- Create file /etc/nospma to avoid SPMA downgrade
- Add the following lines into /opt/condor-c/local.devel09/condor_config.local:
NEGOTIATOR_MATCHLIST_CACHING = False
GRIDMANAGER_TIMEOUT_MULTIPLIER = 3
SCHEDD_TIMEOUT_MULTIPLIER = 3
COLLECTOR_TIMEOUT_MULTIPLIER = 3
C_GAHP_TIMEOUT_MULTIPLIER = 3
C_GAHP_WORKER_THREAD_TIMEOUT_MULTIPLIER = 3
TOOL_TIMEOUT_MULTIPLIER = 3
GLITE_CONDORC_DEBUG_LEVEL = 2
GLITE_CONDORC_LOG_DIR = /var/tmp
- To apply the patch #1026: "WMProxy memory allocation doesn't increase anymore" one needs to:
- Add "export GLITE_WMS_WMPROXY_MAX_SERVED_REQUESTS=50" into /etc/glite/profile.d/glite_setenv.sh
- Add "setenv GLITE_WMS_WMPROXY_MAX_SERVED_REQUESTS 50" into /etc/glite/profile.d/glite_setenv.csh
- In /opt/glite/etc/glite_wms_wmproxy_httpd.conf, replace FastCgiConfig line with: "FastCgiConfig -restart -restart-delay 5 -idle-timeout 3600 -maxProcesses 25 -maxClassProcesses 20 -minProcesses 2 -listen-queue-depth 200 -gainValue 0.75 -killInterval 240 -updateInterval 240 -singleThreshold 15 -initial-env GLITE_WMS_WMPROXY_MAX_SERVED_REQUESTS -initial-env LD_LIBRARY_PATH -initial-env GLITE_LOCATION_VAR -initial-env GLITE_LOCATION_LOG -initial-env GLITE_LOCATION_TMP -initial-env RGMA_HOME -initial-env GLITE_SD_VO -initial-env GLITE_SD_PLUGIN -initial-env LCG_GFAL_INFOSYS -initial-env HOSTNAME -initial-env GLITE_WMS_WMPROXY_WEIGHTS_UPPER_LIMIT"
- Add at the end of the PassEnv section of file /opt/glite/etc/glite_wms_wmproxy_httpd.conf the following two lines:
PassEnv GLITE_WMS_WMPROXY_MAX_SERVED_REQUESTS
PassEnv GLITE_PR_TIMEOUT
- In the WM section of glite_wms.conf, add
CeForwardParameters = { "GlueHostMainMemoryVirtualSize", "GlueHostMainMemoryRAMSize" };
- cp /opt/glite/etc/lcmaps/lcmaps.db.template /opt/glite/etc/lcmaps/lcmaps.db
- mkdir /var/glite/ice; mkdir /var/glite/icepersist_dir; chown -R glite.glite /var/glite/ice
- in /etc/cron.d/glite-wms-wmproxy-purge-proxycache.cron, change glite_wms_wmproxy_purge_proxycache to glite-wms-wmproxy-purge-proxycache.
- Modified some glite_wms.conf parameters to sync with cern services (see attached files)
- rpm -Uvh glite-wms-manager-3.1.15-1.i386.rpm (from Goldrake repo)
- Installed new wms rpms to make it dagless:
- glite-wms-brokerinfo (3.1.3-1 => 3.2.3-1)
- glite-wms-helper (3.1.5-1 => 3.1.8-1)
- glite-wms-ism (3.1.10-1 => 3.1.11-1)
- glite-wms-matchmaking (3.1.2-1 => 3.1.3-1)
- glite-wms-wmproxy (3.1.13-1 => 3.1.14-1)
- j2re (1.4.2_13-1.cern => 1.4.2_13-2.cern)
- Add export GLITE_WMS_ENABLE_BULKMM="yes" in /etc/glite/profile.d/glite_setenv.(c)sh
- Add EnableBulkMM = true in glite_wms.conf (WM section)
- Restart all the services
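Before restarting, it can be worth confirming that the variables added in this entry are really exported once /etc/glite/profile.d/glite_setenv.sh has been sourced, since FastCGI's -initial-env and Apache's PassEnv only forward variables that exist in the server's environment. A minimal sketch; `is_exported` is a hypothetical helper, and the two names in the loop are the ones this entry adds.

```shell
#!/bin/sh
# is_exported NAME: succeed if NAME is present in the exported environment.
# (Hypothetical helper for this note; it only inspects the current shell.)
is_exported() {
    env | grep -q "^$1="
}

for name in GLITE_WMS_WMPROXY_MAX_SERVED_REQUESTS GLITE_WMS_ENABLE_BULKMM; do
    if is_exported "$name"; then
        echo "$name is exported"
    else
        echo "$name is NOT exported"
    fi
done
```

Run it in a shell that has sourced the profile script; in a fresh shell both variables are expected to be reported as not exported.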
--
AlessioGianelle - 13 Nov 2007