Devel09 Work Log

WMS 3.1 patch 1251

2008-01-15 (Ale)
  • Update CA (1.18-1)
2007-11-13 (Ale)
  • Synchronize WMS with patch 1251 using this repository
2007-08-29 (Ale)
  • Restarting the services
2007-08-27 (Ale)
  • Shut down for maintenance on the electrical distribution.
2007-07-23 (Ale)
  • Added this cron job to clean the LB databases periodically (you first need to fix a silly bug in the script $GLITE_LOCATION/sbin/glite-lb-export.sh):
[root@devel09 etc]# cat /etc/cron.d/lb-purger.cron
#! /bin/sh
GLITE_LB_EXPORT_BKSERVER="devel11.cnaf.infn.it"
# run every wednesday and sunday at 01:00
0 1 * * wed,sun glite . /etc/glite/profile.d/glite_setenv.sh ; $GLITE_LOCATION/sbin/glite-lb-export.sh >> /var/log/glite/lb_purger.log  2>&1
2007-07-02 (Ale)
  • Cleaning LB databases
  • Set MaxOutputSandboxSize = -1 in glite_wms.conf as a workaround for bug #27215
  • Reopen the WMS to test users.
2007-06-27 (Ale)
  • Update rpms using patch #1203
    • glite-ce-cream-client-api-c (1.7.13-0 => 1.7.14-0)
    • glite-ce-monitor-client-api-c (1.7.11-0 => 1.7.12-0)
    • glite-wms-common (3.1.14-1 => 3.1.17-1)
    • glite-wms-helper (3.1.15-1 => 3.1.16-1)
    • glite-wms-ice (3.1.15-1 => 3.1.16-1)
    • glite-wms-manager (3.1.28-1 => 3.1.29-1)
    • glite-wms-wmproxy (3.1.25-1 => 3.1.26-1)
    • glite-lb-common (5.1.1-1 => 5.1.2-1)
  • Bugs fixed:
    • #27215: WM to set the maximum output sandbox size
    • #27126: generation of unique filenames in jobdir is not reliable
    • #27042: When ICE starts executes a lease update even if start_lease_updater is false
    • #26952: WMProxy server does check mandatory attributes for collection after returning jobid to client
    • #26968: There's a memory leak in a method that extract the proxy time left
2007-06-26 (Ale)
  • Stop the services to debug a problem with the LBserver.
2007-06-13 (Ale)
  • Update rpms using the latest available tags:
    • glite-wms-manager_R_3_1_28_1 to fix bugs #25680 and #26857
    • glite-wms-matchmaking_R_3_1_5_1 to fix bug #26913
    • glite-wms-purger_R_3_1_8_1 to fix bug #26705
    • glite-wms-configuration_R_3_1_7_1 to fix bug #26705
    • glite-wms-ice_R_3_1_15_1 to fix bugs #27042 and #26537
  • and also:
    • glite-lb-client (2.3.3-1 => 2.3.4-1)
    • glite-lb-client-interface (2.3.2-1 => 2.3.3-1)
    • glite-lb-common (5.0.3-1 => 5.1.1-1)
    • glite-lb-logger (1.4.2-1 => 1.4.3-1)
    • glite-lb-proxy (1.4.1-3 => 1.4.1-4)
    • glite-lb-ws-interface (2.3.0-2 => 2.3.0-3)
    • glite-security-lcmaps-interface (1.3.9-1 => 1.3.14-1)
    • glite-security-lcmaps-interface-without-gsi (1.3.9-1 => 1.3.14-1)
    • lcg-CA (1.13-1 => 1.14-1)
  • Create a new cron entry for the purge:
HOME=/
MAILTO=root@localhost
# Execute the 'purger' command every six hours (0 */6) on every day except Sunday
# if and only if the percentage of allocated blocks is greater than 40%
0 */6 * * mon-sat glite . /etc/glite/profile.d/glite_setenv.sh ; $GLITE_LOCATION/sbin/glite-wms-purgeStorage.sh -l $GLITE_LOCATION_LOG/glite-wms-purgeStorage.log -p /var/glite/SandboxDir -t 604800 -a 40 > /dev/null

# Execute the 'purger' command every four hours (0 */4), i.e. at 00:00, 04:00,
# 08:00, 12:00, 16:00 and 20:00, on each Sunday (sun).
0 */4 * * sun glite . /etc/glite/profile.d/glite_setenv.sh ; $GLITE_LOCATION/sbin/glite-wms-purgeStorage.sh -l $GLITE_LOCATION_LOG/glite-wms-purgeStorage.log -p /var/glite/SandboxDir  -t 604800 > /dev/null
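The weekday entry above passes "-a 40", which its comment describes as purging only when more than 40% of the blocks are allocated. A minimal sketch of that kind of threshold check follows. It is not taken from glite-wms-purgeStorage.sh: the function names used_percent and above_threshold and the SANDBOX_DIR variable are hypothetical, and the check relies only on the POSIX output format of df -P.

```shell
#!/bin/sh
# Print the used-space percentage (an integer, without the % sign) of the
# filesystem that holds the given path, using the POSIX df -P layout.
used_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed (exit status 0) only when usage is strictly above the threshold.
above_threshold() {
    [ "$(used_percent "$1")" -gt "$2" ]
}

# Hypothetical usage: act only when the sandbox filesystem is over 40% full.
# SANDBOX_DIR is an assumption; it defaults to / so the sketch runs anywhere.
if above_threshold "${SANDBOX_DIR:-/}" 40; then
    echo "usage above 40%: a purge would run here"
else
    echo "usage at or below 40%: nothing to do"
fi
```

Setting SANDBOX_DIR=/var/glite/SandboxDir would point the check at the sandbox path used in the cron entries.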
2007-06-04 (Ale)
  • Preliminary tests are successful
  • Reopen the WMS to test users.
2007-06-01 (Ale)
  • Stop the services to update the rpms.
  • Starting from the rpms in patch #1167, I updated these rpms:
    • glite-wms-common_R_3_1_14_1 and glite-wms-configuration_R_3_1_6_1 to fix bug #26432
    • glite-wms-manager_R_3_1_27_1 to fix a compilation problem
    • glite-wms-wmproxy_R_3_1_25_1 to fix bugs #26586, #26737 and #26237
    • glite-wms-ism_R_3_1_14_1 to fix bug #26654
  • Update also glite-wms-ice_R_3_1_13_1, glite-ce-cream-client-api-c and glite-ce-monitor-client-api-c
  • Create a cron job to rotate the wmproxy logs:
HOME=/
MAILTO=root@localhost
0 */2 * * * root /usr/sbin/logrotate -v /opt/glite/etc/wmproxy_logrotate.conf > /var/log/glite/logrotate.logs
  • In glite_wms_wmproxy_httpd.conf, comment out these lines:
#CustomLog "|/usr/sbin/rotatelogs ${GLITE_LOCATION_LOG}/httpd-wmproxy-access_%Y-%m-%d-%H.log 50M" combined
#ErrorLog "|/usr/sbin/rotatelogs ${GLITE_LOCATION_LOG}/httpd-wmproxy-errors_%Y-%m-%d-%H:%M.log 100M"
  • and uncomment these ones:
CustomLog       ${GLITE_LOCATION_LOG}/httpd-wmproxy-access.log combined
ErrorLog       ${GLITE_LOCATION_LOG}/httpd-wmproxy-errors.log
  • Restart all the services
2007-05-29 (Ale)
  • Reopen the WMS to test users.
2007-05-28 (Ale)
  • Stop the services to update the rpms.
  • Update rpms using the latest available tags plus glite-wms-ice-3.1.11-1
  • A new BDII which contains a CREAM-CE (prod-ce-02.pd.infn.it:8443/cream-lsf-creamusr2) is now used: egee-bdii.cnaf.infn.it
2007-05-16 (Ale)
  • Problems with rotatelogs: swap memory usage grows constantly
  • Changed some parameters in the wmproxy conf file /opt/glite/etc/glite_wms_wmproxy_httpd.conf :
    • LogLevel warn
    • ErrorLog "|/usr/sbin/rotatelogs ${GLITE_LOCATION_LOG}/httpd-wmproxy-errors_%Y-%m-%d-%H:%M.log 100M"
  • Restarted wmproxy service
2007-05-14 (Ale)
  • Removed old jobs and restarted the services from a clean state.
2007-05-10 (Ale)
  • There are some problems with the WM (it stopped many times), probably due to bugs #26213 and #26157
  • Because of the many WM restarts, a lot of jobs have been submitted twice, which also causes the LM to stop working (see bug #26208)
2007-05-02 (Ale)
  • Reopen the WMS to test users.
2007-04-24 (Ale)
  • Updated the latest rpms so that the WMS is now synchronized with patch #1140
2007-04-23 (Ale)
  • Repeated the last test submitting 50 collections of 100 jobs each:
    • Final distribution of the jobs (by the last CE on which each job finished its run):
      • 736 ( 14.7%) jobs submitted to the glite CE
      • 2131 ( 42.6%) jobs submitted to the cream CE
      • 2133 ( 42.7%) jobs submitted to the lcg CE
    • The results are:
      • 4979 (99.6%) Success
      • 21 ( 0.4%) Aborted (11 on the cream-ce and 10 on the glite-ce)
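The percentages in this entry follow directly from the job counts and the total of 50 x 100 = 5000 jobs. As a small sketch, they can be recomputed with awk (the helper name percent is only illustrative):

```shell
#!/bin/sh
# Print count as a percentage of total, rounded to one decimal place.
percent() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 * n / t }'
}

total=$((50 * 100))   # 50 collections of 100 jobs each

# Per-CE counts (glite, cream, lcg), then Success and Aborted.
for count in 736 2131 2133 4979 21; do
    echo "$count of $total = $(percent "$count" "$total")%"
done
```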
2007-04-20 (Ale)
  • A new test has been done:
    • submitted 31 collections of 100 simple jobs;
    • set the requirements so that only 3 CEs could be selected: a CREAM-CE, a GLITE-CE and an LCG-CE;
    • used the fuzzy rank and enabled resubmission.
    • Final distribution of the jobs (by the last CE on which each job finished its run):
      • 846 ( 27%) jobs submitted to the glite CE
      • 212 ( 7%) jobs submitted to the cream CE
      • 2042 ( 66%) jobs submitted to the lcg CE
    • The results are:
      • 2894 (93%) Success
      • 208 ( 7%) Aborted (149 on the cream-ce and 59 on the glite-ce)
  • Update wmproxy rpm: glite-wms-wmproxy-3.1.20-1
2007-04-19 (Ale)
  • The result of yesterday's test of submission to ice+cream is:
    • Success: 4000 (100%)
  • Changed the vomses file (/opt/glite/etc/vomses) to add a new VOMS server and the cms VO for proxy renewal:
         "ops" "lcg-voms.cern.ch" "15009" "/C=CH/O=CERN/OU=GRID/CN=host/lcg-voms.cern.ch" "ops"
         "dteam" "lcg-voms.cern.ch" "15004" "/C=CH/O=CERN/OU=GRID/CN=host/lcg-voms.cern.ch" "dteam"
         "atlas" "lcg-voms.cern.ch" "15001" "/C=CH/O=CERN/OU=GRID/CN=host/lcg-voms.cern.ch" "atlas"
         "cms" "lcg-voms.cern.ch" "15002" "/C=CH/O=CERN/OU=GRID/CN=host/lcg-voms.cern.ch" "cms"
         "dteam" "voms101.cern.ch" "15004" "/DC=ch/DC=cern/OU=computers/CN=voms.cern.ch" "dteam"
         "atlas" "voms101.cern.ch" "15001" "/DC=ch/DC=cern/OU=computers/CN=voms.cern.ch" "atlas"
         "ops" "voms101.cern.ch" "15009" "/DC=ch/DC=cern/OU=computers/CN=voms.cern.ch" "ops"
         "cms" "voms101.cern.ch" "15002" "/DC=ch/DC=cern/OU=computers/CN=voms.cern.ch" "cms"
  • Update these rpms:
    • glite-wms-classad_plugin (3.1.5-1 => 3.1.6-1)
    • glite-wms-helper (3.1.14-1 => 3.1.15-1)
    • glite-wms-ism (3.1.12-1 => 3.1.13-1)
    • glite-wms-wmproxy (3.1.16-1 => 3.1.19-1)
  • Bugs fixed:
    • #25653: The MatchMaking should handle 'DENY' prefix in the ACBR of CE/Views
    • #25610: Error in the template.sh on the WMS 3.1 causes uberftp failure
    • a problem with the dag nodes purging
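Each vomses line listed in this entry has five double-quoted fields, with a numeric port in the third. A rough sanity check for that layout is sketched below. It assumes the five-field format is the only one in use, and check_vomses is a hypothetical helper, not part of gLite or VOMS.

```shell
#!/bin/sh
# Report lines that do not consist of exactly five double-quoted fields
# with a numeric port as the third field. The exit status is non-zero if
# any line looks malformed.
check_vomses() {
    awk -F'"' '
        /^[ \t]*$/ { next }                      # skip blank lines
        NF != 11 || $6 !~ /^[0-9]+$/ {           # 5 quoted fields => 11 splits
            printf "line %d looks malformed: %s\n", NR, $0
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

# Example call (path as given in this log):
#   check_vomses /opt/glite/etc/vomses
```

Splitting on the double quote yields 11 pieces for a well-formed line, because the text before the first quote and after the last quote also counts as a field.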
2007-04-18 (Ale)
  • Change the BDII (egee-bdii.cnaf.infn.it) to also add the CREAM CE
  • Update the ice log level: ice_log_level = 700; and max_logfile_size = 100*1024*1024;
  • Start testing ice+cream by submitting simple jobs (40 collections of 100 jobs each)
2007-04-17 (Ale)
  • Do a preliminary test using only LCG-CE (Requirements = RegExp("\/blah-",other.GlueCEUniqueID)) submitting simple jobs (60 collections of 100 jobs each):
    • Success: 5863 (97.7%)
    • Aborted: 137 (2.3%) (Proxy expired)
2007-04-16 (Ale) 2007-04-13 (Ale)
  • Stop the services to update the rpms.
  • Removing the old SandBox directories.
2007-04-03 (Ale)
  • The problem with the IL has been understood... the fix is coming...
  • Lowered the load limiter from 15 to 10, as requested by Simone.
  • Removed the purger cron job, as requested by Simone.
  • Restarted all the daemons.
2007-04-02 (Ale)
  • Found a new problem with the LB Interlogger; it is under investigation by the developers.
  • Changed the frequency of the purger cron job from every hour to every 6 hours
2007-03-23 (Ale)
  • Stop the services to update the rpms.
  • The ca_* rpms are updated (1.12-1 => 1.13-1)
  • Removed unnecessary rpms (glite-wms-broker, glite-wms-manager-ns-common, glite-wms-brokerinfo-access, glite-jp-server-common and glite-jp-ws-interface)
  • Installed new tags:
    • glite-wms_R_3_1_46_1
    • glite-jdl_R_3_1_11_1
    • glite-lb_R_1_3_8_1
    • glite-security_R_3_1_35_1
  • The bugs fixed with the first two tags are:
    • #23937: WMProxy ignores DataRequirements
    • #11969: minor problem when building org.glite.wms.classad_plugin
    • #24954: refinement of the stochastic selector smooth function
    • #25003: signal handling in WMProxy
    • and the patch #1096: fixes in wms.client
  • The LB tag fixed the problem with the Interlogger.
  • Do a preliminary test using only LCG-CE (Requirements = RegExp("\/blah-",other.GlueCEUniqueID)) submitting 2000 simple jobs (20 collections):
    • Success: 2000 (100%)
  • Reopen the WMS to test users.
2007-03-20 2007-03-19 (Ale)
  • The LB Interlogger froze again. Salvet and František are debugging the problem on the machine.
  • LB Interlogger has been restarted.
2007-03-14 (Ale)
  • WMS is now up and running correctly. The problem with LB Interlogger is again under investigation.
  • Started all LB services (glite-lb-proxy, glite-lb-logd and glite-lb-interlogd) in debug mode (i.e. add the "-d" option)
2007-03-13 (Ale)
  • There is a problem with LB Interlogger, it is under investigation. At the moment no submissions should be possible.
2007-03-12 (Ale)
  • Fixed the problem with the LB rpms by installing tag glite-lb_R_1_3_7_2 (see patch #1061):
    apt-get install glite-lb-server-bones
  • Install patch #1075 "list match works fine" (tag glite-wms_R_3_1_40_1):
    apt-get install glite-wms-helper glite-wms-matchmaking
  • Restarted the services
  • Do a preliminary test using only LCG-CE (Requirements = RegExp("\/blah-",other.GlueCEUniqueID)) submitting 4800 simple jobs (48 collections):
    • Success: 4795 (99.9%)
    • Aborted: 5 (0.1%)
  • Installed the tag which solves the "rank problem": glite-wms_R_3_1_41_1
    apt-get install glite-wms-common glite-wms-manager
  • Restarted WorkloadManager service
2007-03-07 (Ale)
  • Change apt source list: rpm http://goldrake.cnaf.infn.it:8080/ibrido/archives/glite_branch_3_1_0/repository . i386 noarch
  • Update LB rpms according to patch #1061 "Pull-in recent LB 3.0 bug fixes" with tag glite-lb_R_1_3_7_1:
    apt-get install glite-lb-client glite-lb-logger glite-lb-server-bones glite-lb-ws-interface glite-lb-common glite-lb-server
  • Restarted the services
  • Do a preliminary test using only LCG-CE (Requirements = RegExp("\/blah-",other.GlueCEUniqueID)) submitting 3630 simple jobs (38 collections):
    • Success: 3620 (99.7%)
    • Aborted: 10 (0.3%)
2007-03-06
  • Installed normal 3.0 WMS
  • Removed c-ares. Installed c-ares from the cern cert apt repository
  • Inserted in apt source lists the goldrake repository: rpm http://goldrake.cnaf.infn.it:8080/ibrido/archives/glite_branch_3_1_0_continuous/repository . i386 noarch
  • apt-get remove glite-WMS; apt-get remove glite-wms-manager-ns-daemon; apt-get dist-upgrade
  • replaced gacl and grid-mapfile
  • cp /opt/c-ares/lib/* /lib
  • Installed condor 6.8.4: rpm -Uvh condor-6.8.4-linux-x86-rhel3-dynamic-1.i386.rpm
  • ln -s /opt/condor-6.8.4/ condor-c (and modify all the Condor conf files and setenv to use this link)
  • rpm -Uvh google-perftools-*
  • Create file /etc/nospma to avoid SPMA downgrade
  • Add following lines into /opt/condor-c/local.devel09/condor_config.local:
               NEGOTIATOR_MATCHLIST_CACHING = False
               GRIDMANAGER_TIMEOUT_MULTIPLIER = 3
               SCHEDD_TIMEOUT_MULTIPLIER = 3
               COLLECTOR_TIMEOUT_MULTIPLIER = 3
               C_GAHP_TIMEOUT_MULTIPLIER = 3
               C_GAHP_WORKER_THREAD_TIMEOUT_MULTIPLIER = 3
               TOOL_TIMEOUT_MULTIPLIER = 3
               GLITE_CONDORC_DEBUG_LEVEL = 2
               GLITE_CONDORC_LOG_DIR = /var/tmp
  • To apply the patch #1026: "WMProxy memory allocation doesn't increase anymore" one needs to:
    • Add "export GLITE_WMS_WMPROXY_MAX_SERVED_REQUESTS=50" into /etc/glite/profile.d/glite_setenv.sh
    • Add "setenv GLITE_WMS_WMPROXY_MAX_SERVED_REQUESTS 50" into /etc/glite/profile.d/glite_setenv.csh
    • In /opt/glite/etc/glite_wms_wmproxy_httpd.conf, replace FastCgiConfig line with: "FastCgiConfig -restart -restart-delay 5 -idle-timeout 3600 -maxProcesses 25 -maxClassProcesses 20 -minProcesses 2 -listen-queue-depth 200 -gainValue 0.75 -killInterval 240 -updateInterval 240 -singleThreshold 15 -initial-env GLITE_WMS_WMPROXY_MAX_SERVED_REQUESTS -initial-env LD_LIBRARY_PATH -initial-env GLITE_LOCATION_VAR -initial-env GLITE_LOCATION_LOG -initial-env GLITE_LOCATION_TMP -initial-env RGMA_HOME -initial-env GLITE_SD_VO -initial-env GLITE_SD_PLUGIN -initial-env LCG_GFAL_INFOSYS -initial-env HOSTNAME -initial-env GLITE_WMS_WMPROXY_WEIGHTS_UPPER_LIMIT"
    • Add at the end of the PassEnv section of the file /opt/glite/etc/glite_wms_wmproxy_httpd.conf the following two lines:
                PassEnv GLITE_WMS_WMPROXY_MAX_SERVED_REQUESTS
                PassEnv GLITE_PR_TIMEOUT
  • In the WM section of glite_wms.conf, add
               CeForwardParameters = { "GlueHostMainMemoryVirtualSize", "GlueHostMainMemoryRAMSize" };
  • cp /opt/glite/etc/lcmaps/lcmaps.db.template /opt/glite/etc/lcmaps/lcmaps.db
  • mkdir /var/glite/ice; mkdir /var/glite/icepersist_dir; chown -R glite.glite /var/glite/ice
  • in /etc/cron.d/glite-wms-wmproxy-purge-proxycache.cron, change glite_wms_wmproxy_purge_proxycache to glite-wms-wmproxy-purge-proxycache.
  • Modified some glite_wms.conf parameters to sync with cern services (see attached files)
  • rpm -Uvh glite-wms-manager-3.1.15-1.i386.rpm (from Goldrake repo)
  • Installed new wms rpms to make it dagless:
    • glite-wms-brokerinfo (3.1.3-1 => 3.2.3-1)
    • glite-wms-helper (3.1.5-1 => 3.1.8-1)
    • glite-wms-ism (3.1.10-1 => 3.1.11-1)
    • glite-wms-matchmaking (3.1.2-1 => 3.1.3-1)
    • glite-wms-wmproxy (3.1.13-1 => 3.1.14-1)
    • j2re (1.4.2_13-1.cern => 1.4.2_13-2.cern)
  • Add export GLITE_WMS_ENABLE_BULKMM="yes" in /etc/glite/profile.d/glite_setenv.(c)sh
  • Add EnableBulkMM = true in glite_wms.conf (WM section)
  • Restart all the services
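Several of the steps in this entry add a single line to a profile or configuration file (for example the export lines in /etc/glite/profile.d/glite_setenv.sh). If the procedure has to be repeated, a small helper that appends a line only when it is missing avoids duplicated settings. The sketch below is an illustration, not part of the original procedure: append_once is a hypothetical name, and the demonstration writes to a scratch file instead of the real profile script.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already present,
# so that re-running the setup does not duplicate settings.
append_once() {
    line=$1
    file=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demonstration on a scratch file rather than the real glite_setenv.sh.
demo=$(mktemp)
append_once 'export GLITE_WMS_WMPROXY_MAX_SERVED_REQUESTS=50' "$demo"
append_once 'export GLITE_WMS_WMPROXY_MAX_SERVED_REQUESTS=50' "$demo"   # no-op
append_once 'export GLITE_WMS_ENABLE_BULKMM="yes"' "$demo"
cat "$demo"
rm -f "$demo"
```

The -x and -F options make grep compare whole lines as fixed strings, so characters such as the double quotes in the second export are matched literally.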

-- AlessioGianelle - 13 Nov 2007

Topic revision: r2 - 2008-01-15 - AlessioGianelle