Difference: WorkLogDevel10 (1 vs. 2)

Revision 2 2007-11-27 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="DevelTestbed"

Devel10 Work Log

WMS 3.1 patch 1251

Added:
>
>
2007-11-27 (Ale)
  • Update CAs to version 1.18-1
 2007-11-12 (Danilo)
  • Start testing fix for bug 28235 installing:
    • glite-jdl-api-cpp-3.1.12-1.i386.rpm

Revision 1 2007-11-13 - AlessioGianelle

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="DevelTestbed"

Devel10 Work Log

WMS 3.1 patch 1251

2007-11-12 (Danilo)
  • Start testing fix for bug 28235 installing:
    • glite-jdl-api-cpp-3.1.12-1.i386.rpm
    • glite-wms-helper-3.1.16-2.i386.rpm
    • glite-wms-matchmaking-3.1.6-1.i386.rpm
2007-08-29 (Ale)
  • Restarting the services
  • Use "jobdir" instead of "filelist" for the WM queue setting in the WorkloadManager section of the glite_wms.conf file:
    • DispatcherType = "jobdir";
    • Input = "${GLITE_LOCATION_VAR}/workload_manager/jobdir";
    • and create the directories:
      • /var/glite/workload_manager/jobdir/tmp
      • /var/glite/workload_manager/jobdir/new
      • /var/glite/workload_manager/jobdir/old
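The three jobdir sub-directories listed above can be created in one step. This is a minimal sketch only: the helper name make_jobdir is hypothetical and not part of gLite, and the log does not say which user must own the directories.

```shell
#!/bin/sh
# make_jobdir BASE: create the tmp/new/old sub-directories of a jobdir queue.
# Hypothetical helper, not a gLite command. Ownership is left untouched.
make_jobdir() {
    base="$1"
    for sub in tmp new old; do
        mkdir -p "$base/$sub" || return 1
    done
}
```

On devel10 the call would be make_jobdir /var/glite/workload_manager/jobdir. If the WorkloadManager runs as the glite user (an assumption, based on the chown glite.glite used for /var/glite/ice elsewhere in this log), the directory would also need a matching chown.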
2007-08-27 (Ale)
  • Shut down for maintenance of the electrical distribution.
2007-08-21 (Ale)
  • Update rpms using the official repository at CERN
    • glite-lb-client (2.3.4-1 => 2.3.5-1)
    • glite-yaim-core (3.1.0-2.3 => 3.1.1-8)
  • Restarting services
2007-07-23 (Ale)
  • Reopen the WMS to test users.
2007-07-17 (Ale)
  • Close for update!
  • Reinstall the machine from scratch
  • Install a new WMS service following these instructions
  • The rpms installed are the ones of patch #1251
  • Installed glite-wms-ice-3.1.16-1 rpm
  • Added this cron job to periodically clean the LB databases (you first need to fix a silly bug in the script $GLITE_LOCATION/sbin/glite-lb-export.sh):
[root@devel10 etc]# cat /etc/cron.d/lb-purger.cron
#! /bin/sh
GLITE_LB_EXPORT_BKSERVER="devel12.cnaf.infn.it"
# run every wednesday and sunday at 01:00
0 1 * * wed,sun glite . /etc/profile.d/grid-env.sh ; $GLITE_LOCATION/sbin/glite-lb-export.sh >> /var/log/glite/lb_purger.log  2>&1
2007-07-12 (Ale)
  • Update the following rpms:
    • glite-ce-cream-client-api-c (1.7.14-0 => 1.7.15-1)
    • glite-ce-monitor-client-api-c (1.7.12-0 => 1.7.13-1)
    • glite-wms-configuration (3.1.7-1 => 3.1.9-1)
    • glite-wms-ice (3.1.16-1 => 3.1.17-1)
    • glite-wms-wmproxy (3.1.26-2 => 3.1.27-1)
    • lcg-CA (1.14-1 => 1.15-1)
  • Bugs fixed:
    • #27856: Multiple subscriptions for the same user can occur
    • #27724: Wms configuration files overwritten when updating rpm (glite_wms.conf)
    • #27708: WMProxy missing dependency on glite-security-lcmaps-plugins-basic
  • Restarted ICE service
2007-07-03 (Ale)
  • Set MaxOutputSandboxSize = -1 on glite_wms.conf as workaround for bug #27215
  • Restart workload manager
2007-06-29 (Ale)
  • Update rpms using patch #1203
  • The bugs that have been fixed with this update are:
    • #25680: job submission --nodes-resource option does not work for dags/collections
    • #26857: The "max-rank" selection algorithm for collection under some particular circumstances does not work properly
    • #26913: The MM does not use information about previous matches retrying the same CEs
    • #26705: The standalone purger does not work anymore, due to the lack of proxy file.
    • #27042: When ICE starts executes a lease update even if start_lease_updater is false
    • #26537: ICE Fails to build on SLC4
    • #27215: WM to set the maximum output sandbox size
    • #27126: generation of unique filenames in jobdir is not reliable
    • #26952: WMProxy server does check mandatory attributes for collection after returning jobid to client
    • #26968: There's a memory leak in a method that extract the proxy time left
  • Create a new cron entry for the purge:
HOME=/
MAILTO=root@localhost
# Execute the 'purger' command every day except Sunday, once every six hours (0 */6),
# if and only if the percentage of allocated blocks is greater than 40%
0 */6 * * mon-sat glite . /etc/glite/profile.d/glite_setenv.sh ; $GLITE_LOCATION/sbin/glite-wms-purgeStorage.sh -l $GLITE_LOCATION_LOG/glite-wms-purgeStorage.log -p /var/glite/SandboxDir -t 604800 -a 40 > /dev/null

# Execute the 'purger' command at midnight, 4:00 AM, 8:00 AM, 12:00 noon, 4:00 PM,
# and 8:00 PM (0 */4) on each Sunday (sun).
0 */4 * * sun glite . /etc/glite/profile.d/glite_setenv.sh ; $GLITE_LOCATION/sbin/glite-wms-purgeStorage.sh -l $GLITE_LOCATION_LOG/glite-wms-purgeStorage.log -p /var/glite/SandboxDir  -t 604800 > /dev/null
HOME=/
MAILTO=root@localhost
0 */2 * * * root /usr/sbin/logrotate -v /opt/glite/etc/wmproxy_logrotate.conf > /var/log/glite/logrotate.logs
  • In glite_wms_wmproxy_httpd.conf, comment out these lines:
#CustomLog "|/usr/sbin/rotatelogs ${GLITE_LOCATION_LOG}/httpd-wmproxy-access_%Y-%m-%d-%H.log 50M" combined
#ErrorLog "|/usr/sbin/rotatelogs ${GLITE_LOCATION_LOG}/httpd-wmproxy-errors_%Y-%m-%d-%H:%M.log 100M"
  • and uncomment these ones:
CustomLog       ${GLITE_LOCATION_LOG}/httpd-wmproxy-access.log combined
ErrorLog       ${GLITE_LOCATION_LOG}/httpd-wmproxy-errors.log
  • Reopen the WMS to test users.
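The glite_wms_wmproxy_httpd.conf change in this entry (disable the rotatelogs pipes, enable the plain log files) could be scripted roughly as follows. This is a sketch under assumptions: the function name switch_wmproxy_logs is hypothetical, and it assumes the plain-file lines are shipped commented out with a single leading "#" and no leading whitespace. It only prints the modified file and never edits it in place.

```shell
#!/bin/sh
# switch_wmproxy_logs FILE: print FILE with the rotatelogs-based
# CustomLog/ErrorLog lines commented out and the plain-file variants
# uncommented. Hypothetical helper; FILE itself is not modified.
switch_wmproxy_logs() {
    sed -E \
        -e 's@^((CustomLog|ErrorLog)[[:space:]]+"[|]/usr/sbin/rotatelogs)@#\1@' \
        -e 's@^#((CustomLog|ErrorLog)[[:space:]]+[$][{]GLITE_LOCATION_LOG[}]/httpd-wmproxy-(access|errors)[.]log)@\1@' \
        "$1"
}
```

A cautious way to use it is to redirect the output to a scratch file, compare it with the original using diff, and only then replace the configuration and restart the WMProxy.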
2007-06-26 (Ale)
  • Stop the services to debug a problem with the LBserver.
2007-05-31 (Ale)
  • Reopen the WMS to test users.
2007-05-30 (Ale)
  • Stop the services to update the rpms.
  • Starting from the rpms in patch #1167, update these rpms:
    • glite-wms-common_R_3_1_14_1 and glite-wms-configuration_R_3_1_6_1 to fix bug #26432
    • glite-wms-manager_R_3_1_27_1 to fix a compilation problem
    • glite-wms-wmproxy_R_3_1_25_1 to fix bugs #26586, #26737 and #26237
    • glite-wms-ism_R_3_1_14_1 to fix bug #26654
  • Also update glite-wms-ice_R_3_1_13_1, glite-ce-cream-client-api-c and glite-ce-monitor-client-api-c
  • A new BDII which contains a CREAM-CE (prod-ce-02.pd.infn.it:8443/cream-lsf-creamusr2) is now used: egee-bdii.cnaf.infn.it
2007-05-14 (Ale)
  • Update glite-wms-jobsubmission rpm using tag: glite-wms_R_3_1_60_1 (see bug #23401)
  • Reopen the WMS to test users.
2007-05-11 (Ale)
  • It passed the usual test... Success > 99%
2007-05-10 (Ale)
  • Update rpms using these tags:
    • org.glite.lb.version = glite-lb_R_1_4_5_2
    • org.glite.wms.version = glite-wms_R_3_1_59_1
  • Bugs fixed:
    • #26269: JC locks the filelist without giving to WM possibility to submit new requests
    • #23401: Job failure - gethostbyname error in condor submission
    • #22795: timer-log file are not removed by LM
    • #26208: LM stopped on bad SizeFile object
    • #26157: The WM dies while processing a collection with pending nodes
    • #26213: Handling of LB errors needs to be improved
    • #25767: Purging does not work on collections.
    • #26267: The purger does not work: creation of lb context always fails.
    • #26250: Wms client timeout approach does not work properly
    • #25677: the LB cannot handle decimal numbers in the quantity field used to log resource usage
2007-05-03 (Ale)
  • Stop the services to update the rpms.
  • Change apt source list: rpm http://goldrake.cnaf.infn.it:8080/ibrido/archives/glite_branch_3_1_0_continuous/repository . i386 noarch
  • Update rpms using these tags:
    • org.glite.ce.version = glite-ce_R_1_7_13_0
    • org.glite.jdl.version = glite-jdl_R_3_1_11_1
    • org.glite.lb.version = glite-lb_R_1_4_4_1
    • org.glite.security.version = glite-security_R_3_1_38_1
    • org.glite.wms-utils.version = glite-wms-utils_R_3_1_8
    • org.glite.wms.version = glite-wms_R_3_1_58_1
  • The fixes introduced by the new LB tag are for these bugs:
    • #25872: glite-lb-bkserverd looping in malloc_consolidate()
    • #25677: the LB cannot handle decimal numbers in the quantity field used to log resource usage
  • Removed old log files and sandboxes
2007-04-19 (Ale)
  • Changed the vomses file (/opt/glite/etc/vomses) to add a new VOMS server and the cms VO for proxy renewal:
         "ops" "lcg-voms.cern.ch" "15009" "/C=CH/O=CERN/OU=GRID/CN=host/lcg-voms.cern.ch" "ops" 
         "dteam" "lcg-voms.cern.ch" "15004" "/C=CH/O=CERN/OU=GRID/CN=host/lcg-voms.cern.ch" "dteam"
         "atlas" "lcg-voms.cern.ch" "15001" "/C=CH/O=CERN/OU=GRID/CN=host/lcg-voms.cern.ch" "atlas"
         "cms" "lcg-voms.cern.ch" "15002" "/C=CH/O=CERN/OU=GRID/CN=host/lcg-voms.cern.ch" "cms"
         "dteam" "voms101.cern.ch" "15004" "/DC=ch/DC=cern/OU=computers/CN=voms.cern.ch" "dteam"
         "atlas" "voms101.cern.ch" "15001" "/DC=ch/DC=cern/OU=computers/CN=voms.cern.ch" "atlas"
         "ops" "voms101.cern.ch" "15009" "/DC=ch/DC=cern/OU=computers/CN=voms.cern.ch" "ops"
         "cms" "voms101.cern.ch" "15002" "/DC=ch/DC=cern/OU=computers/CN=voms.cern.ch" "cms"
2007-04-05 (Ale)
  • Update the glite-wms-classad_plugin rpm (glite-wms-classad_plugin-3.1.5-1) to fix bug 25125: problem with FQAN VOViews.
  • Add the last fix for the InterLogger problem (glite-lb-logger-1.4.2-1).
  • Update the glite-wms-ice rpm (glite-wms-ice-3.1.9-1) to fix bug 25275: fails to build on SLC4.
  • Change the frequency at which the purger runs in cron: from every hour to every 6 hours
  • Do a preliminary test using only LCG-CE (Requirements = RegExp("\/blah-",other.GlueCEUniqueID)) submitting simple jobs (81 collections of 100 jobs each):
    • Success: 7851 (96.9%)
    • Aborted: 249 (3.1%) (Proxy expired)
  • Reopen the WMS to test users.
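The percentages reported for the preliminary test in this entry can be re-derived from the job counts. A small sketch (the helper name pct is hypothetical):

```shell
#!/bin/sh
# pct PART TOTAL: print PART as a percentage of TOTAL with one decimal place.
pct() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * part / total }'
}

total=$((81 * 100))   # 81 collections of 100 jobs each
pct 7851 "$total"     # successful jobs
pct 249 "$total"      # aborted jobs (proxy expired)
```

The two counts add up to the 8100 submitted jobs, so no job is unaccounted for in the reported figures.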
2007-04-02 (Ale)
  • Stop the services to update the rpms.
  • Update rpms using these tags:
    • glite-jdl_R_3_1_11_1
    • glite-lb_R_1_4_1_1
    • glite-security_R_3_1_35_1
    • glite-wms-utils_R_3_1_8
    • glite-wms_R_3_1_48_1
  • Change the parameter in the WorkloadManager section: EnableBulkMM = true; (i.e. the WMS is now dagless, like devel09)
  • The new LB is able to recognize the dagless-collection
2007-03-20 2007-03-16 (Ale)
  • Stop the services to update the rpms.
  • Run the commands apt-get update and apt-get dist-upgrade
    • the ca_* rpms are updated from 1.12-1 => 1.13-1
  • The most interesting installed tags are now:
    • glite-jdl_R_3_1_10_1
    • glite-jp_R_1_3_5_1
    • glite-lb_R_1_3_7_3
    • glite-security_R_3_1_33_1
    • glite-wms-utils_R_3_1_8
    • glite-wms_R_3_1_43_1
  • Restarted the services
  • Do a preliminary test using only LCG-CE (Requirements = RegExp("\/blah-",other.GlueCEUniqueID)) submitting 1700 simple jobs (17 collections):
    • Success: 1700 (100%)
  • Reopen the WMS to test users.
2007-03-08 (Ale) 2007-03-06
  • Installed normal 3.0 WMS
  • Removed c-ares. Installed c-ares from the CERN cert apt repository
  • Inserted in apt source lists the goldrake repository: rpm http://goldrake.cnaf.infn.it:8080/ibrido/archives/glite_branch_3_1_0_continuous/repository . i386 noarch
  • apt-get remove glite-WMS; apt-get remove glite-wms-manager-ns-daemon; apt-get dist-upgrade
  • replaced gacl and grid-mapfile
  • cp /opt/c-ares/lib/* /lib
  • Installed condor 6.8.4: rpm -Uvh condor-6.8.4-linux-x86-rhel3-dynamic-1.i386.rpm
  • ln -s /opt/condor-6.8.4/ condor-c (and modify all conf files of condor and setenv to use this link)
  • rpm -Uvh google-perftools-*
  • Create file /etc/nospma to avoid SPMA downgrade
  • Add following lines into /opt/condor-c/local.devel10/condor_config.local:
               NEGOTIATOR_MATCHLIST_CACHING = False
               GRIDMANAGER_TIMEOUT_MULTIPLIER = 3
               SCHEDD_TIMEOUT_MULTIPLIER = 3
               COLLECTOR_TIMEOUT_MULTIPLIER = 3
               C_GAHP_TIMEOUT_MULTIPLIER = 3
               C_GAHP_WORKER_THREAD_TIMEOUT_MULTIPLIER = 3
               TOOL_TIMEOUT_MULTIPLIER = 3
               GLITE_CONDORC_DEBUG_LEVEL = 2
               GLITE_CONDORC_LOG_DIR = /var/tmp
  • To apply the patch #1026: "WMProxy memory allocation doesn't increase anymore" one needs to:
    • Add "export GLITE_WMS_WMPROXY_MAX_SERVED_REQUESTS=50" into /etc/glite/profile.d/glite_setenv.sh
    • Add "setenv GLITE_WMS_WMPROXY_MAX_SERVED_REQUESTS 50" into /etc/glite/profile.d/glite_setenv.csh
    • In /opt/glite/etc/glite_wms_wmproxy_httpd.conf, replace FastCgiConfig line with: "FastCgiConfig -restart -restart-delay 5 -idle-timeout 3600 -maxProcesses 25 -maxClassProcesses 20 -minProcesses 2 -listen-queue-depth 200 -gainValue 0.75 -killInterval 240 -updateInterval 240 -singleThreshold 15 -initial-env GLITE_WMS_WMPROXY_MAX_SERVED_REQUESTS -initial-env LD_LIBRARY_PATH -initial-env GLITE_LOCATION_VAR -initial-env GLITE_LOCATION_LOG -initial-env GLITE_LOCATION_TMP -initial-env RGMA_HOME -initial-env GLITE_SD_VO -initial-env GLITE_SD_PLUGIN -initial-env LCG_GFAL_INFOSYS -initial-env HOSTNAME -initial-env GLITE_WMS_WMPROXY_WEIGHTS_UPPER_LIMIT"
    • At the end of the PassEnv section of the file /opt/glite/etc/glite_wms_wmproxy_httpd.conf, add the following two lines:
                PassEnv GLITE_WMS_WMPROXY_MAX_SERVED_REQUESTS
                PassEnv GLITE_PR_TIMEOUT
  • In the WM section of glite_wms.conf, add
               CeForwardParameters = { "GlueHostMainMemoryVirtualSize", "GlueHostMainMemoryRAMSize" };
  • cp /opt/glite/etc/lcmaps/lcmaps.db.template /opt/glite/etc/lcmaps/lcmaps.db
  • mkdir /var/glite/ice; mkdir /var/glite/icepersist_dir; chown -R glite.glite /var/glite/ice
  • in /etc/cron.d/glite-wms-wmproxy-purge-proxycache.cron, change glite_wms_wmproxy_purge_proxycache to glite-wms-wmproxy-purge-proxycache.
  • Modified some glite_wms.conf parameters to sync with the CERN services (see attached files)
  • Restart all the services
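Two of the file edits in this entry lend themselves to small repeatable helpers: adding the GLITE_WMS_WMPROXY_MAX_SERVED_REQUESTS line to the setenv scripts without duplicating it on a re-run, and renaming the purge script inside the cron file. The sketch below is illustrative only; both function names are hypothetical, and the second one prints the result instead of editing the file.

```shell
#!/bin/sh
# append_once FILE LINE: append LINE to FILE unless an identical line
# is already present (safe to run more than once).
append_once() {
    file="$1"
    line="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# rename_purge_script FILE: print FILE with the underscore-style script
# name replaced by the hyphenated one. FILE itself is not modified.
rename_purge_script() {
    sed 's/glite_wms_wmproxy_purge_proxycache/glite-wms-wmproxy-purge-proxycache/g' "$1"
}
```

For the sh variant the call would be append_once /etc/glite/profile.d/glite_setenv.sh 'export GLITE_WMS_WMPROXY_MAX_SERVED_REQUESTS=50', and the csh file takes the corresponding setenv line. The cron file to pass to rename_purge_script is /etc/cron.d/glite-wms-wmproxy-purge-proxycache.cron.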

-- AlessioGianelle - 13 Nov 2007

 