%TOC%

---+ WMS Test Plan

NOTICE: missing tests:
   * drain
   * OSB truncation: MaxOSBSize
   * testing more LBs with "LBserver" as a vector
   * proxies with a long chain (v-p-i -noregen)
   * membership of more than one VO in the certificate

---++ Unit tests

N/A

---++ Deployment tests

---+++ Generic repository

   * epel.repo
   * EGI-trustanchors
   * sl.repo
   * sl-security.repo

---+++ Installation test

First of all, install the yum-protectbase rpm:

=yum install yum-protectbase.noarch=

Then proceed with the installation of the CA certificates by issuing:

=yum install ca-policy-egi-core=

Install the WMS metapackage:

=yum install emi-wms=

After the definition of the _site-info.def_ file, configure the WMS:

=/opt/glite/yaim/bin/yaim -c -s site-info.def -n WMS=

At the end of the installation the various init scripts should be checked with all parameters (start | stop | restart | status | version) (%RED%TBD%ENDCOLOR%)

---+++ Update test

Starting from a _production_ WMS, add the patch repository and then issue:

=yum update=

If necessary, reconfigure the WMS:

=/opt/glite/yaim/bin/yaim -c -s site-info.def -n WMS=

At the end of the update the various init scripts should again be checked with all parameters (start | stop | restart | status | version), as in the sketch below (%RED%TBD%ENDCOLOR%)
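The init-script check could be scripted along these lines. Note that the service names below are an assumption (they differ between gLite and EMI releases); adjust the list to what the metapackage actually installs:

<verbatim>
#!/bin/sh
# Hypothetical smoke test for the WMS init scripts.
# The service names are assumptions; verify them with: ls /etc/init.d/glite-wms-*
for svc in glite-wms-wmproxy glite-wms-wm glite-wms-jc glite-wms-lm glite-wms-ice; do
    for op in status version stop start restart status; do
        echo "== service $svc $op =="
        service "$svc" "$op" || echo "FAILED: $svc $op"
    done
done
</verbatim>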
---++ Functionality tests

---+++ Features/Scenarios to be tested

The WMS can be deployed in two modes:
   1 using an LB server installed on the same machine (BOTH mode)
   1 using an external LB server (PROXY mode)
Both scenarios should be tested.

---++++ Test job cycle (from submission to output retrieval)

Submit a job to the WMS service and, when it has finished, retrieve the output; at the end the final status of the job should be _Cleared_ (a minimal sketch of the cycle is shown after the Normal Job section below). Submission can be tested using different types of proxy:
   * proxy from a different VO (%RED%TBD%ENDCOLOR%)
   * proxy with a different ROLE (%RED%TBD%ENDCOLOR%)
   * delegated proxy retrieved from a !MyproxyServer (%RED%TBD%ENDCOLOR%)

Different types of CE must be used:
   * LCG CE
   * CREAM CE
   * ARC CE

Test job submission with the following types of jobs:

---+++++ Normal Job

   * Test the complete cycle submitting to the two types of CE: LCG and CREAM %GREEN% *Implemented.* %ENDCOLOR%

More jdls can be added in the future. In particular these attributes should be tested:
   * !DataRequirements (with different !DataCatalogType values) (%RED%TBD%ENDCOLOR%)
   * !OutputData (%RED%TBD%ENDCOLOR%)
   * !InputSandboxBaseURI, !OutputSandboxDestURI and !OutputSandboxBaseDestURI (%RED%TBD%ENDCOLOR%)
   * !AllowZippedISB and !ZippedISB (%RED%TBD%ENDCOLOR%)
   * !ExpiryTime (%RED%TBD%ENDCOLOR%)
   * !ShortDeadlineJob (%RED%TBD%ENDCOLOR%)
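For the basic job cycle above, a minimal sketch could look like this (file names are examples; =-a= requests automatic delegation):

<verbatim>
# hostname.jdl -- minimal normal job
Executable = "/bin/hostname";
StdOutput = "std.out";
StdError = "std.err";
OutputSandbox = {"std.out", "std.err"};
</verbatim>

<verbatim>
# Submit, poll until Done, retrieve the output; after the retrieval
# the status reported for the job should be Cleared.
glite-wms-job-submit -a -o jobid.txt hostname.jdl
glite-wms-job-status -i jobid.txt
glite-wms-job-output -i jobid.txt
glite-wms-job-status -i jobid.txt   # expected final status: Cleared
</verbatim>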
---+++++ Perusal job

Job perusal is the ability to view the output of a job while it is running. %GREEN% *Implemented.* %ENDCOLOR%

---+++++ DAG job

Directed Acyclic Graphs (a set of jobs where the input/output/execution of one or more jobs may depend on one or more other jobs). %GREEN% *Implemented.* %ENDCOLOR%
   * The nodes should also end up in state _Cleared_
More jdls can be added in the future.

---+++++ Parametric Job

Multiple jobs with one parametrized description. %GREEN% *Implemented.* %ENDCOLOR%

---+++++ Collection Job

Multiple jobs with a common description. There are two ways to submit a collection:
   * you can create a single jdl containing the jdls of all the nodes %GREEN% *Implemented.* %ENDCOLOR%
   * you can submit all the jdls stored in a directory (bulk submission) %GREEN% *Implemented.* %ENDCOLOR%

---+++++ Parallel Job

Jobs that can run on one or more CPUs in parallel. %GREEN% *Implemented.* %ENDCOLOR%
   * One of the characteristics of this type of job is the possibility to pass these parameters directly to the CE:
      * WHOLENODES
      * SMPGRANULARITY
      * HOSTNUMBER
      * CPUNUMBER
Jdls combining these parameters should be used. *%RED%TBD%ENDCOLOR%*

---++++ Delegation

   * There are two types of delegation: automatic delegation, or an explicit delegation made before submission. Submit jdls using both methods (see the sketch after this list) %GREEN% *Implemented.* %ENDCOLOR%
   * Make a delegation with an expired proxy. The command should fail. %GREEN% *Implemented.* %ENDCOLOR%
   * Submit with an expired delegation. The command should fail. %GREEN% *Implemented.* %ENDCOLOR%
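A minimal sketch of the two delegation methods (the delegation ID =mydeleg= and the jdl name are examples):

<verbatim>
# Explicit delegation: delegate once, then reuse the delegation ID.
glite-wms-job-delegate-proxy -d mydeleg
glite-wms-job-submit -d mydeleg test.jdl

# Automatic delegation: a new delegation is performed at submission time.
glite-wms-job-submit -a test.jdl
</verbatim>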
---++++ Shallow and deep re-submission

There are two types of resubmission. The first, called _deep_, occurs when the user's job has started running on the WN and then the job itself or the WMS !JobWrapper has failed. The second, called _shallow_, occurs when the WMS !JobWrapper has failed before starting the actual user's job. %GREEN% *Implemented.* %ENDCOLOR%

---++++ Job List-match Testing

Test various matching requests %GREEN% *Implemented.* %ENDCOLOR%

---+++++ With data

Test matchmaking using data requests (%RED%TBD%ENDCOLOR%)
   * You need to register a file on an SE, then try a list-match using a jdl like this one (as !InputData put the lfn(s) registered before):
<verbatim>
###########################################
#       JDL with Data Requirements        #
###########################################
Executable = "calc-pi.sh";
Arguments = "1000";
StdOutput = "std.out";
StdError = "std.err";
Prologue = "prologue.sh";
InputSandbox = {"calc-pi.sh", "fileA", "fileB", "prologue.sh"};
OutputSandbox = {"std.out", "std.err", "out-PI.txt", "out-e.txt"};
Requirements = true;
DataRequirements = {
  [
    DataCatalogType = "DLI";
    DataCatalog = "http://lfcserver.cnaf.infn.it:8085";
    InputData = {"lfn:/grid/infngrid/cesini/PI_1M.txt", "lfn:/grid/infngrid/cesini/e-2M.txt"};
  ]
};
DataAccessProtocol = "gsiftp";
</verbatim>
The listed CEs should be the ones "close" to the SE used.

---+++++ Gang-Matching

Consider for example a job that requires a CE and a certain amount of free space on a close SE in order to run successfully: the matchmaking solution to this problem requires three participants in the match (i.e. job, CE and SE), which cannot be accommodated by conventional (bilateral) matchmaking. The gangmatching feature of the classads library provides a multilateral matchmaking formalism to address this deficiency. Try some list-matches using different Requirements expressions based on the =anyMatch()= function, as in the sketch below. %RED%TBD%ENDCOLOR%
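A sketch of such a jdl, assuming the =anyMatch()= form commonly shown in gLite JDL documentation (the free-space threshold, in KB, is an arbitrary example value):

<verbatim>
# gangmatch.jdl -- match only CEs with a close SE advertising enough free space
Executable = "/bin/hostname";
Requirements = anyMatch(other.storage.CloseSEs,
                        target.GlueSAStateAvailableSpace > 104857);
</verbatim>

Run =glite-wms-job-list-match -a gangmatch.jdl= and compare the returned CEs against the free space published by their close SEs.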
RegExp("cream", other.GlueCEImplementationName, "i") : true) </verbatim> For the WorkloadManagerProxy: <verbatim> SandboxStagingPath = "${WMS_LOCATION_VAR}/SandboxDir" LogFile = "${WMS_LOCATION_LOG}/wmproxy.log" LogLevel = 5 MaxInputSandboxSize = 100000000 ListMatchRootPath = "/tmp" GridFTPPort = 2811 LBLocalLogger = "localhost:9002" MinPerusalTimeInterval = 1000 AsyncJobStart = true EnableServiceDiscovery = false LBServiceDiscoveryType = "org.glite.lb.server" ServiceDiscoveryInfoValidity = 3600 WeightsCacheValidity = 86400 MaxServedRequests = 50 OperationLoadScripts = [ jobRegister = "${WMS_LOCATION_SBIN}/glite_wms_wmproxy_load_monitor --oper jobRegister --load1 22 --load5 20 --load15 18 --memusage 99 --diskusage 95 --fdnum 1000 --jdnum 1500 --ftpconn 300" jobSubmit = "${WMS_LOCATION_SBIN}/glite_wms_wmproxy_load_monitor --oper jobSubmit --load1 22 --load5 20 --load15 18 --memusage 99 --diskusage 95 --fdnum 1000 --jdnum 1500 --ftpconn 300" RuntimeMalloc = "/usr/lib64/libtcmalloc_minimal.so" ] </verbatim> For the ICE section: <verbatim> start_listener = false start_lease_updater = false logfile = "${WMS_LOCATION_LOG}/ice.log" log_on_file = true creamdelegation_url_prefix = "https://" listener_enable_authz = true poller_status_threshold_time = 30*60 ice_topic = "CREAM_JOBS" subscription_update_threshold_time = 3600 lease_delta_time = 0 notification_frequency = 3*60 start_proxy_renewer = true max_logfile_size = 100*1024*1024 ice_host_cert = "${GLITE_HOST_CERT}" Input = "${WMS_LOCATION_VAR}/ice/jobdir" job_cancellation_threshold_time = 300 poller_delay = 2*60 persist_dir = "${WMS_LOCATION_VAR}/ice/persist_dir" lease_update_frequency = 20*60 log_on_console = false cream_url_postfix = "/ce-cream/services/CREAM2" subscription_duration = 86400 bulk_query_size = 100 purge_jobs = false InputType = "jobdir" listeneristener_enable_authn = true ice_host_key = "${GLITE_HOST_KEY}" start_poller = true creamdelegation_url_postfix = "/ce-cream/services/gridsite-delegation" cream_url_prefix = "https://" max_ice_threads = 10 cemon_url_prefix = "https://" start_subscription_updater = true proxy_renewal_frequency = 600 ice_log_level = 700 soap_timeout = 60 max_logfile_rotations = 10 cemon_url_postfix = "/ce-monitor/services/CEMonitor" max_ice_mem = 2096000 ice_empty_threshold = 600 </verbatim> It should then be verified that: * The attribute =II_Contact= of =NetworkServer= section matches the value of the yaim variable =BDII_HOST= * The attribute =WMExpiryPeriod= of =WorkloadManager= section matches the value of yaim variable =WMS_EXPIRY_PERIOD= * The attribute =MatchRetryPeriod= of =WorkloadManager= section matches the value of yaim variable =WMS_MATCH_RETRY_PERIOD= * The attribute =IsmIiLDAPCEFilterExt= of =WorkloadManager= section is =(|(GlueCEAccessControlBaseRule=VO:vo1)(GlueCEAccessControlBaseRule=VOMS:/vo1/*)(GlueCEAccessControlBaseRule=VO=vo2...= * The attribute =LBServer= of the =WorkloadManagerProxy= section matches the value of yaim variable =LB_HOST= ---++ Performance tests ---+++ Collection of multiple nodes Submit a collection of *n* (a good compromise should be 1000) nodes. (%RED%TBD%ENDCOLOR%) ---+++ Stress test Stress tests can parametrized some features: (%ORANGE%partially implemented%ENDCOLOR%) * Type of submitted job (e.g. Collections, normal, dag, parametric) * Submission frequency (i.e. number of submissions for minute) * Number of submission (i.e. duration of test) * Number of parallel submission threads (i.e. 
---++++ WMS feedback

This mechanism prevents a job from remaining stuck for a long time in a queue while waiting to be assigned to a worker node for execution. There are three parameters in the jdl that can be used to manage this mechanism:
   1 !EnableWMSFeedback
   1 !ReplanGracePeriod
   1 !MaxRetryCount
The test should submit many long jobs with a short !ReplanGracePeriod using a small number of resources; at the end of the test some jobs should have been replanned (i.e. reassigned to different CEs). This can be seen from the logging info of the jobs. (%RED%TBD%ENDCOLOR%)

A list-match with a jdl where _EnableWMSFeedback_ is set to true must return only CREAM CEs.

---++++ Limiter mechanism

The WMS implements a limiter mechanism to protect itself from overload. This mechanism is based on several parameters and can be configured in the WMS configuration file. All these parameters should be checked and tested. (%RED%TBD%ENDCOLOR%)

<verbatim>
Usage: /usr/sbin/glite_wms_wmproxy_load_monitor [OPTIONS]...
   --load1      threshold for load average (1min)
   --load5      threshold for load average (5min)
   --load15     threshold for load average (15min)
   --memusage   threshold for memory usage (%)
   --swapusage  threshold for swap usage (%)
   --fdnum      threshold for used file descriptors
   --diskusage  threshold for disk usage (%)
   --flsize     threshold for input filelist size (KB)
   --flnum      threshold for number of unprocessed jobs (for filelist)
   --jdsize     threshold for input jobdir size (KB)
   --jdnum      threshold for number of unprocessed jobs (for jobdir)
   --ftpconn    threshold for number of FTP connections
   --oper       operation to monitor (can be listed with --list)
   --list       list supported operations
   --show       show all the current values
</verbatim>

---++++ Purging

There are different purging mechanisms on the WMS:
   * SandboxDir purging, done by a cron script using the command: =/usr/sbin/glite-wms-purgeStorage.sh=
   * Internal purging, i.e. removal of the files used by the various daemons as temporary information stores. This purging should be done by the daemons themselves.
   * Proxy purging, done by a cron script using the command: =/usr/bin/glite-wms-wmproxy-purge-proxycache=
All these mechanisms should be tested, i.e. check whether all the unused files are removed at the end of the job. (%RED%TBD%ENDCOLOR%)

---++++ Configuration file

The file =/etc/glite-wms/glite_wms.conf= is used to configure all the daemons running on a WMS. A lot of parameters can be set in this file; almost all of them should be checked. (%RED%TBD%ENDCOLOR%)

It should be verified that the configuration file =/etc/glite-wms/glite_wms.conf= contains these hard-coded values.

For the Common section:
<verbatim>
DGUser = "${GLITE_WMS_USER}"
HostProxyFile = "${WMS_LOCATION_VAR}/glite/wms.proxy"
LBProxy = true
</verbatim>

For the JobController section:
<verbatim>
CondorSubmit = "${CONDORG_INSTALL_PATH}/bin/condor_submit"
CondorRemove = "${CONDORG_INSTALL_PATH}/bin/condor_rm"
CondorQuery = "${CONDORG_INSTALL_PATH}/bin/condor_q"
CondorRelease = "${CONDORG_INSTALL_PATH}/bin/condor_release"
CondorDagman = "${CONDORG_INSTALL_PATH}/bin/condor_dagman"
DagmanMaxPre = 10
SubmitFileDir = "${WMS_LOCATION_VAR}/jobcontrol/submit"
OutputFileDir = "${WMS_LOCATION_VAR}/jobcontrol/condorio"
InputType = "jobdir"
Input = "${WMS_LOCATION_VAR}/jobcontrol/jobdir/"
LockFile = "${WMS_LOCATION_VAR}/jobcontrol/lock"
LogFile = "${WMS_LOCATION_LOG}/jobcontoller_events.log"
LogLevel = 5
MaximumTimeAllowedForCondorMatch = 1800
ContainerRefreshThreshold = 1000
</verbatim>

For the NetworkServer section:
<verbatim>
II_Port = 2170
Gris_Port = 2170
II_Timeout = 100
Gris_Timeout = 20
II_DN = "mds-vo-name=local, o=grid"
Gris_DN = "mds-vo-name=local, o=grid"
BacklogSize = 64
ListeningPort = 7772
MasterThreads = 8
DispatcherThreads = 10
SandboxStagingPath = "${WMS_LOCATION_VAR}/SandboxDir"
LogFile = "${WMS_LOCATION_LOG}/networkserver_events.log"
LogLevel = 5
EnableQuotaManagement = false
MaxInputSandboxSize = 10000000
EnableDynamicQuotaAdjustment = false
QuotaAdjustmentAmount = 10000
QuotaInsensibleDiskPortion = 2.0
DLI_SI_CatalogTimeout = 60
ConnectionTimeout = 300
</verbatim>

For the LogMonitor section:
<verbatim>
JobsPerCondorLog = 1000
LockFile = "${WMS_LOCATION_VAR}/logmonitor/lock"
LogFile = "${WMS_LOCATION_LOG}/logmonitor_events.log"
LogLevel = 5
ExternalLogFile = "${WMS_LOCATION_LOG}/logmonitor_external.log"
MainLoopDuration = 5
CondorLogDir = "${WMS_LOCATION_VAR}/logmonitor/CondorG.log"
CondorLogRecycleDir = "${WMS_LOCATION_VAR}/logmonitor/CondorG.log/recycle"
MonitorInternalDir = "${WMS_LOCATION_VAR}/logmonitor/internal"
IdRepositoryName = "irepository.dat"
AbortedJobsTimeout = 600
GlobusDownTimeout = 7200
RemoveJobFiles = true
ForceCancellationRetries = 2
</verbatim>

For the WorkloadManager section:
<verbatim>
PipeDepth = 200
WorkerThreads = 5
DispatcherType = "jobdir"
Input = "${WMS_LOCATION_VAR}/workload_manager/jobdir"
LogLevel = 5
LogFile = "${WMS_LOCATION_LOG}/workload_manager_events.log"
MaxRetryCount = 10
CeMonitorServices = {}
CeMonitorAsynchPort = 0
IsmBlackList = {}
IsmUpdateRate = 600
IsmIiPurchasingRate = 480
JobWrapperTemplateDir = "${WMS_JOBWRAPPER_TEMPLATE}"
IsmThreads = false
IsmDump = "${WMS_LOCATION_VAR}/workload_manager/ismdump.fl"
SiServiceName = "org.glite.SEIndex"
DliServiceName = "data-location-interface"
DisablePurchasingFromGris = true
EnableBulkMM = true
CeForwardParameters = {"GlueHostMainMemoryVirtualSize","GlueHostMainMemoryRAMSize","GlueCEPolicyMaxCPUTime"}
MaxOutputSandboxSize = -1
EnableRecovery = true
QueueSize = 1000
ReplanGracePeriod = 3600
MaxReplansCount = 5
WmsRequirements = ((ShortDeadlineJob =?= TRUE) ? RegExp(".*sdj$", other.GlueCEUniqueID) : !RegExp(".*sdj$", other.GlueCEUniqueID)) && (other.GlueCEPolicyMaxTotalJobs == 0 || other.GlueCEStateTotalJobs < other.GlueCEPolicyMaxTotalJobs) && (EnableWmsFeedback =?= TRUE ? RegExp("cream", other.GlueCEImplementationName, "i") : true)
</verbatim>

For the WorkloadManagerProxy section:
<verbatim>
SandboxStagingPath = "${WMS_LOCATION_VAR}/SandboxDir"
LogFile = "${WMS_LOCATION_LOG}/wmproxy.log"
LogLevel = 5
MaxInputSandboxSize = 100000000
ListMatchRootPath = "/tmp"
GridFTPPort = 2811
LBLocalLogger = "localhost:9002"
MinPerusalTimeInterval = 1000
AsyncJobStart = true
EnableServiceDiscovery = false
LBServiceDiscoveryType = "org.glite.lb.server"
ServiceDiscoveryInfoValidity = 3600
WeightsCacheValidity = 86400
MaxServedRequests = 50
OperationLoadScripts = [
  jobRegister = "${WMS_LOCATION_SBIN}/glite_wms_wmproxy_load_monitor --oper jobRegister --load1 22 --load5 20 --load15 18 --memusage 99 --diskusage 95 --fdnum 1000 --jdnum 1500 --ftpconn 300"
  jobSubmit = "${WMS_LOCATION_SBIN}/glite_wms_wmproxy_load_monitor --oper jobSubmit --load1 22 --load5 20 --load15 18 --memusage 99 --diskusage 95 --fdnum 1000 --jdnum 1500 --ftpconn 300"
]
RuntimeMalloc = "/usr/lib64/libtcmalloc_minimal.so"
</verbatim>

For the ICE section:
<verbatim>
start_listener = false
start_lease_updater = false
logfile = "${WMS_LOCATION_LOG}/ice.log"
log_on_file = true
creamdelegation_url_prefix = "https://"
listener_enable_authz = true
poller_status_threshold_time = 30*60
ice_topic = "CREAM_JOBS"
subscription_update_threshold_time = 3600
lease_delta_time = 0
notification_frequency = 3*60
start_proxy_renewer = true
max_logfile_size = 100*1024*1024
ice_host_cert = "${GLITE_HOST_CERT}"
Input = "${WMS_LOCATION_VAR}/ice/jobdir"
job_cancellation_threshold_time = 300
poller_delay = 2*60
persist_dir = "${WMS_LOCATION_VAR}/ice/persist_dir"
lease_update_frequency = 20*60
log_on_console = false
cream_url_postfix = "/ce-cream/services/CREAM2"
subscription_duration = 86400
bulk_query_size = 100
purge_jobs = false
InputType = "jobdir"
listener_enable_authn = true
ice_host_key = "${GLITE_HOST_KEY}"
start_poller = true
creamdelegation_url_postfix = "/ce-cream/services/gridsite-delegation"
cream_url_prefix = "https://"
max_ice_threads = 10
cemon_url_prefix = "https://"
start_subscription_updater = true
proxy_renewal_frequency = 600
ice_log_level = 700
soap_timeout = 60
max_logfile_rotations = 10
cemon_url_postfix = "/ce-monitor/services/CEMonitor"
max_ice_mem = 2096000
ice_empty_threshold = 600
</verbatim>

It should then be verified that:
   * the attribute =II_Contact= of the =NetworkServer= section matches the value of the yaim variable =BDII_HOST=
   * the attribute =WMExpiryPeriod= of the =WorkloadManager= section matches the value of the yaim variable =WMS_EXPIRY_PERIOD=
   * the attribute =MatchRetryPeriod= of the =WorkloadManager= section matches the value of the yaim variable =WMS_MATCH_RETRY_PERIOD=
   * the attribute =IsmIiLDAPCEFilterExt= of the =WorkloadManager= section is =(|(GlueCEAccessControlBaseRule=VO:vo1)(GlueCEAccessControlBaseRule=VOMS:/vo1/*)(GlueCEAccessControlBaseRule=VO=vo2...=
   * the attribute =LBServer= of the =WorkloadManagerProxy= section matches the value of the yaim variable =LB_HOST=

---++ Performance tests

---+++ Collection of multiple nodes

Submit a collection of *n* nodes (a good compromise could be 1000). (%RED%TBD%ENDCOLOR%)

---+++ Stress test

Stress tests can be parametrized on several features: (%ORANGE%partially implemented%ENDCOLOR%)
   * type of submitted job (e.g. collections, normal, dag, parametric)
   * submission frequency (i.e. number of submissions per minute)
   * number of submissions (i.e. duration of the test)
   * number of parallel submission threads (each one with a different user proxy)
   * with or without automatic delegation
   * with or without resubmission enabled
   * with or without proxy renewal enabled
   * jdl description:
      * with or without sandbox
      * with or without cpu computation
      * executable duration
      * with or without data transfer
      * etc.

This could be an example of stress test:
   * 2880 collections each of 20 jobs (2 days of test)
   * one collection every 60 seconds
   * four users
   * use LCG-CEs and CREAM-CEs (with different batch systems)
   * use automatic delegation
   * the job is a "sleep random(666)"
   * resubmission is enabled
   * proxy renewal is enabled

---++ Regression tests

[[http://wiki.italiangrid.org/twiki/bin/view/WMS/RegressionTest][Complete list of Rfc tests]]

---++ Nagios probe test

For tests about Nagios probes see [[http://wiki.italiangrid.it/twiki/bin/view/WMS/WmsProbe#Test][here]]

---++ Note

%GREEN% *Implemented.* %ENDCOLOR% means that an automatic test exists. Otherwise the test must be developed and/or executed by hand.