
A Production Example

A production job with a heavy CPU load and several output files requires complex scripts and JDL files. Once the script that runs the production job works well in a local environment, the user must add the statements needed to run the job in a grid environment. Rather than giving a complete example, this section outlines guidelines for building the required files.

Note that, in general, besides the files needed by the job itself, the following files are the minimal set for production submission.

In addition to the above files, and depending on job complexity, a set of tools for collecting and validating the output is highly recommended for efficient production. A well-organized sequence of production steps reduces human errors and improves data storage and statistics collection.
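
A minimal sketch of such a collecting/validating tool follows. It assumes each job writes tagged lines to stdout, following the convention recommended for the production script later in this section (Job-beg-at:, Job-WKnode:, Job-end-at:); the function names tag_value and summarize are hypothetical. The tool only reads the stdout files retrieved through the OutputSandbox and flags the jobs that never printed their end tag.

```sh
#!/bin/sh
# Hypothetical collecting/validating tool for retrieved job stdout files.
# Assumes each job printed tagged lines such as "Job-WKnode: <host>" and
# "Job-end-at: <date>" (a convention of this guide, not a grid standard).

# tag_value FILE TAG: print the text after "TAG: " on the first matching line;
# prints nothing if the tag or the file is missing.
tag_value() {
    grep "^$2: " "$1" 2>/dev/null | head -n 1 | sed "s/^$2: //"
}

# summarize FILE...: one line per stdout file: name, worker node, status.
# A job without a Job-end-at tag is reported as INCOMPLETE.
summarize() {
    for f in "$@"; do
        node=$(tag_value "$f" Job-WKnode)
        end=$(tag_value "$f" Job-end-at)
        if [ -n "$end" ]; then
            status=OK
        else
            status=INCOMPLETE
        fi
        printf '%s %s %s\n' "$f" "${node:-unknown}" "$status"
    done
}
```

For example, `summarize *.out | grep INCOMPLETE` would list the jobs to be checked or resubmitted.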

As production jobs may require very long execution times, both in CPU time and in wall-clock time (including the time the job needs to move from the scheduled to the running state), it is important to grant a suitable expiration time for the user proxy. The user must therefore set up automatic proxy renewal using a MyProxy server and the commands:

myproxy-init -s <server> -t <hours> -d -n
myproxy-info -s <server> -d
myproxy-destroy -s <server> -d

The same server name must also be given in the JDL file, through the MyProxyServer attribute.
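
Because the server name appears both in the myproxy-* commands and in the JDL file, it can help to keep it in a single place. The sketch below is a hypothetical wrapper, not part of the MyProxy distribution: MYPROXY_SERVER, PROXY_HOURS (48 is an arbitrary illustrative value) and the DRY_RUN switch are assumptions, while the command options are exactly those listed above.

```sh
#!/bin/sh
# Hypothetical wrapper: one variable holds the MyProxy server name, used both
# for the myproxy-* commands and for the JDL MyProxyServer attribute.
# The default values below are illustrative placeholders only.
MYPROXY_SERVER=${MYPROXY_SERVER:-myproxy.example.org}
PROXY_HOURS=${PROXY_HOURS:-48}

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Same options as the commands given in the text.
proxy_init()    { run myproxy-init -s "$MYPROXY_SERVER" -t "$PROXY_HOURS" -d -n; }
proxy_info()    { run myproxy-info -s "$MYPROXY_SERVER" -d; }
proxy_destroy() { run myproxy-destroy -s "$MYPROXY_SERVER" -d; }

# jdl_proxy_line: the matching attribute line for the JDL file.
jdl_proxy_line() { printf 'MyProxyServer = "%s";\n' "$MYPROXY_SERVER"; }
```

With DRY_RUN=1 set, the functions only print the commands they would run, which is a convenient way to check the values before contacting the server.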

The JDL statements for a production job that stores its output files on an MSS SE should define the following parameters:

  1. Executable defining the executable image, usually /bin/sh, since the job script is passed as an argument
  2. MyProxyServer to define the proxy server
  3. StdOutput and StdError defining the names for stdout and stderr output files
  4. InputSandbox defining the list of input files, including the executing script, for example:
    InputSandbox = {, myconfig, my-rc-config, my-in-data,
  5. OutputSandbox defining the names of the files to be returned, i.e. stdout and stderr
  6. Arguments defining the execution script; the argument name must match the name in the InputSandbox declaration
  7. Requirements defining the list of requirements, such as job image version, MSS availability, maximum CPU time, local disk space and memory, for example:
    Requirements = Member(other.RunTimeEnvironment,"ALICE-3.09.06") &&
    Member(other.RunTimeEnvironment,"MSS-AVAILABLE") &&
    other.MaxCPUTime>86400 && other.MinLocalDiskSpace>1800 &&
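
The parameters above can be collected into a JDL file by a small generator; the sketch below is one possible way to do it, not the file used for the production described here. The output names prod.out and prod.err, the script name and the server name are hypothetical placeholders, the other input file names come from the InputSandbox example, and the Requirements values are those quoted above (the memory requirement and the truncated tail of the example are left out).

```sh
#!/bin/sh
# Sketch: write_jdl JDLFILE SCRIPT SERVER writes a production JDL file.
# prod.out/prod.err and the example arguments are hypothetical placeholders;
# the Requirements expression reproduces the values quoted in the text.
write_jdl() {
    jdl=$1       # output .jdl file
    script=$2    # job script, run by /bin/sh as its argument
    server=$3    # MyProxy server used for proxy renewal
    cat > "$jdl" <<EOF
Executable    = "/bin/sh";
Arguments     = "$script";
StdOutput     = "prod.out";
StdError      = "prod.err";
InputSandbox  = {"$script", "myconfig", "my-rc-config", "my-in-data"};
OutputSandbox = {"prod.out", "prod.err"};
MyProxyServer = "$server";
Requirements  = Member(other.RunTimeEnvironment,"ALICE-3.09.06") &&
                Member(other.RunTimeEnvironment,"MSS-AVAILABLE") &&
                other.MaxCPUTime > 86400 && other.MinLocalDiskSpace > 1800;
EOF
}
```

For instance, `write_jdl production.jdl production.sh myproxy.example.org` writes a file whose Arguments and InputSandbox entries name the same script, as item 6 requires.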

The production script in the grid environment must take care of the following items:

  1. set up the GDMP and RC configuration files according to the user's VO membership; the VO variables are defined by the following statement:
    eval `$EDG_LOCATION/bin/edg-vo-env --shell=sh alice`
  2. since the grid does not at present log information such as the executing node and timing, the script should print identifying parameters at startup, for better monitoring and statistics, with statements of the following type:
    echo "Job-beg-at: `date`"
    echo "Job-evt-no: xxxxxx"
    echo "Job-WKnode: `hostname -f`"
  3. check that all input files are present on the WN currently running the job, and take appropriate action in case of failure
  4. create any environment variables needed by the job
  5. verify that the files the job will write are not already present in the RC
  6. run the job
  7. at job end, verify that all expected output files are present and not empty
  8. find the name of the close SE and its mount point:
    CLOSE_SE=`edg-brokerinfo getCloseSEs | cut -d " " -f 1`
    MOUNT_POINT=`edg-brokerinfo getSEMountPoint $CLOSE_SE`
  9. save the files to the close SE and register them in the RC
  10. once the files are saved on the SE, remove all temporary files from the WN. Note that in our tests the WNs ran out of local disk space because the automatic clean-up failed. Keep in mind also that the file transfer to the SE may fail: a file must not be removed before its save operation has completed successfully
  11. save the end date for later checking and statistics:
    echo "Job-end-at: `date`"
    Use special strings, such as the Job-... tags above, so that monitoring messages sent to stdout can be extracted with a simple grep.
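
Several of the steps above reduce to short shell checks. The helpers below are a sketch under stated assumptions, not the script used for the tests described here: the function names are hypothetical, save_then_clean takes the copy-and-register command as a parameter because the exact SE/RC tool depends on the installation, and the grid commands from the text (edg-vo-env, edg-brokerinfo) are not called, so the helpers can be tried on any machine.

```sh
#!/bin/sh
# Hypothetical helpers for a grid production script; step numbers refer to
# the list in the text.

# Steps 2 and 11: tagged lines for later grep, e.g. tag beg-at "`date`".
tag() { echo "Job-$1: $2"; }

# Step 3: every input file must exist as a regular file on the WN.
check_inputs() {
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "Job-ERROR: missing input $f" >&2
            return 1
        fi
    done
}

# Step 7: every expected output file must exist and be non-empty (-s).
check_outputs() {
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "Job-ERROR: missing or empty output $f" >&2
            return 1
        fi
    done
}

# Step 8: keep the first SE of a space-separated list, as the text's
# cut -d " " -f 1 does on the output of edg-brokerinfo getCloseSEs.
first_se() { echo "$1" | cut -d " " -f 1; }

# Steps 9-10: remove each local file only after save_cmd succeeded on it.
# save_cmd is a hypothetical placeholder for the site's copy/register tool;
# processing stops at the first failure so unsaved files stay on the WN.
save_then_clean() {
    save_cmd=$1; shift
    for f in "$@"; do
        if $save_cmd "$f"; then
            rm -f "$f"
        else
            echo "Job-ERROR: save failed, keeping $f" >&2
            return 1
        fi
    done
}
```

In a job script one might call, for instance, `check_inputs myconfig my-rc-config my-in-data || exit 1` before starting the job and `first_se "$(edg-brokerinfo getCloseSEs)"` to pick the close SE; the ERROR lines reuse the Job- prefix so the same grep finds them.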

luvisetto 2003-07-25