The first job examples use local files only. In real usage, jobs are expected to produce files that must be kept for longer periods on permanent data storage. In the grid, a permanent storage node is called an SE (Storage Element). SE nodes may provide disk and/or tape (MSS library) space and offer data archiving as a long-term or permanent facility, depending on local policy and VO agreement.
MSS access depends on the MSS system supported, which must offer a disk-like
interface in order to be integrated with the grid tools. An MSS SE is selected
either by name, with direct access, or by using special configuration files.
Data access is handled by several grid tools:
gdmp              package for grid data mirroring
edg_rc            replica catalog tools
globus-url-copy   low-level command
globus-job-run    command for remote file system queries
edg-brokerinfo    to collect submission information stored by the RB
Because the grid performs distributed computing, network optimization is part of the grid architecture, which implements the close SE concept. Each CE must publish a close SE, defined as the storage node with the fastest access for data retrieval. Therefore, when a job asks for SE storage, the matching process searches for the SE where the data are stored and submits the job to a CE that publishes that SE as its close one. When a job writes data, the files are written to the WN storage. If the job must store the produced data on permanent storage, the job script must find the close SE and copy the files from the WN disk to the SE storage. SE data are registered in the RC (Replica Catalogue) for future read operations.
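The matching rule and the copy step described above can be sketched in Python. This is only an illustrative model under stated assumptions, not the RB implementation: the host names, the file name, the catalogue dictionary and the helpers match_ces and close_se_copy_command are all hypothetical. The only real tool referenced is globus-url-copy, used in its basic "source URL, destination URL" form, and the command line is built but not executed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputingElement:
    name: str      # hypothetical CE host name
    close_se: str  # the close SE published by this CE


def match_ces(ces, catalogue, logical_name):
    """Return the CEs whose close SE holds a copy of the requested file."""
    holders = set(catalogue.get(logical_name, ()))
    return [ce for ce in ces if ce.close_se in holders]


def close_se_copy_command(local_path, se_host, se_path):
    """Build (without running it) a globus-url-copy command line that
    copies a file from the WN disk to the close SE.
    local_path and se_path are assumed to be absolute paths."""
    return [
        "globus-url-copy",
        f"file://{local_path}",
        f"gsiftp://{se_host}{se_path}",
    ]


# Hypothetical sites: each CE publishes exactly one close SE.
CES = [
    ComputingElement("ce.site-a.example", "se.site-a.example"),
    ComputingElement("ce.site-b.example", "se.site-b.example"),
    ComputingElement("ce.site-c.example", "se.site-c.example"),
]

# Hypothetical catalogue content: file name -> SEs holding a copy.
CATALOGUE = {"run42.dat": ["se.site-b.example"]}

if __name__ == "__main__":
    for ce in match_ces(CES, CATALOGUE, "run42.dat"):
        print("candidate CE:", ce.name)
    print(" ".join(close_se_copy_command(
        "/tmp/out.dat", "se.site-b.example", "/storage/out.dat")))
```

With the sample data, only the CE at site B is a candidate, because it is the only one whose close SE holds the file. In a real job script the close SE would be obtained from the submission information rather than hard-coded.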
To improve distributed computing, grid data handling is based on data replicas, so that more than one CE is available for SE data access. When the job has terminated and the output data stored on the close SE have been validated, the user should propagate the produced files to other SEs by replicating them and registering the replicas in the RC.
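The replicate-and-register step can be modelled with a toy in-memory catalogue. This is a sketch for illustration only and does not reproduce how gdmp or the replica catalog tools work: the ReplicaCatalogue class, the propagate function, and the host and file names are all assumptions. It only shows the bookkeeping: every new copy on another SE is recorded in the catalogue so that later jobs can be matched to more than one CE.

```python
from collections import defaultdict


class ReplicaCatalogue:
    """Toy stand-in for the RC: file name -> set of SEs holding a copy."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, logical_name, se_host):
        self._replicas[logical_name].add(se_host)

    def locations(self, logical_name):
        return sorted(self._replicas.get(logical_name, ()))


def propagate(catalogue, logical_name, source_se, target_ses):
    """Plan replication from source_se to each target SE and register
    the new replicas. Returns the list of (source, target) transfers."""
    if source_se not in catalogue.locations(logical_name):
        raise ValueError(f"{logical_name} is not registered on {source_se}")
    transfers = []
    for target in target_ses:
        if target in catalogue.locations(logical_name):
            continue  # a replica already exists on this SE
        # A real script would perform and verify the copy here
        # before registering the replica.
        transfers.append((source_se, target))
        catalogue.register(logical_name, target)
    return transfers


if __name__ == "__main__":
    rc = ReplicaCatalogue()
    # The validated output file is first registered on the close SE.
    rc.register("run42.dat", "se.site-a.example")
    plan = propagate(rc, "run42.dat", "se.site-a.example",
                     ["se.site-b.example", "se.site-c.example"])
    print(plan)
    print(rc.locations("run42.dat"))
```

Registering a replica only after its copy has been verified keeps the catalogue from pointing jobs at an SE that does not actually hold the file.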