In a Grid context, data produced by a job are created on the volatile local disk
of the WN that runs it. Small files are returned to the user at the
submitting UI node via the OutputSandbox, but larger production
files are stored on specialized nodes, the SE (Storage Element) nodes,
which are shared grid-wide.
Usually every CE has an associated SE in the same network area, the CloseSE, which is therefore the natural destination for production files. Before ending, each job should check the files it has produced and store them at some SE, preferably the CloseSE, which minimizes transfer times and, to a certain extent, guarantees network availability between the CE and the SE.
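As a sketch of this final step, the commands below build the registration a job would issue before ending, using the standard lcg-cr tool (which copies a local file to an SE and registers it in the catalogue). The VO name, SE hostname and file paths are illustrative assumptions; the commands are only echoed here, not executed against a real grid.

```shell
# Sketch: storing a job's output file at the CloseSE with lcg-cr.
# VO, SE hostname and paths below are illustrative assumptions.
VO="myvo"
CLOSE_SE="se001.example.org"            # CloseSE associated with this CE
LOCAL_FILE="file:/home/user/output.dat" # file on the WN's volatile local disk
LFN="lfn:/grid/${VO}/user/output.dat"   # user-chosen Logical File Name

# Dry run: print the command the job would issue before ending.
echo lcg-cr --vo "${VO}" -d "${CLOSE_SE}" -l "${LFN}" "${LOCAL_FILE}"
```

In a real job wrapper the echo would be dropped and the command's exit status checked, since a failed registration means the output is lost when the WN's scratch area is cleaned.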
User data are stored under the SE mount point with special naming conventions, as explained in Section 6.2. The name of the stored data (a Unix file) is its PFN (Physical File Name). Since there is no grid-wide UID/GID pair for file ownership, each file is assigned a unique identifier, the GUID. Users should also define an LFN (Logical File Name) for the file. All this file information is stored in the RC (Replica Catalogue), which is accessed to retrieve and verify data.
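The three names of one file can be inspected through the catalogue with the standard lcg listing commands, which resolve an LFN to the GUID and to the registered PFNs. The VO and LFN below are illustrative assumptions, and the commands are echoed rather than run:

```shell
# Sketch: resolving the names of one catalogued file.
# VO and LFN are illustrative assumptions.
VO="myvo"
LFN="lfn:/grid/${VO}/user/output.dat"

# Dry run: print the lookup commands.
echo lcg-lg --vo "${VO}" "${LFN}"   # LFN -> GUID
echo lcg-lr --vo "${VO}" "${LFN}"   # LFN -> all PFNs (replicas)
```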
To optimize data access, users can replicate files created at a WN and stored at the related CloseSE onto any other SE. In this way, when a program needs data from the Grid, a replica of the file is available at the CloseSE of the processing WN. I will not discuss here how replicas are triggered by the middleware, as this is an advanced option not foreseen in this report.
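A user-driven replication can be sketched with the standard lcg-rep command, which copies a catalogued file to a further SE and registers the new PFN under the same GUID. The VO, destination SE and LFN are illustrative assumptions; as above, the commands are only echoed:

```shell
# Sketch: replicating a file to a second SE, then listing all replicas.
# VO, destination SE and LFN are illustrative assumptions.
VO="myvo"
DEST_SE="se002.example.org"           # SE close to where the data will be read
LFN="lfn:/grid/${VO}/user/output.dat"

# Dry run: print the commands a user would issue.
echo lcg-rep --vo "${VO}" -d "${DEST_SE}" "${LFN}"
echo lcg-lr --vo "${VO}" "${LFN}"
```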