ICI data services
This document describes the current plan for ICI data services and is intended as a basis for discussing that plan. Its audience is other service providers, software developers and some end users. Software administration documentation and user documentation will be provided elsewhere.
-- StephenMcMahon - 21 Aug 2007
Data services provided by a site
Site scratch filesystem
This is the filesystem identified as the destination for data staging. It must be mounted on the compute nodes as well as on the machines providing data staging services such as GridFTP.
Sites identify a per-user space within this filesystem with the USER_SCRATCH and other EnvironmentVariables.
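As a minimal sketch of how a staging tool might locate the per-user space, the following Python resolves a staging destination from USER_SCRATCH. Only the variable name USER_SCRATCH comes from this plan; the function names, the example path and the choice to fail when the variable is unset are assumptions made for illustration.

```python
import os
from pathlib import Path


def user_scratch_dir(env=None):
    """Return the per-user scratch directory named by USER_SCRATCH.

    `env` defaults to the process environment; passing a dict makes the
    function easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    value = env.get("USER_SCRATCH")
    if not value:
        # Assumption: a missing variable is treated as a configuration error.
        raise RuntimeError("USER_SCRATCH is not set for this user")
    return Path(value)


def staging_path(filename, env=None):
    """Return where a staged file would land inside the user's scratch space."""
    # Keep only the final path component so a staged file cannot be
    # written outside the scratch directory.
    return user_scratch_dir(env) / Path(filename).name


# Hypothetical example value; real sites choose their own layout.
print(staging_path("input.dat", {"USER_SCRATCH": "/scratch/alice"}))
```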
Basic data services provided by ICI to be deployed at a site
- GridFTP server attached to the site scratch filesystem. The GridFTP service may be scaled up with multiple servers allowing striped transfers. This would allow available filesystem and network bandwidth to be fully utilised.
- SRM interface – ideally backed by the same storage as the GridFTP server. SRM offers more functionality than GridFTP and also acts as an endpoint for gLite services.
- MDS configured to advertise the site's data services to the information service
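To illustrate the striped transfers mentioned for the GridFTP service, here is a small Python sketch that assembles a globus-url-copy command line using its -stripe and -p (parallel streams) options. The host names, paths and the helper name are hypothetical, and striping only takes effect when the servers at both ends are configured for it. The sketch builds the command without running it.

```python
import shlex


def striped_copy_command(source_url, dest_url, parallel_streams=4):
    """Build a globus-url-copy command requesting a striped, parallel transfer."""
    if parallel_streams < 1:
        raise ValueError("parallel_streams must be at least 1")
    return [
        "globus-url-copy",
        "-stripe",                      # use all stripes offered by the servers
        "-p", str(parallel_streams),    # number of parallel data streams
        source_url,
        dest_url,
    ]


# Hypothetical endpoints; substitute the GridFTP servers of the sites involved.
command = striped_copy_command(
    "gsiftp://ngdata.site-a.example.org/scratch/alice/input.dat",
    "gsiftp://ngdata.site-b.example.org/scratch/alice/input.dat",
    parallel_streams=8,
)
print(shlex.join(command))
```

The list form can be passed directly to subprocess.run on a machine where the Globus client tools are installed and a valid proxy credential exists.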
Data services which may be optionally deployed at a site
- LDR. Used by the LIGO community.
- dCache disk pool. Used by the ILDG community. It is also an SRM interface with a flexible storage configuration, which may be useful for some sites and/or projects.
- GridFTP-to-SRB interface service, one per SRB resource to be exposed in this way. This allows the Globus data staging directives to be used to move data to and from an SRB repository.
- SRB server. The intention is to provide packages for "stand alone" SRB servers as well as for servers to be installed as part of an existing "zone". Sites would set up their own zones with these and federate with others as necessary.
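The Globus data staging directives referred to in the list above can be sketched as follows. This Python example generates a job description containing a stage-in transfer, assuming the GT4 WS-GRAM XML style (fileStageIn, transfer, sourceUrl, destinationUrl). This plan does not state which GRAM version is in use, so treat the format as an assumption. The gateway host name and paths are hypothetical.

```python
import xml.etree.ElementTree as ET


def job_with_stage_in(executable, source_url, destination_url):
    """Build a job description that stages one file in before the job runs."""
    job = ET.Element("job")
    ET.SubElement(job, "executable").text = executable
    stage_in = ET.SubElement(job, "fileStageIn")
    transfer = ET.SubElement(stage_in, "transfer")
    # Source: a gsiftp URL, here pointing at a hypothetical GridFTP-to-SRB gateway.
    ET.SubElement(transfer, "sourceUrl").text = source_url
    # Destination: a file URL inside the user's scratch space on the site.
    ET.SubElement(transfer, "destinationUrl").text = destination_url
    return job


job = job_with_stage_in(
    "/bin/cat",
    "gsiftp://srb-gateway.example.org/zone/home/alice/input.dat",
    "file:///scratch/alice/input.dat",
)
print(ET.tostring(job, encoding="unicode"))
```

Because the gateway presents the SRB resource as an ordinary GridFTP endpoint, the job description needs no SRB-specific elements.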
Centralised data services provided by ICI
- Testing and monitoring. This will probably involve INCA. Currently there are some tests at http://info.apac.edu.au/data_transfer.
- File transfer service. Currently FTS from gLite is being investigated to fill this role. It would allow managed transfers on dedicated channels (or queues).
- Replication service – to be implemented by Globus DRS or DQ2. The choice has not been made yet.
Deployment of data services
For many sites the basic data services would be available in a standard virtual machine called ngdata. Sites may instead choose to install individual services on machines attached to data repositories or to data caches collecting data from instruments.
The centralised services will be installed in only one or two locations, with some form of high-availability solution in place for them.