SRB is the main technology
It is proposed that SRB initially, and iRODS later, be used to implement the Data Fabric. SRB already has most of the features required.
Storage
SRB servers will be installed at all locations providing storage to the Data Fabric. The storage may be permanent, with appropriate backup policies, or temporary, which would be suitable for data caching. Site administrators will be able to specify which users or VOs have access to their storage resources, and how much of a storage resource each user or VO can use.
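The access and quota rules described above could be modelled roughly as follows. This is only a sketch to make the intended behaviour concrete: the class, its fields and the example names are hypothetical, and it does not use or represent any SRB or iRODS API.

```python
from dataclasses import dataclass, field


@dataclass
class StorageResource:
    """Hypothetical model of one site's storage resource (not an SRB API)."""
    name: str
    permanent: bool  # False means temporary, cache-style storage
    quotas: dict = field(default_factory=dict)  # user or VO name -> allowed bytes
    usage: dict = field(default_factory=dict)   # user or VO name -> bytes used

    def has_access(self, principal: str) -> bool:
        # Only users or VOs listed by the site administrator may use the resource.
        return principal in self.quotas

    def can_store(self, principal: str, size: int) -> bool:
        # Refuse principals without access and writes that would exceed the quota.
        if not self.has_access(principal):
            return False
        return self.usage.get(principal, 0) + size <= self.quotas[principal]

    def record_write(self, principal: str, size: int) -> None:
        # Account for a write, or raise if it is not permitted.
        if not self.can_store(principal, size):
            raise PermissionError(
                f"{principal} may not store {size} bytes on {self.name}"
            )
        self.usage[principal] = self.usage.get(principal, 0) + size
```

A site administrator's configuration then amounts to the contents of the quotas table: listing a user or VO grants access, and the value sets the limit.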
Policies for assigning the initial, default, storage to users will need to be developed.
For negotiating larger amounts of storage it may be necessary for ARCS to work with the user or VO to ensure that the storage assignment is appropriate to the user's needs. For example, it may be important that the storage is in a particular location (or locations).
Ideally all storage providers will have similar data integrity and lifetime policies. There will need to be some discussion to agree on a common policy. At the least, there needs to be a survey of the existing lifetime and data integrity policies that the storage providers have.
How a user obtains access
Users who have obtained a grid certificate will be able to request their own storage on the data fabric. It is intended that this process will be as easy to use as possible. Ideally users could simply be given the storage automatically when they are given a certificate.
To minimise SRB account administration as more users use the data fabric, a process will be implemented to keep the SRB accounts in sync with the existing ARCS VOMRS infrastructure.
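One way to picture the synchronisation step is as a comparison of two sets of identities. The sketch below assumes that a list of current VOMRS members and a list of existing SRB accounts can each be exported as certificate DNs. How those lists are obtained is not shown, and the function name is hypothetical.

```python
def plan_account_sync(vomrs_members, srb_accounts):
    """Return (to_create, to_disable) as sorted lists of certificate DNs.

    Both arguments are iterables of DN strings. This function only works
    out what needs to change; it does not contact VOMRS or SRB.
    """
    members = set(vomrs_members)
    accounts = set(srb_accounts)
    to_create = sorted(members - accounts)    # in VOMRS but no SRB account yet
    to_disable = sorted(accounts - members)   # SRB account but no longer in VOMRS
    return to_create, to_disable
```

Running such a comparison on a schedule would keep manual account administration to the exceptional cases.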
How a user interacts with the data fabric
It is proposed that most users will interact with the data fabric through a desktop client - probably the Hermes SRB client currently being developed by the Archer project. Many users may end up thinking of this client as the data fabric. ARCS will need to ensure that the client provides the right user experience.
Initial project plan
Exactly how the data fabric gets implemented depends largely on which of the required features are already present in the technology, which are missing, and how much work is needed to add the missing ones.
The initial strategy will be to set up a prototype SRB federation to answer questions like ...
- Can SRB be configured so that sites can administer their own storage resources?
- Can SRB implement quotas in the way defined in the ARCS data fabric?
- Can account administration be streamlined by automatically synchronising user credentials and VO membership from VOMRS?
- What SRB zone structure should be used? Perhaps a zone for each site for resources plus a zone to manage the user credentials.
- What naming conventions for SRB zones, domains and resources should be used in a production ARCS SRB federation?
- What features do we need to request in the Hermes client? Do we need to fork a version specifically for this data fabric?
- Try out existing command line tools and web interfaces. Are they sufficient? If not, what do we need?
At the same time a survey will be conducted amongst ARCS members to determine the initial set of storage resources, and associated policies, that will be available in the data fabric. There will also be discussions with the members about how users, through ARCS, will negotiate or pay for extra storage where it is available.
A set of test cases will be developed which cover known possible use cases. These will be used to help investigate whether the prototype data fabric is working correctly. From these test cases automatic tests can be constructed to monitor the data fabric when it goes into production.
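As an illustration of the kind of automatic test that could be built from a test case, the sketch below stores a file, reads it back and compares checksums. The InMemoryFabric class is a hypothetical stand-in used so that the example runs on its own. A real test would substitute a client that talks to the data fabric, and the put, get and delete method names are assumptions rather than an existing interface.

```python
import hashlib


class InMemoryFabric:
    """Hypothetical stand-in for a data fabric client, for illustration only."""

    def __init__(self):
        self._objects = {}

    def put(self, path, data):
        self._objects[path] = bytes(data)

    def get(self, path):
        return self._objects[path]

    def delete(self, path):
        del self._objects[path]


def round_trip_check(fabric, path, payload):
    """Store payload at path, read it back, and compare SHA-256 checksums."""
    expected = hashlib.sha256(payload).hexdigest()
    fabric.put(path, payload)
    try:
        actual = hashlib.sha256(fabric.get(path)).hexdigest()
    finally:
        # Always remove the test object so repeated runs do not accumulate data.
        fabric.delete(path)
    return actual == expected
```

A check like this could be run against each storage resource at regular intervals once the data fabric is in production.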
Milestones
- April 2008 - Start work on implementing a prototype SRB federation to serve as the prototype data fabric.
- July 2008 - prototype, or beta, data fabric in place. Early adopters will be invited to try it out to test their use cases and provide feedback to ARCS.
- October 2008 - final definition of how the data fabric will be implemented and appear to end users. Start moving prototype over into full production.
- January 2009 - Data fabric in production. Start education and outreach to user groups on making good use of the data fabric.
- April 2009 - Bring managed file transfer service into production.
--
StephenMcMahon - 22 Apr 2008