-- DavidBannon - 23 Aug 2004
| *Role* | *Name* | *Contact* | *Total Effort* |
| Management Contact | Ian Atkinson | ian.atkinson at jcu.edu.au | |
| Technical Contact | Wayne Mallett | wayne.mallett at jcu.edu.au | 100% |
| Technical Contact | Andrew Sharpe | ? at jcu.edu.au | 25% |
| Technical Contact | Martin Nicholls | UQ | 25% |
| Technical Contact | David Gwynne | UQ | 60% |
| Reporting Contact | Kathy Green | green8@acmc.uq.edu.au | |
2004 (Q1 and Q2) APAC Report: Compute Grid, JCU
The purchase of a new HPC cluster at JCU has been delayed by inter-institution issues (JCU and QUT). This cluster will be the base on which grid software and hardware are deployed at JCU as part of the APAC Compute Infrastructure project. The delay has at least been well timed: research into grid middleware has indicated that it is best to wait until Globus 4 is released. Globus 4 is generally expected to be more stable than previous versions of Globus and to interoperate more widely. This middleware will form the foundation of a grid-enabled system at JCU.
As part of the Compute Infrastructure project, there is a desire to move to a common job queuing system: PBSPro, OpenPBS or Torque. At JCU we currently run a version of PBSPro on our SGI Origin system. I have gained considerable experience in configuring PBSPro, including configuring it to work across a "grid" of computers. PBSPro server has been installed on several machines (SGI Origin and Power Challenge systems, as well as Solaris systems), and these systems have been configured to talk to each other, so users can send jobs (including data) from one machine to another. We have also successfully tested queuing jobs onto both SGI PBSPro server machines from client installations of PBSPro on Linux machines. The PBSPro installation at JCU has been operational for about two years and has had over 375,000 jobs placed in its queues to date. Supporting and developing this software has consumed several weeks of my time in the past six months, particularly in developing a system that checkpoints jobs effectively and is resilient to hardware failure.
A locally developed web interface that helps users create PBS scripts can be found at https://www.jcu.edu.au/QPSF/forms/pbsscript.shtml. It lets users create scripts that satisfy the queuing rules of either the JCU or the APAC system.
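The form's implementation is not described in this report. As a rough sketch of what such a script generator does, the Python below assumes a hypothetical set of form fields (job name, queue, CPU count, walltime, command) and builds a job script from standard `#PBS` directives, rejecting requests that exceed the limits passed in. Those limits are placeholders standing in for the JCU or APAC queue rules, whose actual values are not given here; `PbsJob`, `format_walltime` and `build_pbs_script` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass
class PbsJob:
    """Hypothetical form fields a user might fill in."""
    name: str
    queue: str
    ncpus: int
    walltime_minutes: int
    command: str


def format_walltime(minutes: int) -> str:
    """Convert a duration in minutes to the HH:MM:SS form used by '-l walltime'."""
    hours, mins = divmod(minutes, 60)
    return f"{hours:02d}:{mins:02d}:00"


def build_pbs_script(job: PbsJob, max_ncpus: int, max_walltime_minutes: int) -> str:
    """Return a PBS job script, or raise ValueError if the request breaks the given limits.

    max_ncpus and max_walltime_minutes are placeholder site limits, not the
    real JCU or APAC queue rules.
    """
    if not 1 <= job.ncpus <= max_ncpus:
        raise ValueError(f"ncpus must be between 1 and {max_ncpus}")
    if not 1 <= job.walltime_minutes <= max_walltime_minutes:
        raise ValueError(f"walltime must be between 1 and {max_walltime_minutes} minutes")
    lines = [
        "#!/bin/sh",
        f"#PBS -N {job.name}",
        f"#PBS -q {job.queue}",
        f"#PBS -l ncpus={job.ncpus}",
        f"#PBS -l walltime={format_walltime(job.walltime_minutes)}",
        # PBS sets PBS_O_WORKDIR to the directory the job was submitted from.
        "cd $PBS_O_WORKDIR",
        job.command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example values only; "workq" is PBSPro's default queue name.
    example = PbsJob(name="test", queue="workq", ncpus=4,
                     walltime_minutes=90, command="./a.out")
    print(build_pbs_script(example, max_ncpus=16, max_walltime_minutes=1440))
```

Keeping the limit checks separate from the directive formatting means the same generator could serve both rule sets by passing different limits.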
David Bannon (VPAC) has set up monthly Access Grid meetings. These started out as a meeting point for PBS administrators but now have a broader focus and serve as the general APAC Compute Infrastructure discussion forum. As part of my work on the APAC Compute Infrastructure project, I have participated in all but one of these meetings (missed because of a network issue beyond my control).
Future work will involve preparing the new JCU cluster to be installed in the "APAC configuration"; this involves working with the CI group to develop that specification. Two areas of particular concern are firewalls and cluster file systems. A test Globus 2.4 cluster has been installed, and these issues are being explored in practice.
Topic revision: r3 - 14 Jun 2006 - 13:07:47 - DavidBannon