APAC Grid Resource Status/Availability
Purpose - To allow Grid Applications to determine where particular resources can be accessed
and to monitor the status of the resources.
Responsible Group - SAPAC
Group Members
- Gerson Galang, SAPAC (team leader)
- Paul Coddington, SAPAC
- David Bannon, VPAC
- Ben Evans, ANU
- Ryan Fraser, CSIRO, Geosciences
- Lyle Winton, University of Melbourne, Physics
- Katherine Manson, University of Melbourne, Astronomy
- Glenn Hyland, University of Tasmania, Earth Sciences
Contact
Note: The status of gateways and their usage is displayed on the GOC.
Resource Information, Registry and Monitoring Services
The APAC National Grid needs to provide three related services:
- Registry services to enable discovery of what software or services are available on the grid resources.
- Resource information services to enable users, resource brokers and applications to find out the status of grid resources, e.g. CPU architecture and CPU load, available memory, disk space and disk usage, OS, resource manager queue information, and network bandwidth.
- Monitoring services to allow system administrators and users to monitor the current and previous usage of resources in the grid. (Note: this is monitoring of overall resource usage, not monitoring of the status of individual jobs on the grid, which is a separate issue.)
Monitoring Services
The following Grid monitoring applications should be added to the NG1 image:
- GRASP - a Grid benchmarking tool. (not yet in VDT)
- Inca - a monitoring application to verify Grid software and operating system installs (not yet in VDT)
- MonALISA - a framework which provides a distributed monitoring service system (already in VDT)
GRASP
The Grid Assessment Probes (GRASP) are designed to serve as simple grid
application exemplars as well as a set of diagnostic tools. They test and
measure the performance of basic grid functions, including file transfers,
remote execution, and Grid Information Services response.
To simplify the NG1 deployment, we recommend that GRASP not be installed yet.
Installing and running GRASP is not hard once the Grid infrastructure is set up
(Globus GRAM, MDS, and GridFTP running).
A section on how to run GRASP with NG1 installs can be found in the
NG1 Config Instructions page.
Inca
Inca is a framework for automated testing, benchmarking and monitoring of Grid systems. Inca will be useful for testing the APAC Grid's deployment activities. With Inca, you can easily check whether the applications and operating systems that have been installed and configured on the Grid conform to the agreed-upon specifications.
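To make the idea of checking an install against an agreed-upon specification concrete, here is a minimal Python sketch. It does not use Inca's reporter libraries or any Inca API; the function names, package names and version numbers are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def parse_version(text: str, width: int = 4) -> Tuple[int, ...]:
    """Turn '4.0.1' into (4, 0, 1, 0) so that versions compare numerically.

    Padding with zeros makes '2.1' and '2.1.0' compare as equal.
    Assumes purely numeric, dot-separated versions (an assumption of this sketch).
    """
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def check_conformance(required: Dict[str, str], installed: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the site conforms."""
    problems = []
    for package, minimum in sorted(required.items()):
        found = installed.get(package)
        if found is None:
            problems.append(f"{package}: missing (need >= {minimum})")
        elif parse_version(found) < parse_version(minimum):
            problems.append(f"{package}: found {found}, need >= {minimum}")
    return problems


if __name__ == "__main__":
    # Hypothetical specification and site inventory, not real APAC values.
    required = {"package-a": "4.0.1", "package-b": "2.1"}
    installed = {"package-a": "4.0.0"}
    for problem in check_conformance(required, installed):
        print(problem)
```

In a real deployment the "installed" side would come from the output of tests run on each site rather than from a hand-written dictionary.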
Inca is currently deployed on TeraGrid, UK NGS, and DEISA.
Inca has libraries that can be used and extended to write the custom reporters that we want to run on the APAC Grid. It currently has Globus (2 and 4), Cluster, and SRB tests that we can use out of the box. The GRASP benchmarking tests have also been integrated with Inca and are available as a reporter. The GITS developers have also written a reporter for GITS, which they are currently using at UK NGS. We can use the reporters they have written to migrate our GITS tests so that they use the Inca infrastructure.
Security is always a concern when automating grid tests (grid jobs), so here is a section from the Inca user guide discussing how it uses the credential and passphrase of the user running Inca:
http://inca.sdsc.edu/releases/2.0beta/html/userguide.html#PROXIES
HowIncaWorks
The DeployingInca2 page talks more about what's involved in deploying Inca on the APAC Grid.
InstallingInca2
IncaTestsOnGridAustralia
http://www.sapac.edu.au/inca
MonALISA
MonALISA stands for Monitoring Agents using a Large Integrated Services
Architecture. The MonALISA framework provides a distributed monitoring service
system using JINI/Java and WSDL/SOAP technologies. Each MonALISA server acts as
a dynamic service system and provides the functionality to be discovered and
used by any other services or clients that require such information. Here is an
example of MonALISA for Grid3.
In APAC, MonALISA will provide the following information:
- WAN links of Grid sites and the real time traffic on them
- capacity of those WAN links
- ABPing measurements between sites - ABPing is a module used to perform simple
network measurements using small UDP packets. ABPing measurements are used to
provide information about the quality of connectivity among different centers
as well as for dynamically computing optimal trees for connectivity (minimum
spanning tree, minimum path from any node to all the others...)
- Load and CPU usage of all the clusters at a site
- IO information of all the clusters at a site
- all the monitored information for every site in a VO
- jobs of every site and states of those jobs
- load distribution on every monitored site (no. of nodes with full, medium, and
empty load)
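The minimum spanning tree mentioned for the ABPing measurements can be illustrated with a short Python sketch. This is not MonALISA's code: it applies Kruskal's algorithm to a table of pairwise link costs, and the site names and cost values below are invented for the example.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def minimum_spanning_tree(costs: Dict[Edge, float]) -> List[Tuple[str, str, float]]:
    """Kruskal's algorithm over an undirected graph given as {(site_a, site_b): cost}."""
    parent: Dict[str, str] = {}

    def find(site: str) -> str:
        # Union-find lookup with path halving.
        parent.setdefault(site, site)
        while parent[site] != site:
            parent[site] = parent[parent[site]]
            site = parent[site]
        return site

    tree = []
    # Consider the cheapest links first and keep those that join two separate groups.
    for (a, b), cost in sorted(costs.items(), key=lambda item: item[1]):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            tree.append((a, b, cost))
    return tree


if __name__ == "__main__":
    # Hypothetical link costs (for example, a measured round-trip time in ms).
    costs = {
        ("site-a", "site-b"): 12.0,
        ("site-a", "site-c"): 30.0,
        ("site-b", "site-c"): 15.0,
        ("site-c", "site-d"): 8.0,
        ("site-b", "site-d"): 25.0,
    }
    for a, b, cost in minimum_spanning_tree(costs):
        print(f"{a} <-> {b}: {cost}")
```

The cost could be any quality metric derived from the measurements; a single number per link is a simplification made for this sketch.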
MonALISA can collect information from SNMP daemons, PBS, Ganglia, local and
remote procedures that read /proc files, and custom modules that users write to
talk to MonALISA. In APAC, we can concentrate on getting MonALISA working with
PBS servers running on remote machines and SNMP daemons running on the gateway
machines. PBS is the most common resource manager used by sites in Australia,
and SNMP IO is needed to give real-time information about the traffic going
through the gateway machines.
The PBS module of MonALISA was added by the developers at our request, since
a number of APAC partner sites are only willing to provide cluster information
if it comes from PBS itself rather than from another tool (such as Ganglia)
that would need to be installed in addition to the resource manager. The module
uses the qstat and pbsnodes commands. The qstat command is used to get
information about the jobs being run by VOs on a site, while the pbsnodes
command is used to provide the following information:
- number of CPUs
- free virtual memory
- total memory
- load average for 1 min
- total nodes
- total available nodes
- total free nodes (isn't this the same as the total available nodes?)
- total down nodes
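As a rough illustration of how node counts like these could be derived, the Python sketch below parses text laid out like `pbsnodes -a` output (a node name on its own line, followed by indented `key = value` attributes). It is not the MonALISA PBS module. The sample text is invented, and the distinction drawn here between "available" (not down or offline) and "free" (state is exactly free) is an assumption of this sketch, not MonALISA's definition.

```python
from typing import Dict

# Invented sample, loosely modelled on the layout of `pbsnodes -a` output.
SAMPLE = """\
node01
     state = free
     np = 4

node02
     state = job-exclusive
     np = 4

node03
     state = down,offline
     np = 2
"""


def parse_nodes(text: str) -> Dict[str, Dict[str, str]]:
    """Map each node name to its attribute dictionary."""
    nodes: Dict[str, Dict[str, str]] = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines separate node records
        if not line[0].isspace():
            current = nodes.setdefault(line.strip(), {})  # unindented line: a new node
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")  # split on the first '=' only
            current[key.strip()] = value.strip()
    return nodes


def summarise(nodes: Dict[str, Dict[str, str]]) -> Dict[str, int]:
    """Count total, available, free and down nodes, and sum the CPUs (np)."""
    summary = {"total": 0, "available": 0, "free": 0, "down": 0, "cpus": 0}
    for attrs in nodes.values():
        states = set(attrs.get("state", "").split(","))
        summary["total"] += 1
        summary["cpus"] += int(attrs.get("np", "0"))
        if states & {"down", "offline"}:
            summary["down"] += 1
        else:
            summary["available"] += 1
            if states == {"free"}:
                summary["free"] += 1
    return summary


if __name__ == "__main__":
    print(summarise(parse_nodes(SAMPLE)))
```

To try this against a live PBS server, the sample text could be replaced with the captured output of the pbsnodes command; the exact attribute names vary between PBS versions, so they should be checked first.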
See the ConfigureMonaLISA page for how to configure MonALISA.
Other tools
We also looked at the ACDC Grid Dashboard, a system which provides near
real-time snapshots of critical computational job metrics. The metrics are
stored in a database and used by the dynamic web pages that it generates for
the user. ACDC is not yet available for download.
-- GersonGalang - 14 Nov 2005
Discussions