Do not use this page; it has been migrated to the ARCS wiki at http://wiki.arcs.org.au/bin/view/Main/ResourceStatusAvailability

APAC Grid Resource Status/Availability

Purpose - To allow Grid Applications to determine where particular resources can be accessed and to monitor the status of the resources.

Responsible Group - SAPAC

Group Members

  • Gerson Galang, SAPAC (team leader)
  • Paul Coddington, SAPAC
  • David Bannon, VPAC
  • Ben Evans, ANU
  • Ryan Fraser, CSIRO, Geosciences
  • Lyle Winton, University of Melbourne, Physics
  • Katherine Manson, University of Melbourne, Astronomy
  • Glenn Hyland, University of Tasmania, Earth Sciences

Contact

Note: The status of gateways and their usage is displayed on the GOC.


Resource Information, Registry and Monitoring Services

The APAC National Grid needs to provide three related services:

  • Registry services to enable discovery of what software or services are available on the grid resources.

  • Resource information services to enable users, resource brokers and applications to find out the status of grid resources, e.g. CPU architecture and CPU load, available memory, disk space and disk usage, OS, resource manager queue information, and network bandwidth.

  • Monitoring services to allow system administrators and users to monitor the current and previous usage of resources in the grid. (NOTE this is monitoring of overall resource usage, not monitoring of the status of individual jobs on the grid, which is a separate issue).

Monitoring Services

The following Grid monitoring applications should be added to the NG1 image:

  1. GRASP - a Grid benchmarking tool (not yet in VDT)
  2. Inca - a monitoring application that verifies Grid software and operating system installs (not yet in VDT)
  3. MonALISA - a framework that provides a distributed monitoring service system (already in VDT)

GRASP

The Grid Assessment Probes (GRASP) are designed to serve as simple grid application exemplars as well as a set of diagnostic tools. They test and measure the performance of basic grid functions, including file transfers, remote execution, and Grid Information Services response.

To simplify the NG1 deployment, we recommend that GRASP not be installed yet. Installing and running GRASP is not hard once the Grid infrastructure is set up (Globus GRAM, MDS, and GridFTP running).

A section on how to run GRASP with NG1 installs can be found in the NG1 Config Instructions page.

Inca

Inca is a framework for automated testing, benchmarking and monitoring of Grid systems. Inca will be useful for testing the APAC Grid's deployment activities. With Inca, you can easily monitor whether the applications and operating systems that have been installed and configured on the Grid conform to the agreed-upon specifications.
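The idea of checking installs against agreed-upon specifications can be illustrated with a minimal Python sketch. This is not Inca's reporter API; the function names, package names and version numbers below are all hypothetical and only show the kind of comparison such a test performs.

```python
"""Sketch of a conformance check: compare installed versions against a spec.

This is NOT Inca's reporter API; all names and data here are hypothetical.
"""


def parse_version(text):
    """Turn a dotted version string such as '4.0.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def check_conformance(required, installed):
    """Return (package, problem) pairs; an empty list means the site conforms."""
    problems = []
    for package, minimum in sorted(required.items()):
        found = installed.get(package)
        if found is None:
            problems.append((package, "missing"))
        elif parse_version(found) < parse_version(minimum):
            problems.append((package, "found %s, need >= %s" % (found, minimum)))
    return problems


# Hypothetical agreed-upon specification and one site's reported versions.
REQUIRED = {"globus": "4.0.1", "gridftp": "2.1", "srb-client": "3.4.0"}
SITE_A = {"globus": "4.0.2", "gridftp": "2.0"}

if __name__ == "__main__":
    for package, problem in check_conformance(REQUIRED, SITE_A):
        print("%s: %s" % (package, problem))
```

Comparing versions as integer tuples rather than strings avoids ordering mistakes such as "2.10" sorting before "2.9".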

Inca is currently deployed on TeraGrid, UK NGS, and DEISA.

Inca has libraries that can be used and extended to write custom reporters that we want to run on the APAC Grid. It currently has Globus (2 and 4), Cluster, and SRB tests that we can use out of the box. The GRASP benchmarking tests have also been integrated with Inca and are available as a reporter. The GITS developers have also written a reporter for GITS, which they are currently using at UK NGS. We can use the reporters they have written to migrate our GITS tests so that they use the Inca infrastructure.

Security is always a concern when automating grid tests (grid jobs), so here is a section from the Inca user guide discussing how it uses the credential and passphrase of the user running Inca.

http://inca.sdsc.edu/releases/2.0beta/html/userguide.html#PROXIES

HowIncaWorks

The DeployingInca2 page describes what is involved in deploying Inca on the APAC Grid.

InstallingInca2

IncaTestsOnGridAustralia

http://www.sapac.edu.au/inca

MonALISA

MonALISA stands for Monitoring Agents using a Large Integrated Services Architecture. The MonALISA framework provides a distributed monitoring service system using JINI/Java and WSDL/SOAP technologies. Each MonALISA server acts as a dynamic service system and provides the functionality to be discovered and used by any other services or clients that require such information. Here is an example of MonALISA for Grid3.

In APAC, MonALISA will provide the following information:

  • WAN links of Grid sites and the real-time traffic on them
  • capacity of those WAN links
  • ABPing measurements between sites. ABPing is a module that performs simple network measurements using small UDP packets. The measurements indicate the quality of connectivity among different centers and are also used to dynamically compute optimal trees for connectivity (minimum spanning tree, minimum path from any node to all the others, ...)
  • load and CPU usage of all the clusters at a site
  • I/O information for all the clusters at a site
  • all the monitored information for every site in a VO
  • jobs at every site and the states of those jobs
  • load distribution on every monitored site (number of nodes with full, medium, and empty load)
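To show what computing a minimum spanning tree from ABPing-style measurements involves, here is a minimal Python sketch using Kruskal's algorithm. It is an illustration only, not MonALISA's implementation; the site names and round-trip times are hypothetical, and using round-trip time as the link cost is an assumption.

```python
"""Sketch: minimum spanning tree over sites from pairwise round-trip times."""


def minimum_spanning_tree(sites, rtt_ms):
    """Kruskal's algorithm; rtt_ms maps (site_a, site_b) to a round-trip time."""
    parent = {site: site for site in sites}

    def find(site):
        # Follow parent links to the representative of the site's component.
        while parent[site] != site:
            parent[site] = parent[parent[site]]  # path halving
            site = parent[site]
        return site

    tree = []
    # Consider links from cheapest to most expensive.
    for (a, b), cost in sorted(rtt_ms.items(), key=lambda item: item[1]):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the link joins two separate components
            parent[root_a] = root_b
            tree.append((a, b, cost))
    return tree


# Hypothetical site names and RTT values in milliseconds.
SITES = ["siteA", "siteB", "siteC", "siteD"]
RTT_MS = {
    ("siteA", "siteB"): 12.0,
    ("siteA", "siteC"): 30.0,
    ("siteB", "siteC"): 9.0,
    ("siteB", "siteD"): 41.0,
    ("siteC", "siteD"): 25.0,
}

if __name__ == "__main__":
    for a, b, cost in minimum_spanning_tree(SITES, RTT_MS):
        print("%s <-> %s  %.1f ms" % (a, b, cost))
```

With the sample data, the tree keeps the three cheapest links that do not form a cycle and drops the siteA-siteC and siteB-siteD links.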

MonALISA can collect information from SNMP daemons, PBS, Ganglia, local and remote procedures that read /proc files, and custom modules that users write to talk to MonALISA. In APAC, we can concentrate on getting MonALISA working with PBS servers running on remote machines and SNMP daemons running on the gateway machines. PBS is the most common resource manager used by sites in Australia, and SNMP I/O data is needed to give real-time information on the traffic going through the gateway machines.
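As an illustration of how real-time traffic can be derived from SNMP data, the sketch below computes an average bit rate from two samples of a 32-bit interface octet counter (such as ifInOctets in the standard IF-MIB). It is a sketch under stated assumptions, not MonALISA's SNMP module: the function name is hypothetical, the samples are assumed to be already collected, and at most one counter wraparound is assumed between samples.

```python
"""Sketch: link traffic rate from two samples of an SNMP octet counter."""

# ifInOctets/ifOutOctets are 32-bit counters that wrap around to zero.
COUNTER32_MODULUS = 2 ** 32


def traffic_bits_per_second(old_octets, new_octets, seconds):
    """Average bit rate between two samples, allowing one counter wraparound."""
    if seconds <= 0:
        raise ValueError("samples must be taken at increasing times")
    # The modulo makes the difference correct even if the counter wrapped once.
    delta = (new_octets - old_octets) % COUNTER32_MODULUS
    return delta * 8 / seconds


if __name__ == "__main__":
    # Hypothetical samples taken 10 seconds apart on a gateway interface.
    print(traffic_bits_per_second(1000, 1251000, 10))
```

On fast links a 32-bit counter can wrap more than once between samples, so the polling interval has to be short enough for the single-wrap assumption to hold.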

The PBS module of MonALISA was added by the developers at our request, since a number of APAC partner sites are only willing to provide cluster information if it comes from PBS itself rather than from another tool (such as Ganglia) that would need to be installed in addition to the resource manager. The module uses the qstat and pbsnodes commands. The qstat command is used to get information about the jobs being run by VOs on a site, while the pbsnodes command is used to provide the following information:

  • number of CPUs
  • free virtual memory
  • total memory
  • load average for 1 min
  • total nodes
  • total available nodes
  • total free nodes (isn't this the same as the total available nodes?)
  • total down nodes
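The node counts listed above could be derived from pbsnodes output along the following lines. This is a minimal Python sketch, not the MonALISA PBS module: the sample text assumes the common `pbsnodes -a` layout (a node name line followed by indented `key = value` lines), which can differ between PBS versions, and the node names are hypothetical. The definition of "available" as any node that is neither down nor offline is also an assumption made for illustration.

```python
"""Sketch: summarise node counts from `pbsnodes -a` style output."""

# Hypothetical sample; the real layout may differ between PBS versions.
SAMPLE = """\
node01
     state = free
     np = 2

node02
     state = job-exclusive
     np = 2

node03
     state = down,offline
     np = 2
"""


def parse_pbsnodes(text):
    """Return {node_name: {attribute: value}} from pbsnodes-style text."""
    nodes = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            current = None  # a blank line ends the current node record
        elif not line[0].isspace():
            current = nodes.setdefault(line.strip(), {})
        elif current is not None and " = " in line:
            key, value = line.strip().split(" = ", 1)
            current[key] = value
    return nodes


def summarise(nodes):
    """Count nodes by state; 'down' here also includes offline nodes."""
    summary = {"total": 0, "available": 0, "free": 0, "down": 0, "cpus": 0}
    for attrs in nodes.values():
        states = set(attrs.get("state", "").split(","))
        summary["total"] += 1
        summary["cpus"] += int(attrs.get("np", 0))
        if states & {"down", "offline"}:
            summary["down"] += 1
        else:
            summary["available"] += 1  # assumption: usable, whether busy or idle
            if "free" in states:
                summary["free"] += 1
    return summary


if __name__ == "__main__":
    print(summarise(parse_pbsnodes(SAMPLE)))
```

Under these assumed definitions a node that is busy running jobs counts as available but not free, which is one way the two totals could differ. On a real site the text would come from running `pbsnodes -a` rather than from the sample string.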

See ConfigureMonaLISA for how to configure MonALISA.

Other tools

We also looked at the ACDC Grid Dashboard, a system that provides near real-time snapshots of critical computational job metrics. The metrics are stored in a database and used by the dynamic web pages that the system generates for the user. ACDC is not yet available for download.

-- GersonGalang - 14 Nov 2005



Topic revision: r16 - 03 Dec 2007 - 04:18:21 - GersonGalang
 