Lobby monitor a dynamic way to watch CARC systems at work

What’s going on at the University of New Mexico Center for Advanced Research Computing (CARC)? Are the systems running smoothly? Are researchers getting what they need from them? Information about the systems and their usage is now available at a glance on a television monitor installed in the CARC lobby. Two more display screens nearby highlight research projects that use CARC resources.

The CARC mission is to provide computing resources, access, and support to users and to grow the community through education and outreach. The monitors allow users to see the CARC systems in action. A screen that monitors systems and their usage is common at many high-performance computing centers, according to CARC research assistant professor Matthew Fricke, who provides support for users of the systems from his office at the Center.

CARC uses two industry-standard tools, Ganglia and Nagios, to watch the systems, Fricke explained.

“They monitor the systems all the time and let us know right away if there is any trouble and things are going wrong. It’s an example of CARC keeping up with industry practice.”

Nagios tracks the switches, network hardware, clusters, and UPS (uninterruptible power supply) battery, which provides protection when the primary power source is lost or surges.

“Ganglia tells us everything about the clusters being used, the memory being used, temperatures, and other information about the systems. It recently showed one of the nodes on Gibbs (one of the machines) was overheating. We cleaned it and fixed it and got it going again,” Fricke said.

“It allows us to see the trends in usage across the center. It can help us see when to expand, be proactive, and anticipate problems with the system before they arise,” he added.

Fricke and CARC systems analyst Jose Sanchez are working to expand the information on the monitor and make it more dynamic for CARC visitors. The monitor will rotate among Ganglia, Nagios, and a third standard program called XDMoD, which tracks individual users on the system, in an RSS feed. The information will eventually be available to CARC users online.

“They can see how much memory is on a node. You can see in real time how well you’re using the resources you selected and decide if where you’re running your jobs is the appropriate place,” Fricke said.

***

Fricke holds office hours at the Center on Thursdays from 2 to 4 p.m. for researchers who need help using CARC resources. Researchers can also email help@carc.unm.edu to request an appointment. Intro to computing at CARC workshops for users will be held in February.