System-level resource monitoring in high-performance computing environments

Article

Agarwala, S, Poellabauer, C, Kong, J et al. (2003). System-level resource monitoring in high-performance computing environments. 1(3), 273-289. 10.1023/B:GRID.0000035189.80518.5d

cited authors

  • Agarwala, S; Poellabauer, C; Kong, J; Schwan, K; Wolf, M

abstract

  • Low-overhead resource monitoring is key to the successful management of distributed high-performance computing environments, particularly when applications have well-defined quality of service (QoS) requirements. The dproc system-level monitoring mechanisms provide tools both for efficiently monitoring system-level events and for notifying remote hosts of events relevant to their operation. Implemented as an extension to the Linux kernel, dproc provides several key functions. First, dproc extends the familiar /proc virtual filesystem interface with resource information collected from both local and remote hosts. Second, to predictably capture and distribute monitoring information, dproc uses a kernel-level group communication facility, termed KECho, which implements events and event channels. Third, and the focus of this paper, dproc supports run-time customization of resource monitoring, including the generation and deployment of monitoring functionality within remote operating system kernels. Using dproc, we show that (a) data streams can be customized according to a client's resource availability (dynamic stream management), (b) by dynamically varying distributed monitoring (dynamic filtering of monitoring information), an appropriate balance can be maintained between monitoring overhead and application quality, and (c) because monitoring is performed at kernel level, the information captured enables decision making that takes into account the multiple resources used by applications. © 2004 Kluwer Academic Publishers.

publication date

  • December 1, 2003

Digital Object Identifier (DOI)

  • 10.1023/B:GRID.0000035189.80518.5d

start page

  • 273

end page

  • 289

volume

  • 1

issue

  • 3