Advanced Workload Management Support for Linux
Workload management in enterprise operating systems is increasingly driven by two requirements. On the one hand, workloads with diverse and dynamically changing resource demands are being consolidated on larger symmetric multiprocessors, which requires efficient, business-goal-oriented workload management. On the other hand, autonomic computing initiatives seek to reduce the complexity of, and manual involvement in, systems management. We argue that goal-oriented workload managers that can satisfy these conflicting objectives require the operating system kernel to provide class-based differentiated service for all the resources that it manages. The Class-based Kernel Resource Management (CKRM) project seeks to develop Linux kernel mechanisms that provide differentiated service for resources such as CPU time, memory pages, disk I/O, incoming network connections, and outgoing network bandwidth, based on dynamic groups of tasks called classes. The basic mechanism is to enhance, but not replace, the default Linux resource schedulers so that they are class aware, and to provide class association through a loadable policy. CKRM is an open source project (http://ckrm.sf.net) that provides its components and kernel enhancements under the GPL.
Currently, kernel resource management is done in terms of tasks, address spaces, and users. Moreover, except for CPU scheduling and outbound network bandwidth, the current Linux kernel arbitrates access to resources based on overall system performance rather than on the goals of individual workloads. CKRM aims to make the kernel more goal-oriented by allowing system administrators to define classes and the share of each resource that each class should get. CKRM enhancements to the existing kernel schedulers then ensure that each class gets its specified resource share. This enables QoS and resource share apportionment for work that cannot be statically associated with a task or user.
At the same time, resource consumption can be monitored at the class level and adjusted by workload management middleware to meet class-specific goals. CKRM provides the logic for resource monitoring and allocation based on classification policies that are passed to the kernel and then enforced by the default Linux schedulers, which have been enhanced to be class aware. CKRM consists of three components:
1) Class-Aware Schedulers: modifications to the various resource schedulers (CPU, memory, disk, network) that make them class aware and enforce specific per-class resource usage.
2) CKRM Core: a patch that consists of the data structures for class definitions, an API for registering the class-aware schedulers (see above) and for getting and setting resource shares, and patches to the base kernel that trigger dynamic classification at events such as fork, exec, and setuid.
3) Classification Engine: an external policy-driven module that classifies tasks into classes and is invoked at the classification events of interest. Its only role is to define the various classes in the system and associate a task with a class. CKRM comes with a rule-based prototype, though other approaches are possible as well.
The presentation is structured as follows. We will give an overview of workload management requirements and explain why task-oriented management features are insufficient and/or add complexity. We then introduce the CKRM framework and its components to show how these drawbacks are overcome. We then describe our scheduler enhancements and how they achieve class-aware fair-share scheduling semantics. Finally, we provide some performance numbers.
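To make the division of labor described above more concrete, here is a minimal user-space sketch in Python of a rule-based classifier that maps a task to a class, together with a helper that turns per-class share weights into fractions of a resource. It is an illustration only and not CKRM's interface: the names Task, Rule, classify, and share_fractions, the first-match rule semantics, and the example classes and weights are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Task:
    """Hypothetical view of the task attributes a rule may inspect."""
    pid: int
    uid: int
    command: str


@dataclass(frozen=True)
class Rule:
    """A first-match rule: if matches(task) holds, assign class_name."""
    class_name: str
    matches: Callable[[Task], bool]


def classify(task: Task, rules: List[Rule], default: str = "default") -> str:
    """Return the class of the first matching rule, else the default class.

    In the design described above, a call like this would happen at a
    classification event such as fork, exec, or setuid.
    """
    for rule in rules:
        if rule.matches(task):
            return rule.class_name
    return default


def share_fractions(shares: Dict[str, int]) -> Dict[str, float]:
    """Convert per-class share weights into fractions of one resource."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("total shares must be positive")
    return {name: weight / total for name, weight in shares.items()}


# Illustrative policy; class names, weights, and rules are made up.
SHARES = {"web": 50, "batch": 20, "default": 30}
RULES = [
    Rule("web", lambda t: t.command == "httpd"),
    Rule("batch", lambda t: t.uid == 1001),
]

if __name__ == "__main__":
    task = Task(pid=4242, uid=1001, command="backup.sh")
    cls = classify(task, RULES)
    print(cls, share_fractions(SHARES)[cls])
```

The sketch keeps classification (which class a task belongs to) separate from enforcement (how much of a resource that class receives), mirroring the split between the classification engine and the class-aware schedulers. A real scheduler would enforce shares over time instead of computing a static fraction.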
Dr. Hubertus Franke is a Research Staff Member at the IBM T. J. Watson Research Center, Yorktown Heights, NY, where he currently manages the Enterprise Linux Group. His group's primary objective is to drive enterprise-level functionality into the Linux kernel. His technical interests are operating systems, computer architecture, and distributed systems. In previous assignments at IBM Research he contributed to the IBM SP2 supercomputer system through the implementation of the MPI message passing layer and the gang scheduling system. He received a Diplom Informatik degree from the Technical University of Karlsruhe in 1987 and a Ph.D. in Electrical Engineering from Vanderbilt University in 1992.