A Survey on Resource Management and Scheduling in Grids
Prepared for Prof. Javed I. Khan
Department of Computer Science, Kent State University
Date: May 2003
Computing, Grid Scheduling, Heterogeneous Computing, Grid Taxonomy, Grid
Survey, Internet Computing.
As computation, storage, and communication technologies steadily improve, increasingly large, complex, and resource-intensive applications are being developed both in research institutions and in the industry. It is a common observation that computational resources are failing to meet the demand of those applications. The power of network, storage, and computing resources is projected to double every 9, 12, and 18 months, respectively. Servers and storage have continued to rapidly improve their "price for performance" by leveraging new innovations and manufacturing efficiencies, and the same trend has finally taken hold for bandwidth. Scientific and other grand challenge applications such as high-energy physics, bioinformatics and other disciplines, are a driving force for developing computing infrastructure of the future. Their constantly increasing computational power requirements often cannot be met by available systems. The logical response to these changes is to move from a model based on discrete infrastructure components to a distributed computing model which fully leverages the computing capabilities of the infrastructure. Ensembles of distributed, heterogeneous resources, or Computational Grids, have emerged as a popular platform for deploying such applications. Thus resources can be shared and used efficiently to gain such high computing power which is more cost effective than having one standalone super computing infrastructure. The gain in investment over performance with using grids is tremendous over the latter. Grid technologies promise to change the way we tackle complex problems. They will enable large-scale aggregation and sharing of computational, data and other resources across institutional boundaries. And harnessing these new technologies effectively will transform scientific disciplines ranging from high-energy physics to the life sciences. In order for users from multiple organizations to make use of a computing grid efficiently and securely, they must belong to a virtual organization (VO) sharing a common goal and a common set of resources. Assigning resources, users, and applications to VOs is the fundamental Grid technical value proposition. A VO is a set of participants with various relationships that wish to share resources to perform some task.
Emergence of Internet Computing and Grids: (top)
The evolution of real automated computing started with Personal Computers and even since the developments since the past fifty years has been to increase the raw power of the isolated processor. Although their speeds reach impressive heights, it's far too low to compute complex engineering problems on any one of them. One solution to the problem of inadequate computer power is to 'cluster' multiple individual computers, which is a standard practice in Super-computing centers, research labs and the industry.
But the inherent problem with the clusters is that they remain a dedicated facility. built at a single location, which is not all pervasive. Rapid improvements in communications technologies are leading many to consider more decentralized approaches to the problem of computing power. Also it is observed that most workstations are idle and up most of the time, and is a good resource to harness. SETI@home is a project which is considered as one of the biggest breakthrough, if not the biggest, in the internet computing arena. It harnesses about half a millions PC's around the world, delivering about 1000 CPU years per day, and is the most powerful network and special purpose super computer in the world today.
Internet computing is just a special case of something much more powerful which is the ability for communities to share resources as they tackle common goals. Thus was born "Grid Technology", whose frame work we will try and explore in this paper. We will talk about the various types of Grids and their taxonomy. The major issue in Grids is Resource management and Scheduling, about which we will explore the latest in today's trends. Scientists and Engineering envision an integrated Grid in which problems of different types can be routed to the most appropriate resource: dedicated supercomputers for specialized problems that require tightly coupled processors and idle workstations for more latency tolerant, data analysis problems.
Grid Technology and Resource Management (top)
The ubiquity of the Internet as well as the availability of powerful computers and high-speed network technologies as low-cost commodity components is rapidly changing the computing landscape and society. These technology opportunities have led to the possibility of using wide-area distributed computers for solving large-scale problems, leading to what is popularly known as Grid computing. The term Grid is chosen as an analogy to the electric power Grid that provides consistent, pervasive, dependable, transparent access to electricity irrespective of its source. Such an approach to network computing is known by several names: metacomputing, scalable computing, global computing, and Internet computing and more recently Peer-to-Peer (P2P) computing.
>Hierarchical and decentralized approaches are suitable for Grid resource and operational management, usually with respects to memory and processing power, because traditional methods which use centralized policies need complete state information, which is not feasible in the Grid scenario. The Grid resource broker mediates between producers and consumers. The resources are Grid enabled by deploying low-level middleware systems on them. The core middleware deployed on producer's Grid resources support the ability to handle resource access authorization and permits only authorized users to access them. The user level and core middleware on consumer's resources support the ability to create Grid enabled applications or necessary tools to support the execution of legacy applications on the Grid. Upon authenticating to the Grid, consumers interact with resource brokers for executing their applications on remote resources. The resource broker takes care of resource discovery, selection, aggregation, data and program transportation, initiating execution on remote resources and gathering results.
Characteristics of Grids: (top)
Described below are some of the key aspects of Grid technology, which provides the end users a sense of seamless computational environment.
Stages of Scheduling: (top)
Matching and Scheduling Taxonomy (top)
An application can be considered to be made up of subtasks, which may require different architectural capabilities. Each subtask is assigned to a particular machine (matching) and the subtasks assigned to a particular machine are ordered (scheduling) such that the overall execution time of the application is minimized. The combination of matching and scheduling subtasks to machines is defined as subtask mapping. The Purdue Heterogeneous Computing Taxonomy , classifies the mapping heuristics and in the following major parts:
|Attributes of RMS||Taxonomy|
|Type of Service||Computation, Data or Service Grids|
|Machine Organization||Flat/Hierarchical cells|
|Resource Model||Fixed or Extensible|
|Namespace Organization||Relational, Hierarchical, Graph.|
|Quality of Service||Soft, Hard, None.|
|Resource Information Store||Network Directory, Distributed Objects|
|Resource discovery||Query, Agents|
|Resource Info Dissemination||Batch/Periodic, On-line/Demand|
|Scheduler Organization||Centralized, Hierarchical, De-centralized|
|Scheduling Policy||System/User Centric|
|State Estimation||Predictive, Non-Predictive|
|Re-Scheduling||Periodic, Event Driven|
Classification of Grid Scheduling  (top)
In the centralized organization, there is only one scheduling controller that is responsible for the system-wide decision-making. Such an organization has several advantages including easy management, simple deployment, and the ability to co-allocate resources. In a Grid RMS the disadvantages of this organization such as the lack of scalability, lack of fault-tolerance, and the difficulty in accommodating multiple policies outweigh the advantages. Condor utilizes a centralized scheme.
The other two organizations, hierarchical and decentralized have more suitable properties for a Grid RMS scheduler organization. In a hierarchical organization, the scheduling controllers are organized in a hierarchy. One obvious way of organizing the controllers would be to let the higher level controllers manage larger sets of resources and lower level controllers manage smaller sets of resources. Compared with the centralized scheduling this mode of hierarchical scheduling addresses the scalability and fault-tolerance issues. It also retains some of the advantages of the centralized scheme such as co-allocation. Many Grid systems such as 2K, Darwin, and Legion use hierarchical scheduler.
The Decentralized organization addresses several important issues such as fault-tolerance, scalability, site-autonomy, and multi-policy scheduling. But some of the drawbacks include management, usage tracking and co-allocation. This scheme works well considering the scalability to large networks but it would have to coordinate with each other via some form of resource discovery or resource trading protocols. Systems such as Bond, MOL and Ninf use Decentralized-scheduling approaches.
As the resource availability in the Grid changes with time, the scheduling systems need to be adaptive. This is achieved by evaluating the current schedule (state estimation) based on predictive techniques; and then developing a new schedule, i.e. rescheduling to meet the users requirements. The re-scheduling can be initiated periodically or whenever some event occurs.
Scheduling Algorithms (top)
Efficient scheduling software should minimize idle processing time. No single algorithm can achieve optimal performance on all possible job execution time distributions. However, when we know more information such as which distribution dominates the job spectrum, a scheduling system can choose which policy will be near optimal. Scheduling is said to be static when the processors on which the jobs will run are assigned at compile time or before execution. Dynamic scheduling or load balancing is performed at run time. To arrive at a scheduling decision, the scheduling system needs to take various parameters into consideration including the following:
a) Resource Architecture and Configuration
b) Resource Capability (clock speed, memory size)
c) Resource State (such as CPU load, memory available, disk storage free)
d) Resource Requirements of an Application
e) Access Speed (such as disk access speed)
f) Free or Available Nodes
g) Priority (that the user has)
h) Queue Type and Length
i) Network Bandwidth, Load, and Latency (if jobs need to communicate)
j) Reliability of Resource and Connection
k) User Preference
l) Application Deadline
m) User Capacity/Willingness to Pay for Resource Usage
n) Resource Cost (in terms of dollars that the user need to pay to the resource owner)
o) Resource Cost Variation in terms of Time-scale (like high @ daytime and low @ night)
p) Historical Information, including Job Consumption Rate
The Scheduling algorithms could use some combination of the above parameters, and is needless to say that no algorithm can achieve optimal solution considering all parameters. It is always a tradeoff, and the priority attached to a particular parameter depends on the architecture model and other criteria as discussed earlier.
This Algorithm schedules resources in a metacomputing system where tasks with various requirements are submitted from participant sites . The goal is to minimize the overall execution time of a collection of application tasks. Here, each application task is represented by a Directed Acyclic Graph (DAG). A task consists of several subtasks and the resource requirements are specified at subtask level. A subtask is ready for execution if all its predecessors have completed, and it has received all the input data needed for its execution. The input data sets are allowed to be replicated, i.e., the data set can be accessed from one or more data repositories. Additionally, a task can be submitted with QoS requirements, such as needed compute cycles, memory, communication bandwidth, maximum completion time, priority, etc.
A subtask cannot be scheduled until all its predecessors have been scheduled. Ready subtasks are considered for scheduling in order of their priorities. For generalized task scheduling there are two types of scheduling algorithms - level-by-level scheduling and the greedy approach. The application tasks are represented by DAGs where a node is a subtask and the edges from predecessors represent control flow. Each subtask has computation cost, data items to be communicated from predecessor subtasks, and data sets from one or more repositories. So that all the submitted tasks are completed concurrently, we create a hypothetical node and linked to the root nodes of all the submitted DAGs to obtain one combined DAG, as shown below :
In level by level heuristics, the combined DAG is first partitioned into 'l' levels of subtasks, where each level contains independent subtasks, i.e., there are no dependencies between the subtasks in the same level. Thus, all the subtasks in a level can be executed in parallel once they are ready. The scheduler considers subtasks in each level at a time. In a particular subtask the minimum completion time will be scheduled first in the MIN-FINISH algorithm and the maximum completion time similarly for the MAX-FINISH algorithm.
The Greedy approach is a small variation on the level-by-level scheduling, wherein we use the MIN-FINISH-ALL and MAX-FINISH-ALL algorithm, such that it checks if any of its successors are ready to be considered for scheduling and add them to the ready set. Thus we are avoiding the level dependency to some extent.
This algorithm considers a static problem definition, but in the real world, the applications could arrive in real time, i.e., dynamic in nature, non-deterministic manner. Also the resources could be may be added or removed during run time, which the algorithm does not address, and can be improved upon.
This algorithm maps data processing tasks onto heterogeneous resources and performs well on both compute-intensive as well as communication-intensive conditions . When communication between nodes becomes significant such that it becomes the bottleneck of performance, the algorithm trades parallelism for reduction of communication traffic. It takes a task graph and a resource graph as inputs and outputs the mapping from tasks to processors. A task graph models a data processing pipeline, whereas the resource graph models processors (nodes) and links (edges). In order to avoid bottlenecks, such as too many tasks on a processor or too much communication over a link, clustering of the task graph is considered. A cluster represents a set of tasks that are intensively communicating with each other.
>By dividing the requirement at each node (edge) by its corresponding computation (communication) capacity, we have the time required to service requested computation/communication at the node (edge). We call it occupancy at the node (edge).
The maximum occupancy over the entire graph gives us the time required to unit-progress all tasks. The goal is to make the maximum occupancy of the mapping as small as possible. Thus, minimizing the occupancy is equivalent to maximizing the throughput.
The basic approach in this algorithm is to linearly order tasks in such a say that tasks within a cluster are close to each other, and put tasks to processors according to this order. If a single processor is assigned to multiple tasks, they are likely to be in the same cluster, and therefore, when the tasks turn out to be communication-bound, processors can reduce communication simply by accommodating more tasks from the list. The next problem in this algorithm is when we should stop this process and go onto the next processor. This depends on a parameter 'c' (communication intensity), it is typically when the compute-load is best balanced among processors.
The sequence of operations of this algorithm is as follows: we obtain clusters using a clustering task graph algorithm, and map the clusters on processors using another algorithm. Once the mapping of tasks is statically obtained, iterative improvement is done to improve the performance of the system considering any changes in the system. Basically a set of tasks are identified that form a bottleneck, which is making the current occupancy large. Thus the algorithm is able to incrementally improve a given mapping, moving only those tasks that form the bottleneck.
This algorithm does not consider the communication induced between deleted and undeleted tasks. The computational complexity of the algorithm has not been worked out. This algorithm can be improved to automatically select the resources and maps tasks on wide area, which helps Grid application designers, develop performance-portable Grid code.
Heterogeneous Earliest Finish Time (HEFT) and Critical Path on a Processor (CPOP) are alternatives to algorithms which target only homogeneous processors/resources, and to algorithms which target heterogeneous environments which are either very complex or does not produce quality results . But the drawback of these algorithm is that they are static DAG based algorithms, which operate on a fixed number (bounded) number of processors. The HEFT algorithm selects the task with the highest rank (the tasks are ranked upward or downward as per the scheduling policy, it is the length of the longest path from a particular node to the exit node) and considers a possible insertion of each task in an earliest idle time slot between two already scheduled tasks on a processor.
The CPOP algorithm schedules the critical path nodes onto a single processor that minimizes the critical path length. It uses the upward and the downward rank for to assign a node priority. If a node is on the critical path, then it is assigned to the critical path processor else assigns it to a processor that minimizes the execution completion time.
Survey on some Grid Systems (top)
These surveys are based on the paper .
2K: A Distributed Operating System 
It provides a flexible and adaptable architecture for providing distributed services across a wide variety of platforms. 2K was intended for development and deployment of distributed service applications rather than high performance grand challenge applications. It can be considered to be a demand service Grid that uses a flat RMS organization. Resource discovery is performed through agents and its dissemination is performed on demand by injecting the mobile agents into the system. 2K uses a one level hierarchical controller for scheduling network bandwidth and a decentralized controller for all other resources. The scheduling policy in this system seems to be fixed.
Apples: A Network Enabled Scheduler
The application level scheduling (AppLeS)  primarily focuses on developing scheduling agents for individual applications on computational Grids. It uses other RMSs such as Globus, Legion, and NetSolve to execute application tasks. It uses templates that have been developed for parametric and master-slave type of applications. It follows the resource management model supported by the underlying Grid middleware systems. The scheduler is central to the application that performs mapping of jobs to resources, but the local resource schedulers perform the actual execution of application units. Thus AppLeS can be classified with a predictive heuristic state estimation model, online rescheduling and fixed application oriented scheduling policy.
Condor: Cycle Stealing Technology for High Throughput Computing.
Condor  is a high-throughput computing environment that can manage a large collection of diversely owned machines and networks. It is well known for harnessing idle computers, and can be configured to share resources. The resource agent runs on each machine periodically advertising its services to the collector. The resource requests and offers are described in the Condor classified advertisement (ClassAd) language, a query language. It can be considered as a computational Grid with a flat organization. Resource discovery is centralized queries with periodic push dissemination. The scheduler is centralized.
Darwin: Resource Management for Network Services
Darwin  is oriented towards resource management in network based equipment, but does provide mechanisms for scheduling computation in non-network nodes. An application provides an application input graph that describes the resource requirement. The input graph describes a set of end nodes and the network connections between them. Darwin can provide hard network QoS since Darwin components run in routers and can control bandwidth at the network low level using the built-in router functions. The core component of Darwin is Xena, a request broker, which performs global allocation of resources. Darwin uses a hierarchical fair service curve scheduling algorithm (H-FSC) for higher level resource allocation. The H-FSC algorithm was designed to efficiently support virtual networks for distributed applications. The RMS is organized in Darwin as a one level hierarchy because all requests are sent to a Xena request broker that interacts with its peer request brokers. The resource model is a fixed schema. Scheduling is hierarchical with non-predictive state estimation. Rescheduling is event driven and implemented by the control delegates. The scheduling policy is fixed and system oriented.
Globus: A Toolkit for Grid Computing
Globus  enables modular deployment of Grid systems by providing the required basic services and capabilities. Its emphasis is on the hierarchical integration of Grid components and their services. Globus supports soft QoS via resource reservation. The predefined Globus scheduling policies can be extended by using application level schedulers such as the Nimrod/G, AppLeS, and Condor/G. The Globus scheduler in the absence of application level scheduler has a decentralized organization with an ad-hoc extensible scheduling policy.
Legion: A Grid Operating System
Legion  is an object-based meta-system that provides the software infrastructure for a Grid. It defines an API for object interaction, but does not specify the programming language or communication protocol. It follows the hierarchical model. It supports a mechanism to control the load on hosts. It provides resource reservation capability and the ability for application level schedulers to perform periodic or batch scheduling. Legion machine architecture is hierarchical with decentralized scheduler. It supplies default system oriented scheduling policies, but it allows policy extensibility via a structured scheduling extension interface.
NetSolve: A Network Enabled Computational Kernel
NetSolve  is a client-agent-server paradigm based network enabled application server. It is designed to solve computational science problems in a distributed environment. Communications between NetSolve clients, agents, and servers are performed using TCP/IP sockets. Its agents can search for resources on a network, choose the best one available, execute the client request, and then return the answer to the user. The Netsolve system follows the service Grid model with hierarchical cell-based machine organization.
The Netsolve-agents act as an information repository and maintain the record of resources available in the network. The Netsolve agent acts as a resource broker and performs resource discovery and scheduling. Thus Netsolve has decentralized scheduler organization.
Nimrod/G: Resource Broker and Economy Grid
Nimrod/G  is a Grid resource broker for managing and steering task farming applications such as parameter studies on computational Grids. It follows a computational market-based model for resource management. Nimrod/G provides support for formulation of parameter studies, a single window to manage and control experiments, resource discovery, resource trading, and scheduling. It is used as a scheduling component in a new framework called Grid Architecture for Computational Economy (GRACE) which is based on using economic theories for a Grid resource management system. Nimrod-G follows hierarchical model in resource management. It supports resource reservation and QoS through the computational economy services of the GRACE infrastructure. The users specify QoS requirements such as the deadline, budget, and preferred optimization strategy. The Grid resource estimation is performed through heuristics and historical load profiling. Scheduling policy is application oriented and is driven by user defined requirements such as deadline and budget limitations. The load balancing is performed through periodic rescheduling.
Conclusion and other issues (top)
Grid computing is broad in its domain of application and raises research questions that span many areas of distributed computing and of computer science in general. A fundamental concern for Grid computing is security and lack of standards although the Grid Forum is working towards establishing a standard for protocols and other issues related to the Grid Technology.
Security is a fundamental concern in Grids, because the computing is all pervasive, and strict policies governing virtual organizations have to be built such that economic model is followed. Scheduling about which we have spoken all through out is the biggest concern in Grids, and we have to strike a suitable compromise and achieve economy for Grids to be successful. Coordinated resource sharing which is the goal of Grid systems, consists of not only exchanging files, but rather direct access to computers, software, data and other resources. Thus policies should be clearly and carefully planned as to what is being shared, who is allowed to share and the conditions under which sharing occurs. Fault tolerance is another issue when the number of nodes in the Grid increases. The Grid is currently being built as a concerted effort among many institutions and is already supporting leading scientific applications. Eventually it will be possible to construct increasingly realistic models of the various classes of scientific problems that we have on hand. It will provide a platform for the validation of research results in real-world scenarios.
 K. Taura, A. Chien, A Heuristic Algorithm for Mapping Communicating Tasks on Heterogeneous Resources, Proceedings of IEEE, 2000.
 H.A. James, K.A. Hawick, P.D. Coddington, Scheduling Independent Tasks on Metacomputing Systems, Proc. of Parallel and Distributed Computing Systems, 1999.
 J.B. Weissman, Scheduling Multi-Component Applications in Heterogeneous Wide-Area Networks, Proc. of IEEE 1999.
 V.D. Martino, M. Mililotti, Scheduling in a Grid Computing environment using Genetic Algorithms, Proc. of the International Parallel and Distributed Processing Symposium (IPDPS 2002).
 K. Kurowski, J. Nabrzyski, J. Pukacki, User Preference Driven Multiobjective Resource Management in Grid Environments, Proc. of CCGRID, 2001.
 K.K. Leung, N.H.C. Yung, P.Y.S. Cheung, Novel Neighborhood Search for Multiprocessor Scheduling with Pipelining, Proc. of HPC 2002.
 S.S. Vadhiyar, J.J. Dongarra, A Metascheduler for the Grid, HPDC-11 2002.
 H. Topcuoglu, S. Hariri, M.Y. Wu, Task Scheduling Algorithms for Heterogeneous Processors
 I. Foster, The Anatomy of the Grid: Enabling Scalable Virtual Organizations, Proc. of the 1st International Symposium on Cluster Computing and the Grid (CCGRID '01).
 H. Casanova, Distributed Computing Research Issues in Grid Computing, ACM SIGACT News Distributed Computing Column 8, 2002.
 K.Krauter, R. Buyya, M. Maheswaran, A Taxonomy and Survey of Grid Resource Management Systems for Distributed Computing, Software-Practice and Experience 2001.
 M. Maheswaran, H. J. Siegel, A Dynamic Matching and Scheduling Algorithm for Heterogeneous Computing Systems,
 Qing Ding, Guoliang Chen, A Benefit Function Mapping Heuristic for a Class of Meta-tasks in Grid Environments, CCGRID 2001.
 Hongtu Chen, M. Maheswaran, Distributed Dynamic Scheduling of Composite Tasks on Grid Computing Systems, IPDPS 2002.
 A.H. Alhusaini, V.K. Prasanna, C.S. Raghavendra, A Unified Resource Scheduling Framework for Heterogeneous Computing Environments, Supported by DARPA/ITO Quorum Program through the Naval Postgraduate School under subcontract number N62271-97-M-0931.
 T.D. Braun, H. J. Siegel, A Taxonomy for Describing Matching and Scheduling Heuristics for Mixed-Machine Heterogeneous Computing Systems,
 R. Buyya, Economic-based Distributed Resource Management and Scheduling for Grid Computing, Thesis Report 2002.
 N. H. Kapadia, On the Design of a Demand-based Network Computing System: The Purdue University Network-Computing Hubs, PhD Thesis, Purdue University, August 1999.
 V. Sunderam, A. Geist, J. Dongarra, and R. Manchek, The PVM Concurrent Computing System: Evolution,Experiences, and Trends, Parallel Computing Journal, Volume 20, Number 4, April 1994.
 D. Hensgen, T. Kidd, et al, An Overview of MSHN: The Management System for Heterogeneous Networks, 8th Workshop on Heterogeneous Computing Systems (HCW '99), San Juan, Puerto Rico, April 1999, invited.
 F. Kon, R. Campbell, M. Mickunas, and K. Nahrstedt, 2K: A Distributed Operating System for Dynamic Heterogeneous Environments, 9th IEEE International Symposium on High Performance Distributed Computing (HPDC'9) August 2000.
 D. Carvalho, F. Kon, et al, Management of Environments in 2K, 7th International Conference on Parallel and Distributed Systems (ICPADS-2000), Iwate Japan, July 4-7 2000.
 J. Gehring and A. Streit, Robust Resource Management for Metacomputers, 9th IEEE International Symposium on High Performance Distributed Computing, Pittsburgh, USA, 2000.
 S. Fitzgerald, I. Foster, C. Kesselman, G. von Laszewski, W. Smith, S. Tuecke, A Directory Service for Configuring High-Performance Distributed Computations, 6th IEEE Symp. on High-Performance Distributed Computing, 1997.
 I. Ekmecic, I. Tartalja, and V. Milutinovic, A survey of heterogeneous computing: Concepts and Systems, Proceedings of the IEEE, Vol 84, No 8, Aug 1996, pp. 1127-1144.
Relevant Links: (top)
Research Groups: (top)
International Workshop on Grid Computing.
Conferences on High Performance Distributed Computing.
Scope of the Survey: (top)
IEEE, ACM, Ohiolink Digital Library.
Keywords: Grid Computing, Grid Scheduling, Heterogeneous Computing, Grid Taxonomy. : Materials found are between 1998 - till date (April 2003). Although there are many algorithms, some of which are trivial modifications of distributed computing algorithms, the paper sites only two algorithms, one which does not take network bandwidth into account and the other which does.