G
rid
Scheduling Algorithms Tesbed
I would like to express my sincere gratitude for the assistance and support of a number of people who made this project a success.
I am very thankful to Prof. Mrs. V.Y. Kulkarni, Department of Computer Engineering, Maharashtra Institute of Technology, my internal guide for her valuable technical guidance and suggestions that she provided us at various stages throughout the project work.
I am also thankful to Prof. Mr. A.K. Pathak, Head of Department of Computer Engineering, Maharashtra Institute of Technology, for encouraging me at all times and giving important suggestions.
I would like to thank Prof. Dr. R.K. Shaymasundar (Dean STCS, TIFR) for allowing me to undertake the project at STCS, TIFR. I would like to thank Dr. Basant Rajan (Veritas) for helping me with the initial idea of the project, Mr. Anbalagan S. (ex TIFR) for the initial guidance regarding the project work , Mr. Sachin Wasnik (ex TIFR), Mr. S. Madhu (Scientific Officer, STCS, TIFR) for the very important guidance and support. I would like to thank Mr. N. Kaushik Kumar (Project Assistant STCS, TIFR) for his valuable guidance and support throughout the project work at TIFR.
My
special thanks to the non-teaching staff of Maharashtra Institute of
Technology, who provided us access to the lab facilities from time to
time.
The Big Bang, the only physical phenomenon that can be used as a yardstick to measure the explosive increase in the popularity of the Internet. Also the ensuing data explosion is of magnitude of the same order. An individual datum when in isolation and not operated upon presents no value at all. To use the datum we need to process it, the processing of such enormous data (as of now easily available because of the Internet) is the solution offered by the Grid Computing. Grid Computing is aiming to provide with enormous computing power at the hands of the users, for various needs. The tremendous increase in the data storage capacities, processing facilities and information carrying capabilities of the interconnecting networks make it the computing platform of the future.
There is a phenomenal increase in the interest for research in the field of Grid Computing. The core areas being Security, Resource Scheduling, Data Storage, Data Sharing, Accountability. Of these core areas the most important is the “Resource Scheduling” for dynamic grid environments. There is enormous amount of research going on in this particular branch of Grid Computing. But the bane of scientists is the lack of tools or testbeds for the evaluation of policies or algorithms they design.
This project is aimed to fulfill the gap between the discovery and implementation of more efficient scheduling strategies for the Grid. The Grid environment being fundamentally different from the environment of the traditional schedulers, specialized softwares are required to correctly simulate it. This project aims at providing with such a tool for the evaluation of different Grid Scheduling algorithms.
One of the objectives of computational grids is to offer applications the collective computational power of distributed but typically shared heterogeneous resources. Unfortunately, efficiently harnessing the performance potential of such systems (i.e. how and where applications should execute on the grid) is a challenging endeavor due principally to the distributed, shared and heterogeneous nature of the resources involved. This project tries to propose a tool to aid the design and evaluation of scheduling policies suitable for efficient execution of parallel applications on computational grids.
The objective of computational grids is to coordinate shared, distributed heterogeneous resources to work as a single computational resource. The availability of such an environment opens new horizons for research in areas previously unexplored or limited for economic or impractical reasons. The existence of long distance, low cost, high performance networks is encouraging the development of applications which take advantage of geographically dispersed resources. However, relatively few applications exploit the computational power available from such environments efficiently. Due to the diversity of resources, their dynamic behavior and the instability generally encountered in grids, developing applications capable of executing efficiently in such environments is still a challenge. Users of parallel system frequently complain of the difficulty of achieving a reasonable fraction of the theoretical peak performance from the systems they use. In environments like grids with so much variation, it is extremely difficult for users (typically scientist and engineers with little system knowledge) to decide, for each application and for each execution, which resources would be the most appropriate. The challenge is to execute applications efficiently and robustly in grid environments, without placing this burden on the programmer or the user.
While there have been several proposals of high performance global computing systems, scheduling schemes for the systems have not been well investigated. The reason is difficulties of evaluation by large-scale benchmarks with reproducible results. This Chatak Megh Samvad (CMs hereafter) performance evaluation system would allow analysis and comparison of various scheduling schemes on a typical high-performance global computing setting. CMs can simulate various behaviors of global computing systems, especially the behavior of networks and resource scheduling algorithms. The behavior of the network and the resource Scheduling Algorithms is faithfully carried out by the Condor HTC software. The scheduling algorithm used is the UP DOWN Scheduling Algorithm for the resource scheduling.
High performance global computing systems fueled by the rapid progress of high-speed networks and computing resources are now regarded as the computing platform of the future. In order to effectively employ computing resources therein, most proposed global computing systems embody a resource scheduling framework whose components monitor the global computing environment and predict availability of the resources. For effective investigation and objective comparison of scheduling algorithms and the implementation of the scheduling frameworks, large-scale benchmarks with reproducible results under various environments parameterized by the following constituents over time are required:
Servers --- architecture, performance, load and variance.
These parameters have been included in the scheduling scheme, to be detailed later in the architecture of the CMs project. These factors of machines influence the scheduling decisions made by the CMs scheduler.
However, reproducibility over a wide-area network is extremely costly to achieve, if not impossible. Thus, currently it is unrealistic to compare the different scheduling
algorithms proposed by other researchers, let alone compare the systems themselves. Cost and scale of possible benchmarks are also extremely limited. The resulting lack
of impartial comparative approaches is a great hindrance to global computing research and deployment. In order to resolve this situation, we are building a performance evaluation system that would allow analysis and comparison of various global computing systems under reproducible, controlled environments, called CMs. The current version of CMs mainly focuses on the evaluation of different scheduling algorithms and schemes based on a canonical (orthodox) model of high-throughput global computing system. The CMs Project tries to emulate the Grid as far as it can and give the researcher the results of the tests reliably aping the way scheduling scheme would work in a real world grid computing environment. The main aim is to prove that a scheduling scheme can be implemented in such a way, trying to divert the job level scheduling to a underlying software such as Condor and retains the application level scheduling.
The aim of this project is to facilitate the working of the grid by providing a facility to the Grid. That is an agent based scheduler and the other is to enable the grid to work across firewalls in an efficient and secure manner. The project is to design a test bed for the implementation of Grid Scheduling Algorithms. The primary requirement here is a “manageable grid”. Manageable in the sense that the expense of setting the grid up and routine maintenance is to be very less. Generally in the industrial environment the Grid spans multiple departments and is very complex and hard to maintain owning to the various factors involved in the proper functioning of it. So for this project I have chosen a High Throughput Computing facilitator “Condor”. It is non production environment software. What exactly we are exploiting here is, first the ability of Condor to migrate jobs from overloaded machines to less loaded ones using the mechanism known as “Flocking”. Secondly the ability of condor to get reconfigured in the runtime so as to facilitate the dynamic provision of less loaded machine in the grid. This basically will be a Long term scheduler for the grid.
The main objective of the project is to, build a grid scheduling algorithm test bed. The test bed is to provide the following features for the grid:
Maintenance of a host list.
Calculation of the host list according to the scheduling algorithm specified.
Dispensing of the hosts to the requesting clients based on the ranks.
Reconfiguring the configuration file of the condor hosts and issue the reconfiguration command to the respective condor masters. To enable them to flock to the less loaded machines.
Scanning the network for more available condor masters and updating the host list.
Accepting the host entries for interested condor masters and updating the host list.
The working and the architecture of the Test Bed is covered in depth in the System Architecture part of this report.
The project aims at facilitating the grid computing environment by providing with an agent based scheduler for the proper and efficient distributing of machines. The most primary assumption of the project is based on a very important facility provided by the Condor HTC software, i.e. Flocking between various masters on the grid. The proof of concept has been successfully established that the jobs do migrate from the overloaded master to the more lightly loaded one. This is done by setting various parameters of the Condor on the participating machines.
The project basically aims at automating the process of flock configuration based on some predetermined scheduling algorithm. Thus providing a test bed for the execution of grid scheduling algorithms. The construction of a grid based on the Condor HTC components is the most efficient with respect to time factor and requirements, amongst the various options considered (Globus, OSCAR, ROCKS).
This states the project is technically feasible.
The Condor HTC Project is basically Open Software and tough the code is not released yet. The various flavors of Condor are available for free download from the Project website www.cs.wisc.edu/condor/downloads. It can run various types of jobs categorized into different Worlds (Vanilla (Simple Jobs), MPI (Message Passing Interface), PVM (Parallel Virtual Machine)).
The Condor software is available of no charge for various operating systems. The machine requirements are minimum and no need of special hardware or special interconnect devices is needed. The whole grid is based on Linux Platform and hence the licensing issues are dealt with the Free Software Foundation Standards, there is no need of proprietary software to use with the system.
The project is economically feasible in all respects.
The platform used for the project is Redhat Linux 7.3 (CERN), Redhat 9 and Debian Sarge. All these software are available under the GNU GPL (General Public License; described in the User’s Manual). And the Condor software is available free for Academic use, under its own special license (Provided in the Licenses Section.). The programs are also developed and compiled fully under the Debian Operating System.
Thus the project has fully satisfied its license requirements, so the project is legally feasible.
The team of 1 person is working on this project
Chaitanya Hazarey
Minimum 3 Pentium-IV 1+ GHz
100Mbps fast Ethernet.
Minimum 128MB RAM
Minimum 30 GB HDD
Redhat Linux CERN Version (7.3)
Redhat Linux 9
Debian Sarge
Condor HTC Software (6.6.8)
MPICH Library 1.2.5 for MPI World Program
Ssh for data transfer betweens two nodes.
Mpicc to compile MPI programs.
condor_complie for compiling C Language Program.
VIM editor to write the program.
Anjuta for the compilation and managment of Project
Gnuplot to plot the graph of the tests conducted.
EtherApe for the display of grid activity
Lex Lexical Analyzer generator.
Kate Advanced Text editor.
This section provides cost, effort, and time estimates for the project of building the Grid Scheduling Algorithms Testbed.
Grid Computing is an emerging field. Most of the work carried out in this field is experimental and is going on live at various Institutes and Universities. This type of project was my own idea, and finds few parallels in current situation. Almost none deploy Condor at the Grid Computing Level (refer to the System Architecture). We couldn’t find any existing application working on this, so Estimation based historical data cannot be done.
Three Estimation techniques have been used to generate three independent results for higher accuracy.
Problem-based
Process-based
Empirical-based a COCOMO Model
LOC and FP are used in two ways during software project estimation:
As an estimation variable to size each element of software
As baseline metrics.
|
Functions/Module |
Estimated Line Of Code (LOC) |
|
Megh Daemon |
1032 |
|
Chatak Daemon |
940 |
|
host_job.sh Shell Script Test Job |
17 |
|
cvh_pass.c Test Job |
89 |
|
tiny_mc.c Monte Carlo Test Job |
62 |
|
time_mc.c Monte Carlo Test Job |
128 |
|
small_mc.c Monte Carlo Test Job |
125 |
|
pi_calc.c Pi Calculation Test Job |
690 |
|
Total Line Of Code |
3083 |
The estimates for LOC are plugged into the COCOMO formula for effort and duration estimation.
The basic COCOMO Model is used, for which,
Effort E = a KLOC b
Duration D = c E d
a = 2.4, b = 1.05, c = 2.5, d = 0.38
E = 2.4 (KLOC) 1.05
E = 2.4 (3.803) 1.05
E = 9 person-months (approx)
D = 2.5 (E) 0.35
D = 2.5 (9) 0.35
D = 5 months (approx)
N = E / D
N = 1.8 (approx)
The above estimate indicates that for 1 member, it will take 9 months approx to finish the project.
The application aims towards simulating a Grid Computing System and also to build a model scheduler based on that to evaluate the performace of the various scheduling algorithms. It is heavily dependant on the underlying Interconnecting Network. If the Network or one of its components fails the application may fail to run.
The system tries to give suggestions to the Condor where to migrate or flock , surplus jobs to , this suggestions are in turn accepted by Condor . As Codnor does not provide with any APIs or standard interfaces, it is difficult to predict exactly when the jobs will migrate. This may introduce an error in calculations.
At the time of power failure or for some other cases like failure in the network device or logical connection between the nodes the currently running process will get checkpointed on the submitting machines and will stop the execution of the remaining processes running on different nodes.
There are different versions of Condor for different platforms. There are frequent changes in these versions. So you must make sure that your application is compatible to these versions.
You have to configure Condor HTC software as a root. So some small mistake might cause improper installation of software.
When you are running applications over the LAN, then there should be Condor
utility installed on each node in the Grid. If there is no Condor installed the application will not work. But this is marginal. You can install Condor very easily. This is explained in the user manual for Condor.
The Grid environment is heterogeneous consisting of many platforms. The nuances of all the platforms have not been tested yet. It may introduce some errors in the functioning of the software.
Risks |
Category |
Probability |
Impact |
|
Threads may not interact as expected |
Development Environment |
50% |
Catastrophic |
|
Lack of Hardware or Software |
Technology to be built |
30% |
Critical |
|
Lack of Knowledge |
Staff size and experience |
50% |
Critical |
|
Technology may not meet exceptions |
Technology to be built |
20% |
Marginal |
|
Schedule will be tighten |
Business Impact |
50% |
Marginal |
|
Bugs during Test |
Staff size and experience |
20% |
Negligible |
|
Risk |
Mitigation |
Monitoring |
Management |
|
Lack of knowledge |
Study topic in detail. If many topics to be studied divide work among team members. |
Technical inability to write certain code. |
Check for regular code status and ensure its development on time. |
|
Unavailability of resources. |
Try to find out substitute. Discussion with management. |
Unable to proceed. |
Discuss problem. If no substitute found, try to get resources as early as possible |
|
Schedule is tighten |
Try to speed up project |
Milestones are not completed as per deadline establish |
Check for completion of milestones. Weekly review. |
|
Technology may not meet expectations |
Properly understand the method to install and to make changes according to the application |
Inability to read the document of the software before using it |
Read the document carefully. Try to install minimum things and test them step by step to make application running properly. |
Besides this, the project manager (Internal & External Guides) need to review, talk individually and try to find out actual status of the project. Ensure that the project documents are completed as per guidelines.
Changes may be done in the project code if required under following conditions
Change in standard.
Thread Functions don’t work as specified. It may require the modifications in the functions code.
Some extra features are to be included.
Customer may ask for some modifications in the Scheduler system.
Changes in the Condor Configuration file may require changes in the Lexical Scanner Module.
Change impact should keep in mind while designing the software so as to reduce the risk. Function modification and addition of features is considered well in advance. Code is kept modular to incorporate these changes. This reduces the risk and effect cost marginally.
The application extensively uses POSIX compliant threads which are well designed and extensively supported across platforms.
The application uses the POSIX compliant network interfaces such as sockets which are well designed interfaces and are extensively supported across platforms.
The Condor HTC software is developed by the Condor Project, Condor Team, Computer Sciences Department, University of Wisconsin-Madison.
It is a reliable software and is constanly being upgraded to remove the bugs found.
The Condor Team is comprised of the developers and administrators of Condor at the University of Wisconsin-Madison.
Users and are represented from US and Europe.
The Condor project started when two separate projects fused together their achievements. Professor D. Dewitt, R. Finkel, and M. Solomon were working on the Remote-Unix (RU) project while Professor M. Livny was working in the area of Distributed Resource Management (DRM).
This software is becoming more and more architecture independent. It is also easy to install and use.
Optimization is taken in account in each step of the way.
Documentation is provided for each functionality of Condor along with separate documentation of how to use the whole application.
There is a mailing list for Condor (Condor Users Mailing list ) which is very active in discovering flaws in the working and optimization of the Software.
Condor is being used by more and more people for deployment of Grid in their organizations either on dedicated machine beds or on shared machine environment.
The user base for the software is very large.
System is aimed at facilitating the grid computing environment; it also tries to improve the functioning of the grid computing environment. This is done by trying to customize the distribution of jobs on the grid.
It tries to exploit all the available idle resources on the grid, by migrating jobs from heavily loaded machines to machines which are lightly loaded.
The application also scans the network for hosts, the hosts which want to execute some extra jobs, and regularly updates the host-list.
The system is designed as per the POSIX standards and is compatible with Linux and Unix Operating systems.
The Megh is to be deployed on a Linux / UNIX machine, but the agents (clients) can be deployed on Linux, UNIX, Windows machines. As the communication between them is based on standard Network Calls.
The Megh will listen on a specific port, the agents will connect to this port for the sending of the request strings and also receiving of the host addresses and the command strings.
The Agents (Chataks) to be deployed on the Condor Masters throughout the system will communicate with the Megh and, will take the required actions as command strings specifies.
There are a few important things about the state of modern computing that need to be thought about before one can really understand what the Grid is, and what potential benefits it holds. Today I write this project report from home on a PC with a 2.3 GHz CPU, 256 MB of RAM, 30 Gigabyte Hard-drive, not to mention a massively parallel vector math processor (known as a graphics card) with its own 64 MB of memory. A computational arsenal that would have cost millions of dollars and filled a room only a couple of years ago, but which is now commonplace if not even a little behind the times. For decades Moore’s law has been followed almost to the letter, and we have quickly emerged into a time of abundant computational power scattered over millions of computers all over the world. However there lies a great inefficiency in this organization of resources. The vast majority of computers spend most of their time in idle. There has been an excellent study conducted on this by Matt Mutka and Miron Livny in their paper “Profiling Workstations’ Available Capacity for Remote Execution”. This is cause for the effect that their computational potential goes to waste. It has been found that a computer is used, on average, around thirty percent of the time in a commercial/educational context, and far less than that in the home. In addition, most uses (apart from gaming), utilize only a paltry percentage of the total systems resources, such that even during use, the computer may as well be idle.
Any observant business man will tell you that inefficiencies are a great economic opportunity, in this particular case the potential economic benefits are off the scale. Should computational resources be utilized to their full potential the world would experience a seven fold increase in computational power, with the associated benefits to all computer using industries. This alone could herald a massive period of global economic growth. It is with this in mind that one should look at the Grid domain. While, in the end, it may not deliver the benefits as just expressed, the simple fact that it could, is just cause for excitement and determined investment in building this daunting technology.
But how can we increase resource utilization? There was another time when resource utilization was held back by inefficient use and lack of a coherent distribution (networking)system. Lets see how the grid got its name.
A common analogy used within the Grid community to explain the technology, is the modern electrical power grid. The advent of electricity alone did not transform the modern industrial landscape as one would expect. Oil and coal generators were expensive to build and maintain and power was inefficient without significant economies of scale. Basically most people could not afford to build their own generators for electricity, and in turn very few electrical appliances were developed. It wasn’t until the power grid made electricity cheaply and universally accessible that electric applications were developed and the modern industrial world was defined. The modern power grid is a technological marvel. It presents itself as essentially a single entity that provides power to millions of devices from a highly heterogeneous set of providers. Consumers vary greatly in electrical requirements, yet most everybody is guaranteed a high quality of service for what they require, at an affordable price. The computational Grid and the power grid face very different technological challenges and the analogy is not intended to reflect an identical situation, however the goals are much the same. The Grid intends to provide computational resources uniformly and universally to all sorts of consumers, from highly heterogeneous resources that cross organizational boundaries. Not only that, but similarly to the power grid, it will have to present economic solutions to encourage its use and ensure a flexible payment system for its large diversity of consumers. Basically creating a commodity market for consumers and producers of computational resources, and most probably realizing untold economies in resource use and cost. Lets be a little bold and propose that this would lead to a sort of computational ecology.
What sort of changes would this Grid bring? Well imagine some organization that periodically runs an intensive business simulation. Instead of making large technology investments, it is able to outsource its processing to a grid, and pay on a per/cpu-cycle basis. Imagine being able to bless any instrument (microscope, robot, etc) with a network connection, with a massive amount of intelligence, or to be able to share virtual spaces for collaborative design of complex systems. These are just a few of the applications that already exist and operate on The Grid. Currently there are no public Grids, however in the future, an internet based Grid would have the potential of massively increasing personal processor use, with the prospect of being paid for your idle time.
The Grid “problem” is not solved however. The technology is still young and there are resource management, infrastructure, and security issues that need to be resolved. Any future Grid is most likely going to be a highly dynamic heterogeneous environment, with producers that vary greatly in quantity and quality of their resources, and pop in and out of the Grid environment. Here are a few questions to summarize the Grid Issues.
How are we to maintain high standards of service when using resources that are scattered across boundaries of organizational control?
How will resources be able to protect themselves from unauthorized or malicious use?
How will appropriate resources be located?
How will we deal with problems of persistent storage in this environment?
A few Grid projects have identified and implemented solutions to these questions. While they all have similar goals, there are a variety of design philosophies apparent in the solution domain. We will study the most prominent of the Grid technologies offered in details later. This will have particular focus on the Condor Grid High Throughput Computing software.
Now lets get technical lets attempt to answer the question that what is a grid ? We will answer this question form the implementation point of view. Exactly what combinations of Distributed Computing facilities will enable the particular solution to qualify as a grid? What are the basic points of difference between the clusters and other similar Distributed Computing environments and the grid ? We will present the “Three Point Checklist” for the grid solutions which is proposed by Ian Foster a prominent researcher in the field of Grid Technology.
The essence of the various definitions of Grids can be captured in a simple checklist,
according to which a Grid is a system that:
coordinates resources that are not subject to centralized control …
(A Grid integrates and coordinates resources and users that live within different control domains—for example, the user’s desktop vs. central computing; different administrative units of the same company; or different companies; and addresses the issues of security, policy, payment, membership, and so forth that arise in these settings. Otherwise, we are dealing with a local management system.)
using standard, open, general-purpose protocols and interfaces…
(A Grid is built from multi-purpose protocols and interfaces that address such fundamental issues as authentication, authorization, resource discovery, and resource access. As I discuss further below, it is important that these protocols and interfaces be
standard and open. Otherwise, we are dealing with an application specific system.)
to deliver nontrivial qualities of service...
(A Grid allows its constituent resources to be used in a coordinated fashion to deliver various qualities of service, relating for example to response time, throughput, availability, and security, and/or co-allocation of multiple resource types to meet complex user demands, so that the utility of the combined system is significantly greater than that of the sum of its parts.)
Of course, the checklist still leaves room for reasonable debate, concerning for example what is meant by “centralized control,” “standard, open, general-purpose protocols,” and “qualities of service.”
System is aimed at facilitating the grid computing environment; it also tries to improve the functioning of the grid computing environment. This is done by trying to customize the distribution of jobs on the grid.
It tries to exploit all the available idle resources on the grid, by migrating jobs from heavily loaded machines to machines which are lightly loaded.
The application also scans the network for hosts, the hosts which want to execute some extra jobs, and regularly updates the host-list.
The system is designed as per the POSIX standards and is compatible with Linux and Unix Operating systems.
The Megh is to be deployed on a Linux / UNIX machine, but the agents (clients) can be deployed on Linux, UNIX, Windows machines. As the communication between them is based on standard Network Calls.
The Megh will listen on a specific port, the agents will connect to this port for the sending of the request strings and also receiving of the host addresses and the command strings.
The Agents (Chataks) to be deployed on the Condor Masters throughout the system will communicate with the Megh and, will take the required actions as command strings specifies.
Assumptions and dependencies
The working of the model scheduler is based upon the Condor HTC software.
The main assumption here is that the Condor System when it is issued the command of condor_reconfig will reparse the configuration files and then reconfigure the Condor Daemons.
The software is heavily dependant on the POSIX implementation of Thereads in Linux and their behaviour.
The time required for the parsing of the new configuration files and the reconfiguration of the Condor should be less.
General Constraints
The hosts on the network shold be on the same LAN
The hosts should be able to communicate with each other and should be behind a firewall
The hosts if behind a firewall should have atleast some specific ports open for communication.
The Software is totally based on GNU/Linux system and is not designed for other platforms
The Scheduling Algorithms deployed should not hve real time components.
The Performance tests should take in considerations the total time required for job queue completion in case of simple jobs.
The software is degined for the jobs based on standard C liabrary, the Flocking (Migration) of MPI jobs is not supported by the Condor HTC software.
Proposed Architecture of a Grid Scheduler
We will first examine in detail the architecture which is proposed for the grid scheduler. And then see how the Scheduler we have developed fulfills the requirements laid down by the proposed model.
Following is presented a general architecture for scheduling on a Grid. A Grid scheduler (or broker) must make resource selection decisions in an environment where it has no control over the local resources, the resources are distributed, and information about the systems is often limited or dated. The Grid scheduling architecture has three phases: resource discovery, system selection, and job execution. step. We note that no current Grid scheduler implements all of the steps of this architecture.

Single sites are simply no longer efficient for meeting the resource needs of high-end applications, and using distributed resources can give the application many benefits. Effective Grid computing is possible, however, only if the resources are scheduled well. Grid scheduling is defined as the process of making scheduling decisions involving resources over multiple administrative domains. This process can include searching multiple administrative domains to use a single machine or scheduling a single job to use multiple resources at a single site or multiple sites.
Here we do not address the situation of speculative execution submitting a job to multiple resources and, when one begins to run, canceling the other submissions. We do, however, discuss resource selection (sometimes termed resource discovery , assignment of application tasks to those resources (mapping ), and data staging or distribution.
One of the primary differences between a Grid scheduler and a local resource scheduler is that the Grid scheduler does not own the local resources and therefore does not have control over them. The Grid scheduler must make best-effort decisions and then submit the job to the resources selected, generally as the user.
Here the user is not supposed to be root but some other user with limited privileges. The job submission by user root is not allowed due to security reasons. Here in our model of the Grid Scheduler we do not make efforts to submit the jobs, as the jobs may vary in the language they are coded in and the issue of maintaining logs and the intermediate data at one site cannot be handled efficiently by the Scheduler.
Furthermore, the Grid scheduler does not have control over the set of jobs submitted to it, or even know about the jobs being sent to the resources it is considering use of, so decisions that tradeoff one job's access for another s cannot be made in the global sense. This lack of ownership and control is the source of many of the problems to be solved in this area.
This model Grid Scheduler does the application level scheduling, while Condor does the job level scheduling for the Grid.
We will now map the architecture of the Grid Scheduling Algorithm test bed on to the proposed architecture of the Grid Scheduler. We will follow the step by step procedure of the Grid Scheduling and then describe in details how the Test-Bed Scheduler performs these steps.
The architecture of the Model Grid Scheduler is multithreaded. To minimize the cost of communication between different parts of the same entity, the design based on threads was selected. The different functions of the Grid Scheduler have been modularized into thread functions and implemented.
The Architecture of The CMs System.
The CMs is basically Agent based Scheduler, hence it is implemented in two parts which are named as Megh and Chatak.
Following are the details of their functions and the constituent threads:
Megh (Daemon Process)
Inter Megh Communication.
Free Host Handler Thread.
Requestor Host Handler Thread.
Scheduler Thread (Match Maker).
History Thread.
Chatak (Daemon Process)
Megh Searcher Thread.
Host Reporting Thread.
Host Requestor Thread.
CPU Idle time calculating Thread.
Reconfiguration and Rescheduler Thread.
History Thread.
As we can observe in the above figure the Chatak daemon process runs on the Condor Masters. And the Megh daemon can be located any where on the network. The daemons in the architecture represent the actual work to be carried out by the respective parts of the scheduler.
Following are the functional details of each of the threads the Scheduler implements.
The Scheduler Parameters
For Requesting Hosts
char req_host_str[256];
char *free_hosts_ptr[256];
int hosts_taken;
long int rank;
int nice_val;
long int req_queue_len;
int time_fact1;
int time_fact2;
int stability_fact;
For Free Hosts
char free_host_str[256];
char given_to_host[256];
int hosts_serviced;
long int rank;
int nice_val;
unsigned long int cpu_kflops;
unsigned long int cpu_mips;
unsigned long int memory;
double avg_cpu_idle;
int stability_fact;
Chatak (Daemon Process)
First of all the Chatak is installed on the interested Condor Masters, It runs with the root privileges. Basically it can run as any user who has sufficient permissions to change the Condor configuration files and issue commands to the underlying Condor Master. As it communicates on ports > 10000 it can run as any user.
When the Chatak daemon starts up it first senses how many jobs does the underlying Condor Master has got waiting. If is queue value is greater than certain threshold value, the Chatak contacts the Megh for requesting of more hosts to run jobs on. Otherwise the host here is reported free and the corresponding values of the host are entered in the database of Megh. The switch here is made on the queue value solely because, the value of CPU busy can be very misleading and incorrect. If the request for the machines is not fulfilled by the Megh , Chatak retreats and tries again after a short interval of time. If the request is successful the address of the extra host got is given to the reconfiguration thread to rewrite the configuration files and reschedule the jobs on that Condor Master.
There was a choice here presented to use either some of the readymade tools for the reporting of the host status, the NWS (Network Weather Service ) or making the utilities from scratch. Here the host properties are found out and reported by a mechanism written from scratch to have greater control over the details of the implementation. This mechanism is embedded inside of the Chatak to report the various scheduling parameters to the Megh daemon. The Megh is the place where the scheduling decisions are made and then executed, it also maintains the history of the decisions made for a more critical and minute view of the scheduling algorithm involved.
Megh (Daemon Process)
The Megh is the other part of the agent based Scheduler CMs. When the Chataks on various host machines wake up they first of all check whether the underlying Condor Master is OK or not and then check on the queue length status. They depending upon the various configurable switching parameters either report their status to the Megh , about their free properties . Or they report the status of the queue lengths and then ask for more Flockable hosts from the Megh.
The architecture for Megh is multithreaded and based on the POSIX Threads implementation. It uses different threads to handle the type of requests coming in. For example, Free Host Handler thread will handle the free host reports coming in . The Requesting Host handler thread will handle the requests for the hosts coming in. The scheduler thread will handle the scheduling and the history keeping part. The History keeping part may be implemented as a separate thread itself.
An improvement to the CMs architecture would be to add a thread which will handle the inter Megh communications. This will facilitate the exchange of scheduling information between different Meghs spread over the network. Making this model scheduler more pronounced as a Multi Agent based scheduler. And also will be a step further in the direction of eliminating the Megh as a single point of failure, by replicating the database throughout the network.
Now Lets try to map the proposed Architecture on to our Chatak-Megh Samvad.
Phase 1: Resource Discovery
The first stage in any scheduling interaction involves determining which resources are available to a given user. The resource discovery phase involves selecting a set of resources to be investigated in more detail in Phase 2. At the beginning of Phase 1, the potential set of resources is the empty set; at the end of this phase, the potential of resources is some set that has passed a minimal feasibility requirement. Resource discovery is done in three steps: authorization filtering, job requirement definition, and filtering to meet the minimal job requirements.
The first step of resource discovery in job scheduling is to determine the set of resources that the user submitting the job has access to. In this regard, computing over the Grid is no different from remotely submitting a job to a single site: without authorization to run on a resource the job will not run. At the end of this step the user will have a list of machines or resources to which he or she has access. The main difference that Grid computing lends to this problem is sheer numbers. It is now easier to get access to more resources, although equally difficult to keep track of them. So we have designed Chataks to report more frequently to the Megh to keep the information updated.
This First Phase is handled by the Chataks installed on each machine. They obtain information from various sources such as the /proc file system , condor_q , condor_status command. They process this information and forward it to the Megh Daemon in the required format. These are the actual scheduler parameters to be used for the allocation of free machines to the requesting hosts .The three steps are done by the Chatak Damon as follows :
Authorization filtering
Chatak runs as user root and the Condor runs as the user Condor, Jobs execute as the user nobody. This is how the various authentication filtering is handled. It is done internally via the UNIX file permissions and user authentication procedures. Condor does not require an account (login) on machines where it runs a job. Rather, it uses remote system call technology, which traps library calls for such operations as reading or writing from disk files. The calls are then transmitted over the network to be performed on the machine where the job was submitted.
Job requirement definitions
Each job submitted will have to have a .cmd file which contains the nunaces of that job. You can suppy the additional requrements by using the syntax of the job submission files
ex :
Requirements = (OpSys == "IRIX65" && Arch =="SGI") ||
(OpSys == "LINUX" && Arch =="INTEL")
or
Requirements = Name=="infn-corsiLL.corsi.infn.it" || \
Name=="infn-corsiRR.corsi.infn.it"
Minimum Requirement Filtering
Filtering to meet minimum job requirements.
This is handled by the Condor Schedd at the lower level and the application level is handled by the CMs Software. Whence the minimum job requirements of single job are fulfilled the Condor Master on that machine will allow that job to run. Condor also does an initial matching (using ClassAds and the matchmaker) and then a feasibility evaluation upon claiming the actual resources. At the application level the Chatak will sense the job queue length (or any other set parameter ) to satisfy the minimum requirements of the job queue on that Condor Master, and enable the queue to finish more time efficiently, by providing with extra machines. We maintain that as systems grow, this stage will be an important one for continued scalability of other Grid-level schedulers.
Phase : 2 System Selection
Given a group of possible resources (or a group of possible resource sets), all of which meet the minimum requirements for the job, a single resource (or single resource set) must be selected on which to schedule the job. This selection is generally done in two steps: gathering detailed information and making a decision. We discuss these two steps separately, but they are inherently intertwined, as the decision process depends on the available information. This process is done jointly by the Megh and Chataks running on separate machines. Following are the details.
Dynamic Information Gathering
In order to make the best possible job/resource match, detailed dynamic information about the resources is needed. This data is collected at configurable intervals. The basic Condor approach generates the information on each machine, sends this data to the central manager, and updates the dynamic information every five minutes. The Chatak on each system will also in a similar fashion report the status of the resource to the Megh for the updating of database. The data provided is
For the Free Hosts
Free host Internet address
The CPU capacity in KFLOPS
The CPU capacity in MIPS
The Total Physical Memory available.
The time for which the CPU is idle on average.
A Stability factor based on the average time for which the host remains free.
For the Requesting or Overloaded Hosts
Requesting host Internet address
The Queue Length on that host.
Some Time Factors based on algorithm specifications.
A Stability factor based on the average time for which the host remains up.
System Selection
With the detailed information gathered in Step 4, the next step is to decide which resource (or set of resources) to use. Various approaches are possible. The most commonly used system for matching application requirements to resources is the Condor Matchmaker/ClassAd system. A ClassAd can contain an expression that evaluates how well it matches to the corresponding ClassAds. This approach is taken at the job level or the lower level by the Condor System. At higher level a specific policy based on the algorithm to be implemented can by specified. This will dictate the manner in which the CMs software will allocate hosts to the requesting machines.
The Megh then based on the applied scheduling policy will issue a host Internet address to the requesting machine to enable the migration of jobs to the target machine. The Chatak on the requesting machine is then responsible for the receiving of this address and the reconfiguration of the underlying Condor Master and rescheduling of the jobs on it. This should allow more flexibility and a better scaling effect as the number of local resources considered grows.
Phase 3: Job Execution
The third phase of Grid scheduling is running a job. This involves a number of steps, few of which have been defined in a uniform way between resources. The CMs system, will only issue suggestions to the requesting Condor Master via the Chatak deployed on each of them. This will be done via the changing of the FLOCK_TO field in the configuration file of the respective Condor Master. The lower level details of job scheduling and actual process migration are handled by the Condor Masters. Following is the detailed description of how the Condor and CMs goes by the matching of jobs and resources, and the execution.
Advance Reservation
The Advanced Reservation step is handled jointly by the Megh Scheduler thread and the Chatak Requesting Host Reconfiguration thread with the Condor System. This is done by the Scheduler thread which gives away the address of the Free host to the requesting host. And then the respective receiving Chatak will get this address and then reconfigure the underlying Condor Master according to the data received.
Job Submission
The jobs are submitted on the submitting machine by the user via the condor_submit command. The user can submit any no of jobs from the submitting host he does this by the condor_submit command. These jobs can being to the various universes defined by Condor .
Preparation Tasks
Preparation Tasks are handled jointly by the Condor System and the Chatak Requesting Host Handler Thread.
The remaining three steps are carried out by the Condor System based on the suggestions given to it by the CMs software.
Monitoring Progress
Job Completion
Cleanup Tasks
Following figure shows the intricate details of job handling by Condor.

Startd sends collector ClassAd describing itself. (The Schedd does as well, but it has nothing interesting to say yet.)
The user calls condor_submit to submit a job. The job is handed off to the schedd and condor_submit returns.
The schedd alerts the collector that it now has a job waiting.
The negotiator asks the collector for a list machines able to run jobs and schedd queues with waiting jobs.
The negotiator contacts the schedd to learn about the waiting job.
The negotiator matches the waiting job with the waiting machine.
The negotiator alerts the schedd and the startd that there is a match.
The schedd contacts the startd to claim the match.
The schedd starts a shadow to monitor the job.
The startd starts a starter to start the job.
The starter and the shadow contact each other.
The starter starts the job.
If the job is using the Condor syscall library (typically through being condor_compiled), it will contact the shadow to access necessary files.

Use Case Diagram
Analysis Classes
Activity Diagram
Sequence Diagram
Collaboration Diagram
Class Diagram
State chart Diagram









Softwares Used
Redhat Linux CERN Version (7.3)
Redhat Linux 9
Debian Sarge
Condor HTC Software (6.6.8)
MPICH Library 1.2.5 for MPI World Program
Ssh for data transfer betweens two nodes.
Mpicc to compile MPI programs.
condor_complie for compiling C Language Program.
VIM editor to write the program.
Anjuta for the compilation and managment of Project
Gnuplot to plot the graph of the tests conducted.
EtherApe for the display of grid activity
Lex Lexical Analyzer generator.
Kate Advanced Text editor.
Coding Standards
We have followed Hungarian Notation for coding. These are obtained from MSDN, and different Web Sites. Coding standard can be obtained from Coding Conventions Documents for our project. Coding style followed is Linux based as most of GNU/Linux in-built programs or standard book provide.
The indentation style followed is GNU/Linux style.
The coding policy is to minimize the hard coded part and to query for information wherever possible.
Testing and benchmarking is the most important phase in the life cycle of a project as it provides a mechanism to find if the goals of the project have been met to satisfaction. The project was subjected to testing during its construction as well as after completion.
Machines Boot up
Condor Masters (Condor Software) on all machines is started
Initial benchmarking and internal performance analysis is done by the software
User submits jobs on the submit machine
The Condor begins negotiation for jobs from the Condor Central Manager.
The Chatak comes up on the client machine.
The Megh comes up on the machine running the Megh daemon
Chatak depending on some fixed switch either reports the underlying host as free or as requests the Megh for more machines by reporting the host as “Requesting Host”
Megh accepts the Host report either as “Free” or “Requesting”.
Megh Scheduler thread then matches the Free hosts with the Requesting hosts and then gives the Requesting hosts the addresss of Free hosts for flocking.
Chatak accepts the addresses and reconfigures the underlying Condor
Chatak after the reconfiguration is complete , reschedules to jobs on the underlying machine.
The jobs now migrate from the Requesting host to the Free hosts and start executing there.
Machine Descriptions
Machine 1:
Hostname: vishwamitra.it.mitp
IP Address: 10.1.30.99
CPU: Pentium 4 (1.4 GHz)
Memory (RAM): 256 MB
Hard disk: 30 GB
Network Card: 100 MBps
Condor Version: 6.6.8
Operating System: Red Hat Linux EL 3 Enterprise Edition
Physical Location: Internet Center control room
Machine 2:
Hostname: shandilya.it.mitp
IP Address: 10.1.30.105
CPU: Pentium 4 (1.4 GHz)
Memory (RAM): 256 MB
Hard disk: 30 GB
Network Card: 100 MBps
Condor Version: 6.6.8
Operating System: CERN Edition GNU/Linux 7.3
Physical Location: Software Lab 4
Machine 3:
Hostname: ganga.it.mitp
IP Address: 10.1.30.206
CPU: Pentium 4 (1.4 GHz)
Memory (RAM): 256 MB
Hard disk: 30 GB
Network Card: 100 MBps
Condor Version: 6.6.8
Operating System: Fedora Project GNU/Linux Core 3
Physical Location: Software Lab 4
Machine 4:
Hostname: vayu.it.mitp
IP Address: 10.1.30.4
CPU: Pentium 4 (1.4 GHz)
Memory (RAM): 256 MB
Hard disk: 30 GB
Network Card: 100 MBps
Condor Version: 6.6.8
Operating System: Red Hat GNU/Linux 9
Physical Location: Software Lab 2,3 Control Room
Machine 5:
Hostname: minix.it.mitp
IP Address: 10.1.30.25
CPU: Pentium Mobile 4 (2.13 GHz)
Memory (RAM): 256 MB
Hard disk: 30 GB
Network Card: 100 MBps
Condor Version: 6.6.8
Operating System: Debian GNU/Linux Sarge
Physical Location: Personal Laptop
Network Description
Ethernet 100Mbps, UTP CAT 5 Cabling
The tests consists of the following jobs to be submitted to Condor:
host_job.sh
This is a simple shell script which will execute on the target machine. It will sleep for some amount of time (passed as command line argument), and then terminate. It will for the purpose of establishing the identity of the machine it is executing upon will print the name of the host it is executing upon and the process id it has got. It will additionally also describe the command line arguments passed to it, and the exact location of the binary on the executing machine.
pi_calc.c
This program is written in C and will calculate the value of Pi to the no of places given as a command line argument. This is a typical example of long running job. This job is included in the test suite to test if the information is lost on the disturbance of the executing machine. The no of places supplied must be the power of 2. This job prints the Pi value thus calculated on to the stdout, which is redirected to a file pi_calc.out.
The rest of the programs are of Monte Carlo simulations, these form the typical problems executed on the grid for High Energy Physics experiments.
The Monte Carlo technique is a very flexible method for simulating light propagation in tissue. The simulation is based on the random walks that photons make as they travel through tissue, which are chosen by statistically sampling the probability distributions for step size and angular deflection per scattering event. After propagating many photons, the net distribution of all the photon paths yields an accurate approximation to reality.
tiny_mc.c
Simulates light propagation from a point source in an infinite medium with isotropic scattering. The entire source for this program fits on a single page and is a good way to get an overview of the entire Monte Carlo process.
small_mc.c
Simulates light propagation from normal irradiation of a semi-infinite medium with anisotropic scattering. It calculates the volumetric heating as a function of depth. This program takes two whole pages, but is very handy for a bunch of problems.
time_mc.c
Simulates the time resolved backscattering of a semi-infinite medium with anisotropic scattering. This program is adapted from small_mc.c above, and shows how simply time resolved simulations can be done.
cvh_pass.c
The last program of the suite is a an implementation of the brute force password cracking method on the grid. It uses the brute force method to guess a password. It also displays the time taken for it to complete. For benchmarking purposes, this ouput of the program can be used. This is also a typical example of long running jobs submitted to the Grid for solutions. It can also be used constructively to test the strength and quality of the password chosen.
The system was subject to tests wherein the queue completion time was taken in consideration. Here we are not concentrating on the completion time of a single job. We have formed queues of the jobs described above. These queues are submitted fromt eh submitting machine and their completion is monitored.
The time taken for the completion of each queue is reported. The whole bunch of queus of 6 jobs in all is submitted atleast 3 times . This is done to get a fair approximation of the completion time of the queue on the test machine.
The Scheduling algorithms are tested on identical machines and identical network environment. The same jobs are submitted in the same way for each of the scheduling algorithm to be implemented. The improvement or the degradation in the completion time of the queues give the fair idea of how the scheduling algorithm is fairing.
The following is the description of the queues used in the evaluation :
host_job: 50 Jobs in a queue
pi_calc.c: 50 Jobs in a queue
small_mc.c: 50 Jobs in a queue
tiny_mc.c: 50 Jobs in a queue
time_mc.c: 50 Jobs in a queue
cvh_pass.c: 50 Jobs in a queue
Following are the results of the test conducted to calculate the Job Queue Completion on a single host without the software installed.
|
Job Queues Submitted |
Total Time ( %R ) |
CPU ( %U ) |
CPU ( %S ) |
CPU (%U + %S) / %R |
|
host_job |
375.195 |
253.57 |
54.55 |
81.12 |
|
pi_calc |
3516.684 |
747.04 |
109.31 |
24.35 |
|
tiny_mc |
672.941 |
281.29 |
37.81 |
47.51 |
|
small_mc |
578.197 |
87.43 |
15.78 |
17.85 |
|
time_mc |
1877.33 |
453.95 |
66.13 |
27.7 |
|
cvh_pass |
229.253 |
45.81 |
9.08 |
23.94 |
|
host_job |
775.052 |
547.13 |
88.87 |
82.05 |
|
pi_calc |
3468.373 |
698.64 |
105.46 |
23.18 |
|
tiny_mc |
675.215 |
282.15 |
38.1 |
47.42 |
|
small_mc |
583.393 |
87.75 |
16.74 |
17.91 |
|
time_mc |
1876.797 |
453.9 |
65.67 |
27.68 |
|
cvh_pass |
227.98 |
45.54 |
8.68 |
23.78 |
|
host_job |
385.63 |
258.43 |
56.45 |
81.65 |
|
pi_calc |
3351.926 |
603.94 |
95.49 |
20.86 |
|
tiny_mc |
676.318 |
280.47 |
37.34 |
46.99 |
|
small_mc |
581.897 |
85.77 |
16.71 |
17.61 |
|
time_mc |
1986.585 |
519.17 |
72.69 |
29.79 |
|
cvh_pass |
226.954 |
43.35 |
8.37 |
22.78 |
|
host_job |
771.888 |
540.65 |
90.46 |
81.76 |
|
pi_calc |
3054.665 |
387.61 |
70.58 |
14.99 |
|
tiny_mc |
379.045 |
63.64 |
12.79 |
20.16 |
|
small_mc |
583.558 |
86.15 |
16.61 |
17.6 |
|
time_mc |
1859.18 |
424.17 |
65.04 |
26.31 |
|
cvh_pass |
231.45 |
48.39 |
9.17 |
24.86 |
|
host_job |
705.914 |
489.32 |
84.84 |
81.33 |
|
pi_calc |
3046.347 |
385.24 |
68.68 |
14.9 |
|
tiny_mc |
674.19 |
279.09 |
37.88 |
47.01 |
|
small_mc |
581.108 |
85.63 |
16.35 |
17.54 |
|
time_mc |
1547.617 |
202.12 |
38.44 |
15.54 |
|
cvh_pass |
254.142 |
56.46 |
10.85 |
27.45 |
|
host_job |
678.462 |
472.3 |
80.02 |
81.4 |
|
pi_calc |
3508.346 |
715.79 |
109.84 |
23.53 |
|
tiny_mc |
679.241 |
283.33 |
38.49 |
47.37 |
|
small_mc |
581.797 |
87.39 |
17.26 |
17.98 |
|
time_mc |
1556.265 |
204.87 |
39.52 |
15.7 |
|
cvh_pass |
233.873 |
47.92 |
9.38 |
24.5 |

Further tests are expected to show that the job queue times decreases steadily as more machines are acquired. This can be proved when there are at least 5+ machines for testing, as the scheduling algorithm implemented should favor some machines and penalize others. The job queue timings of these two machines should be always less that that of the single machine but should be different from reach other.
A scheduling algorithm will be proved to be better is it successfully demonstrates the decrease in queue times of test machines according to the scheduling algorithm specified
The project holds a great promise for application in the fields of distributed computing in general and Grid Computing in particular. The CMs project in some ways tries to emulate the grid on a small scale. The scale referred here is with respect to the manageability of the grid size. The main purpose of the CMs Project is to facilitate the study of the Grid Scheduling algorithms and aid in their design and analysis. It does this without costing the Researcher much hardship.
The objective of computational grids is to coordinate shared, distributed heterogeneous resources to work as a single computational resource. The availability of such an environment opens new horizons for research in areas previously unexplored or limited for economic or impractical reasons. The existence of long distance, low cost, high performance networks is encouraging the development of applications which take advantage of geographically dispersed resources. However, relatively few applications exploit the computational power available from such environments efficiently. Due to the diversity of resources, their dynamic behavior and the instability generally encountered in grids, developing applications capable of executing efficiently in such environments is still a challenge.
Users of parallel system frequently complain of the difficulty of achieving a reasonable fraction of the theoretical peak performance from the systems they use. In environments like grids with so much variation, it is extremely difficult for users (typically scientist and engineers with little system knowledge) to decide, for each application and for each execution, which resources would be the most appropriate. The challenge is to execute applications efficiently and robustly in grid environments, without placing this burden on the programmer or the user.
It is based on Condor as a base. So it is free from the low level problems to be solved.
The Researcher can at his will change the scheduling algorithm.
The Researcher can specify various scheduling parameters to be included or excluded from the scheduler.
The Researcher can decide on exactly what basis the hosts are decided to be free or busy, i.e. he can decide the thresholds very comfortably.
The Condor HTC software is Free for download and is available in various flavors for various platforms.
The nuances of the platform are handled well in Condor, this is the principle advantage of the CMs software not getting involved in the intricacies of job executions.
The design of the CMs system is based on the Agent Based Scheduler, so it incorporates all the positive points of this technique.
Future development of the system may lead to the development of the system such as a Tool for the Design and Evaluation of Hybrid Scheduling Algorithms for Computational Grid. We have studied a case of such implementations, which will impress the point that this system in future may be able to serve practical needs of scientists.
Free state of vishwamitra.it.mitp

Free state of minix.it.mitp

Free state of ganga.it.mitp

Free state of vayu.it.mitp

Unavailable state of shandilya.it.mitp

Example Job to be submitted (host_job.sh)

Example Job description file

Job submission

Jobs not running on shandilya.it.mitp

After reconfiguration by CMs Job run on remote Free Machines

Monitoring of Machine Status

Research in Grid Computing is progressing at an enormous pace. New ideas are coming up at every hour. Various Research Labs are at cut throat competition for supremacy and total control over the field. This project was conceived with an aim of helping the researches test out the various scheduling strategies without worrying about the lower level implementation details. The main aim of the project is to bridge the gap between theory and practice.
The working environment of the Grid Computing being different from the turf of the traditional Scheduling algorithms, simulating this is a challenge. The heterogeneity and the highly dynamic behavior of the grid computing demands a new class of scheduling strategies. And for their testing development of a new breed of testbeds. The project aims at proving that the simulation for a grid and implementation of user specific scheduling policy on it is possible without delving into minute details of Job level scheduling.
This software working at the application level tries to implement a user specified scheduling policy onto the underlying Grid. There being only one or two parallels of this kind of work testifies the challenges face by the Grid Resource Management deployment and testing.
The software touching on various parts of Computer Engineering such as Computer Networks, Compiler Construction, Operating Systems and Distributed Computing was a great learning experience and a though being a single person effort involved the contributions of many. Though the software may appeal to many as wanting in many areas, but is a step in the right direction.
A Report for the NSA LUCITE Task Order Productive Use of Distributed Reconfigurable Computing February 21, 2001
Tarek El-Ghazawi et al
What’s all the Fuss About? (IEEE IT Professional March-April 2004)
Grid Computing 101
SETI@Home–Massively Distributed Computing For SETI.
Eric Kaplan, Dan Werthimer, David Anderson, Jeff Cobb and Matt
Lebofsky
Profiling Workstations’ Available Capacity for Remote Execution.
Matt Mutka and Miron Livny.
What is Grid ? a Three Point Checklist.
Ian Foster
On Death, Taxes, and the Convergence of Peer-to-Peer and Grid Computing.
Ian Foster and Adriana Iamnitchi
CondorGrid Computing from Mobile Handheld Devices.
Fransisco J. , Javier V. , Miron L. , Enrique C. and Luis A.
Resource Discovery in Distributed Networks.
Mor Harchol-Balter, Tom Leighton and Daniel Lewin.
A General Architecture for Scheduling on the Grid(Argonne National Laboratory reprint 2001)
Jennifer M. Schopf
Condor Version 6.6.8 Manual (April 15, 2004)
Condor Team, University of Wisconsin,Madison:
Scheduling Remote Processing Capacity In A Workstation-Processor Bank Network
Matt Mutka and Miron Livny
Automated Scheduling, Optimisation and Planning (ASAP) Research Group Report (2001 2002)
Dr Jon Garibaldi
Open Issues in Grid Scheduling Report of the workshop held at the e-Science Institute (October 21-22 2003)
Agent-based Resource Management for Grid Computing
Junwei Cao et al
-