Grid Scheduling Algorithms Tesbed

Acknowledgement


I would like to express my sincere gratitude for the assistance and support of a number of people who made this project a success.


I am very thankful to Prof. Mrs. V.Y. Kulkarni, Department of Computer Engineering, Maharashtra Institute of Technology, my internal guide for her valuable technical guidance and suggestions that she provided us at various stages throughout the project work.


I am also thankful to Prof. Mr. A.K. Pathak, Head of Department of Computer Engineering, Maharashtra Institute of Technology, for encouraging me at all times and giving important suggestions.


I would like to thank Prof. Dr. R.K. Shaymasundar (Dean STCS, TIFR) for allowing me to undertake the project at STCS, TIFR. I would like to thank Dr. Basant Rajan (Veritas) for helping me with the initial idea of the project, Mr. Anbalagan S. (ex TIFR) for the initial guidance regarding the project work , Mr. Sachin Wasnik (ex TIFR), Mr. S. Madhu (Scientific Officer, STCS, TIFR) for the very important guidance and support. I would like to thank Mr. N. Kaushik Kumar (Project Assistant STCS, TIFR) for his valuable guidance and support throughout the project work at TIFR.


My special thanks to the non-teaching staff of Maharashtra Institute of Technology, who provided us access to the lab facilities from time to time.


Abstract



The Big Bang, the only physical phenomenon that can be used as a yardstick to measure the explosive increase in the popularity of the Internet. Also the ensuing data explosion is of magnitude of the same order. An individual datum when in isolation and not operated upon presents no value at all. To use the datum we need to process it, the processing of such enormous data (as of now easily available because of the Internet) is the solution offered by the Grid Computing. Grid Computing is aiming to provide with enormous computing power at the hands of the users, for various needs. The tremendous increase in the data storage capacities, processing facilities and information carrying capabilities of the interconnecting networks make it the computing platform of the future.


There is a phenomenal increase in the interest for research in the field of Grid Computing. The core areas being Security, Resource Scheduling, Data Storage, Data Sharing, Accountability. Of these core areas the most important is the “Resource Scheduling” for dynamic grid environments. There is enormous amount of research going on in this particular branch of Grid Computing. But the bane of scientists is the lack of tools or testbeds for the evaluation of policies or algorithms they design.


This project is aimed to fulfill the gap between the discovery and implementation of more efficient scheduling strategies for the Grid. The Grid environment being fundamentally different from the environment of the traditional schedulers, specialized softwares are required to correctly simulate it. This project aims at providing with such a tool for the evaluation of different Grid Scheduling algorithms.




Chapter 1 Introduction



One of the objectives of computational grids is to offer applications the collective computational power of distributed but typically shared heterogeneous resources. Unfortunately, efficiently harnessing the performance potential of such systems (i.e. how and where applications should execute on the grid) is a challenging endeavor due principally to the distributed, shared and heterogeneous nature of the resources involved. This project tries to propose a tool to aid the design and evaluation of scheduling policies suitable for efficient execution of parallel applications on computational grids.


  1. Motivation for research, system development


The objective of computational grids is to coordinate shared, distributed heterogeneous resources to work as a single computational resource. The availability of such an environment opens new horizons for research in areas previously unexplored or limited for economic or impractical reasons. The existence of long distance, low cost, high performance networks is encouraging the development of applications which take advantage of geographically dispersed resources. However, relatively few applications exploit the computational power available from such environments efficiently. Due to the diversity of resources, their dynamic behavior and the instability generally encountered in grids, developing applications capable of executing efficiently in such environments is still a challenge. Users of parallel system frequently complain of the difficulty of achieving a reasonable fraction of the theoretical peak performance from the systems they use. In environments like grids with so much variation, it is extremely difficult for users (typically scientist and engineers with little system knowledge) to decide, for each application and for each execution, which resources would be the most appropriate. The challenge is to execute applications efficiently and robustly in grid environments, without placing this burden on the programmer or the user.

  1. The Problem


While there have been several proposals of high performance global computing systems, scheduling schemes for the systems have not been well investigated. The reason is difficulties of evaluation by large-scale benchmarks with reproducible results. This Chatak Megh Samvad (CMs hereafter) performance evaluation system would allow analysis and comparison of various scheduling schemes on a typical high-performance global computing setting. CMs can simulate various behaviors of global computing systems, especially the behavior of networks and resource scheduling algorithms. The behavior of the network and the resource Scheduling Algorithms is faithfully carried out by the Condor HTC software. The scheduling algorithm used is the UP DOWN Scheduling Algorithm for the resource scheduling.


High performance global computing systems fueled by the rapid progress of high-speed networks and computing resources are now regarded as the computing platform of the future. In order to effectively employ computing resources therein, most proposed global computing systems embody a resource scheduling framework whose components monitor the global computing environment and predict availability of the resources. For effective investigation and objective comparison of scheduling algorithms and the implementation of the scheduling frameworks, large-scale benchmarks with reproducible results under various environments parameterized by the following constituents over time are required:


Servers --- architecture, performance, load and variance.


  1. The Solution


These parameters have been included in the scheduling scheme, to be detailed later in the architecture of the CMs project. These factors of machines influence the scheduling decisions made by the CMs scheduler.

However, reproducibility over a wide-area network is extremely costly to achieve, if not impossible. Thus, currently it is unrealistic to compare the different scheduling

algorithms proposed by other researchers, let alone compare the systems themselves. Cost and scale of possible benchmarks are also extremely limited. The resulting lack

of impartial comparative approaches is a great hindrance to global computing research and deployment. In order to resolve this situation, we are building a performance evaluation system that would allow analysis and comparison of various global computing systems under reproducible, controlled environments, called CMs. The current version of CMs mainly focuses on the evaluation of different scheduling algorithms and schemes based on a canonical (orthodox) model of high-throughput global computing system. The CMs Project tries to emulate the Grid as far as it can and give the researcher the results of the tests reliably aping the way scheduling scheme would work in a real world grid computing environment. The main aim is to prove that a scheduling scheme can be implemented in such a way, trying to divert the job level scheduling to a underlying software such as Condor and retains the application level scheduling.




Project Plan


  1. Project Problem Definition


The aim of this project is to facilitate the working of the grid by providing a facility to the Grid. That is an agent based scheduler and the other is to enable the grid to work across firewalls in an efficient and secure manner. The project is to design a test bed for the implementation of Grid Scheduling Algorithms. The primary requirement here is a “manageable grid”. Manageable in the sense that the expense of setting the grid up and routine maintenance is to be very less. Generally in the industrial environment the Grid spans multiple departments and is very complex and hard to maintain owning to the various factors involved in the proper functioning of it. So for this project I have chosen a High Throughput Computing facilitator “Condor”. It is non production environment software. What exactly we are exploiting here is, first the ability of Condor to migrate jobs from overloaded machines to less loaded ones using the mechanism known as “Flocking”. Secondly the ability of condor to get reconfigured in the runtime so as to facilitate the dynamic provision of less loaded machine in the grid. This basically will be a Long term scheduler for the grid.


  1. Objective of Project


The main objective of the project is to, build a grid scheduling algorithm test bed. The test bed is to provide the following features for the grid:



The working and the architecture of the Test Bed is covered in depth in the System Architecture part of this report.


  1. Project Scope


    1. Technical Feasibility


The project aims at facilitating the grid computing environment by providing with an agent based scheduler for the proper and efficient distributing of machines. The most primary assumption of the project is based on a very important facility provided by the Condor HTC software, i.e. Flocking between various masters on the grid. The proof of concept has been successfully established that the jobs do migrate from the overloaded master to the more lightly loaded one. This is done by setting various parameters of the Condor on the participating machines.


The project basically aims at automating the process of flock configuration based on some predetermined scheduling algorithm. Thus providing a test bed for the execution of grid scheduling algorithms. The construction of a grid based on the Condor HTC components is the most efficient with respect to time factor and requirements, amongst the various options considered (Globus, OSCAR, ROCKS).


This states the project is technically feasible.


    1. Economical Feasibility


The Condor HTC Project is basically Open Software and tough the code is not released yet. The various flavors of Condor are available for free download from the Project website www.cs.wisc.edu/condor/downloads. It can run various types of jobs categorized into different Worlds (Vanilla (Simple Jobs), MPI (Message Passing Interface), PVM (Parallel Virtual Machine)).

The Condor software is available of no charge for various operating systems. The machine requirements are minimum and no need of special hardware or special interconnect devices is needed. The whole grid is based on Linux Platform and hence the licensing issues are dealt with the Free Software Foundation Standards, there is no need of proprietary software to use with the system.


The project is economically feasible in all respects.



    1. Legal Feasibility


The platform used for the project is Redhat Linux 7.3 (CERN), Redhat 9 and Debian Sarge. All these software are available under the GNU GPL (General Public License; described in the User’s Manual). And the Condor software is available free for Academic use, under its own special license (Provided in the Licenses Section.). The programs are also developed and compiled fully under the Debian Operating System.


Thus the project has fully satisfied its license requirements, so the project is legally feasible.


  1. Project Resources

  1. Human Resources

The team of 1 person is working on this project


  1. Chaitanya Hazarey


  1. Hardware Resources

  1. Minimum 3 Pentium-IV 1+ GHz

  2. 100Mbps fast Ethernet.

  3. Minimum 128MB RAM

  4. Minimum 30 GB HDD


  1. Software Resources

  1. Redhat Linux CERN Version (7.3)

  2. Redhat Linux 9

  3. Debian Sarge

  4. Condor HTC Software (6.6.8)

  5. MPICH Library 1.2.5 for MPI World Program

  6. Ssh for data transfer betweens two nodes.

  7. Mpicc to compile MPI programs.

  8. condor_complie for compiling C Language Program.

  9. VIM editor to write the program.

  10. Anjuta for the compilation and managment of Project

  11. Gnuplot to plot the graph of the tests conducted.

  12. EtherApe for the display of grid activity

  13. Lex Lexical Analyzer generator.

  14. Kate Advanced Text editor.


  1. Project Estimates


This section provides cost, effort, and time estimates for the project of building the Grid Scheduling Algorithms Testbed.


Historical Data used for Estimation

Grid Computing is an emerging field. Most of the work carried out in this field is experimental and is going on live at various Institutes and Universities. This type of project was my own idea, and finds few parallels in current situation. Almost none deploy Condor at the Grid Computing Level (refer to the System Architecture). We couldn’t find any existing application working on this, so Estimation based historical data cannot be done.


Estimation based on Technique Provided

Three Estimation techniques have been used to generate three independent results for higher accuracy.

Problem based Estimation


LOC and FP are used in two ways during software project estimation:

Functions/Module

Estimated Line Of Code (LOC)

Megh Daemon

1032

Chatak Daemon

940

host_job.sh Shell Script Test Job

17

cvh_pass.c Test Job

89

tiny_mc.c Monte Carlo Test Job

62

time_mc.c Monte Carlo Test Job

128

small_mc.c Monte Carlo Test Job

125

pi_calc.c Pi Calculation Test Job

690

Total Line Of Code

3083


Empirical Based Estimation

The estimates for LOC are plugged into the COCOMO formula for effort and duration estimation.


The basic COCOMO Model is used, for which,

Effort E = a KLOC b

Duration D = c E d

a = 2.4, b = 1.05, c = 2.5, d = 0.38


E = 2.4 (KLOC) 1.05

E = 2.4 (3.803) 1.05

E = 9 person-months (approx)


D = 2.5 (E) 0.35

D = 2.5 (9) 0.35

D = 5 months (approx)


N = E / D

N = 1.8 (approx)


The above estimate indicates that for 1 member, it will take 9 months approx to finish the project.

  1. Project Risk Analysis

Project Risk









Risk Table


Risks

Category

Probability

Impact

Threads may not interact as expected

Development Environment

50%

Catastrophic

Lack of Hardware or Software

Technology to be built

30%

Critical

Lack of Knowledge

Staff size and experience

50%

Critical

Technology may not meet exceptions

Technology to be built

20%

Marginal

Schedule will be tighten

Business Impact

50%

Marginal

Bugs during Test

Staff size and experience

20%

Negligible

RMMM Plan


Risk

Mitigation

Monitoring

Management

Lack of knowledge

Study topic in detail. If many topics to be studied divide work among team members.

Technical inability to write certain code.

Check for regular code status and ensure its development on time.

Unavailability of resources.

Try to find out substitute. Discussion with management.

Unable to proceed.

Discuss problem. If no substitute found, try to get resources as early as possible

Schedule is tighten

Try to speed up project

Milestones are not completed as per deadline establish

Check for completion of milestones. Weekly review.

Technology may not meet expectations

Properly understand the method to install and to make changes according to the application

Inability to read the document of the software before using it

Read the document carefully. Try to install minimum things and test them step by step to make application running properly.


Besides this, the project manager (Internal & External Guides) need to review, talk individually and try to find out actual status of the project. Ensure that the project documents are completed as per guidelines.

Change Impact


Changes may be done in the project code if required under following conditions


Change impact should keep in mind while designing the software so as to reduce the risk. Function modification and addition of features is considered well in advance. Code is kept modular to incorporate these changes. This reduces the risk and effect cost marginally.


  1. Software Quality Assurance Plan


The application extensively uses POSIX compliant threads which are well designed and extensively supported across platforms.






Guidelines and Standards



Software Requirements Specifications (SRS)



  1. Introduction


There are a few important things about the state of modern computing that need to be thought about before one can really understand what the Grid is, and what potential benefits it holds. Today I write this project report from home on a PC with a 2.3 GHz CPU, 256 MB of RAM, 30 Gigabyte Hard-drive, not to mention a massively parallel vector math processor (known as a graphics card) with its own 64 MB of memory. A computational arsenal that would have cost millions of dollars and filled a room only a couple of years ago, but which is now commonplace if not even a little behind the times. For decades Moore’s law has been followed almost to the letter, and we have quickly emerged into a time of abundant computational power scattered over millions of computers all over the world. However there lies a great inefficiency in this organization of resources. The vast majority of computers spend most of their time in idle. There has been an excellent study conducted on this by Matt Mutka and Miron Livny in their paper “Profiling Workstations’ Available Capacity for Remote Execution”. This is cause for the effect that their computational potential goes to waste. It has been found that a computer is used, on average, around thirty percent of the time in a commercial/educational context, and far less than that in the home. In addition, most uses (apart from gaming), utilize only a paltry percentage of the total systems resources, such that even during use, the computer may as well be idle.


Any observant business man will tell you that inefficiencies are a great economic opportunity, in this particular case the potential economic benefits are off the scale. Should computational resources be utilized to their full potential the world would experience a seven fold increase in computational power, with the associated benefits to all computer using industries. This alone could herald a massive period of global economic growth. It is with this in mind that one should look at the Grid domain. While, in the end, it may not deliver the benefits as just expressed, the simple fact that it could, is just cause for excitement and determined investment in building this daunting technology.


  1. Purpose



But how can we increase resource utilization? There was another time when resource utilization was held back by inefficient use and lack of a coherent distribution (networking)system. Lets see how the grid got its name.


A common analogy used within the Grid community to explain the technology, is the modern electrical power grid. The advent of electricity alone did not transform the modern industrial landscape as one would expect. Oil and coal generators were expensive to build and maintain and power was inefficient without significant economies of scale. Basically most people could not afford to build their own generators for electricity, and in turn very few electrical appliances were developed. It wasn’t until the power grid made electricity cheaply and universally accessible that electric applications were developed and the modern industrial world was defined. The modern power grid is a technological marvel. It presents itself as essentially a single entity that provides power to millions of devices from a highly heterogeneous set of providers. Consumers vary greatly in electrical requirements, yet most everybody is guaranteed a high quality of service for what they require, at an affordable price. The computational Grid and the power grid face very different technological challenges and the analogy is not intended to reflect an identical situation, however the goals are much the same. The Grid intends to provide computational resources uniformly and universally to all sorts of consumers, from highly heterogeneous resources that cross organizational boundaries. Not only that, but similarly to the power grid, it will have to present economic solutions to encourage its use and ensure a flexible payment system for its large diversity of consumers. Basically creating a commodity market for consumers and producers of computational resources, and most probably realizing untold economies in resource use and cost. Lets be a little bold and propose that this would lead to a sort of computational ecology.


What sort of changes would this Grid bring? Well imagine some organization that periodically runs an intensive business simulation. Instead of making large technology investments, it is able to outsource its processing to a grid, and pay on a per/cpu-cycle basis. Imagine being able to bless any instrument (microscope, robot, etc) with a network connection, with a massive amount of intelligence, or to be able to share virtual spaces for collaborative design of complex systems. These are just a few of the applications that already exist and operate on The Grid. Currently there are no public Grids, however in the future, an internet based Grid would have the potential of massively increasing personal processor use, with the prospect of being paid for your idle time.


  1. Scope in Detail


The Grid “problem” is not solved however. The technology is still young and there are resource management, infrastructure, and security issues that need to be resolved. Any future Grid is most likely going to be a highly dynamic heterogeneous environment, with producers that vary greatly in quantity and quality of their resources, and pop in and out of the Grid environment. Here are a few questions to summarize the Grid Issues.



A few Grid projects have identified and implemented solutions to these questions. While they all have similar goals, there are a variety of design philosophies apparent in the solution domain. We will study the most prominent of the Grid technologies offered in details later. This will have particular focus on the Condor Grid High Throughput Computing software.



  1. Overall description

What is a Grid?


Now lets get technical lets attempt to answer the question that what is a grid ? We will answer this question form the implementation point of view. Exactly what combinations of Distributed Computing facilities will enable the particular solution to qualify as a grid? What are the basic points of difference between the clusters and other similar Distributed Computing environments and the grid ? We will present the “Three Point Checklist” for the grid solutions which is proposed by Ian Foster a prominent researcher in the field of Grid Technology.


The essence of the various definitions of Grids can be captured in a simple checklist,

according to which a Grid is a system that:



(A Grid integrates and coordinates resources and users that live within different control domains—for example, the user’s desktop vs. central computing; different administrative units of the same company; or different companies; and addresses the issues of security, policy, payment, membership, and so forth that arise in these settings. Otherwise, we are dealing with a local management system.)



(A Grid is built from multi-purpose protocols and interfaces that address such fundamental issues as authentication, authorization, resource discovery, and resource access. As I discuss further below, it is important that these protocols and interfaces be

standard and open. Otherwise, we are dealing with an application specific system.)




(A Grid allows its constituent resources to be used in a coordinated fashion to deliver various qualities of service, relating for example to response time, throughput, availability, and security, and/or co-allocation of multiple resource types to meet complex user demands, so that the utility of the combined system is significantly greater than that of the sum of its parts.)


Of course, the checklist still leaves room for reasonable debate, concerning for example what is meant by “centralized control,” “standard, open, general-purpose protocols,” and “qualities of service.”

  1. System Features


Design Specifications


  1. Design Considerations


Assumptions and dependencies


General Constraints




  1. System Architecture


Proposed Architecture of a Grid Scheduler


We will first examine in detail the architecture which is proposed for the grid scheduler. And then see how the Scheduler we have developed fulfills the requirements laid down by the proposed model.


Following is presented a general architecture for scheduling on a Grid. A Grid scheduler (or broker) must make resource selection decisions in an environment where it has no control over the local resources, the resources are distributed, and information about the systems is often limited or dated. The Grid scheduling architecture has three phases: resource discovery, system selection, and job execution. step. We note that no current Grid scheduler implements all of the steps of this architecture.



Single sites are simply no longer efficient for meeting the resource needs of high-end applications, and using distributed resources can give the application many benefits. Effective Grid computing is possible, however, only if the resources are scheduled well. Grid scheduling is defined as the process of making scheduling decisions involving resources over multiple administrative domains. This process can include searching multiple administrative domains to use a single machine or scheduling a single job to use multiple resources at a single site or multiple sites.


Here we do not address the situation of speculative execution submitting a job to multiple resources and, when one begins to run, canceling the other submissions. We do, however, discuss resource selection (sometimes termed resource discovery , assignment of application tasks to those resources (mapping ), and data staging or distribution.


One of the primary differences between a Grid scheduler and a local resource scheduler is that the Grid scheduler does not own the local resources and therefore does not have control over them. The Grid scheduler must make best-effort decisions and then submit the job to the resources selected, generally as the user.


Here the user is not supposed to be root but some other user with limited privileges. The job submission by user root is not allowed due to security reasons. Here in our model of the Grid Scheduler we do not make efforts to submit the jobs, as the jobs may vary in the language they are coded in and the issue of maintaining logs and the intermediate data at one site cannot be handled efficiently by the Scheduler.


Furthermore, the Grid scheduler does not have control over the set of jobs submitted to it, or even know about the jobs being sent to the resources it is considering use of, so decisions that tradeoff one job's access for another s cannot be made in the global sense. This lack of ownership and control is the source of many of the problems to be solved in this area.


This model Grid Scheduler does the application level scheduling, while Condor does the job level scheduling for the Grid.



  1. Architecture of the Grid Scheduling Algorithm Test Bed (CMs)


We will now map the architecture of the Grid Scheduling Algorithm test bed on to the proposed architecture of the Grid Scheduler. We will follow the step by step procedure of the Grid Scheduling and then describe in details how the Test-Bed Scheduler performs these steps.


The architecture of the Model Grid Scheduler is multithreaded. To minimize the cost of communication between different parts of the same entity, the design based on threads was selected. The different functions of the Grid Scheduler have been modularized into thread functions and implemented.


The Architecture of The CMs System.





The CMs is basically Agent based Scheduler, hence it is implemented in two parts which are named as Megh and Chatak.


Following are the details of their functions and the constituent threads:



  1. Megh (Daemon Process)


        1. Inter Megh Communication.

        2. Free Host Handler Thread.

        3. Requestor Host Handler Thread.

        4. Scheduler Thread (Match Maker).

        5. History Thread.


  1. Chatak (Daemon Process)


        1. Megh Searcher Thread.

        2. Host Reporting Thread.

        3. Host Requestor Thread.

        4. CPU Idle time calculating Thread.

        5. Reconfiguration and Rescheduler Thread.

        6. History Thread.


As we can observe in the above figure the Chatak daemon process runs on the Condor Masters. And the Megh daemon can be located any where on the network. The daemons in the architecture represent the actual work to be carried out by the respective parts of the scheduler.


Following are the functional details of each of the threads the Scheduler implements.


The Scheduler Parameters


  1. For Requesting Hosts

      1. char req_host_str[256];

      2. char *free_hosts_ptr[256];

      3. int hosts_taken;

      4. long int rank;

      5. int nice_val;

      6. long int req_queue_len;

      7. int time_fact1;

      8. int time_fact2;

      9. int stability_fact;



  1. For Free Hosts

      1. char free_host_str[256];

      2. char given_to_host[256];

      3. int hosts_serviced;

      4. long int rank;

      5. int nice_val;

      6. unsigned long int cpu_kflops;

      7. unsigned long int cpu_mips;

      8. unsigned long int memory;

      9. double avg_cpu_idle;

      10. int stability_fact;




Chatak (Daemon Process)


First of all the Chatak is installed on the interested Condor Masters, It runs with the root privileges. Basically it can run as any user who has sufficient permissions to change the Condor configuration files and issue commands to the underlying Condor Master. As it communicates on ports > 10000 it can run as any user.


When the Chatak daemon starts up it first senses how many jobs does the underlying Condor Master has got waiting. If is queue value is greater than certain threshold value, the Chatak contacts the Megh for requesting of more hosts to run jobs on. Otherwise the host here is reported free and the corresponding values of the host are entered in the database of Megh. The switch here is made on the queue value solely because, the value of CPU busy can be very misleading and incorrect. If the request for the machines is not fulfilled by the Megh , Chatak retreats and tries again after a short interval of time. If the request is successful the address of the extra host got is given to the reconfiguration thread to rewrite the configuration files and reschedule the jobs on that Condor Master.


There was a choice here presented to use either some of the readymade tools for the reporting of the host status, the NWS (Network Weather Service ) or making the utilities from scratch. Here the host properties are found out and reported by a mechanism written from scratch to have greater control over the details of the implementation. This mechanism is embedded inside of the Chatak to report the various scheduling parameters to the Megh daemon. The Megh is the place where the scheduling decisions are made and then executed, it also maintains the history of the decisions made for a more critical and minute view of the scheduling algorithm involved.




Megh (Daemon Process)



The Megh is the other part of the agent based Scheduler CMs. When the Chataks on various host machines wake up they first of all check whether the underlying Condor Master is OK or not and then check on the queue length status. They depending upon the various configurable switching parameters either report their status to the Megh , about their free properties . Or they report the status of the queue lengths and then ask for more Flockable hosts from the Megh.


The architecture for Megh is multithreaded and based on the POSIX Threads implementation. It uses different threads to handle the type of requests coming in. For example, Free Host Handler thread will handle the free host reports coming in . The Requesting Host handler thread will handle the requests for the hosts coming in. The scheduler thread will handle the scheduling and the history keeping part. The History keeping part may be implemented as a separate thread itself.


An improvement to the CMs architecture would be to add a thread which will handle the inter Megh communications. This will facilitate the exchange of scheduling information between different Meghs spread over the network. Making this model scheduler more pronounced as a Multi Agent based scheduler. And also will be a step further in the direction of eliminating the Megh as a single point of failure, by replicating the database throughout the network.

Now Lets try to map the proposed Architecture on to our Chatak-Megh Samvad.



Phase 1: Resource Discovery


The first stage in any scheduling interaction involves determining which resources are available to a given user. The resource discovery phase involves selecting a set of resources to be investigated in more detail in Phase 2. At the beginning of Phase 1, the potential set of resources is the empty set; at the end of this phase, the potential of resources is some set that has passed a minimal feasibility requirement. Resource discovery is done in three steps: authorization filtering, job requirement definition, and filtering to meet the minimal job requirements.


The first step of resource discovery in job scheduling is to determine the set of resources that the user submitting the job has access to. In this regard, computing over the Grid is no different from remotely submitting a job to a single site: without authorization to run on a resource the job will not run. At the end of this step the user will have a list of machines or resources to which he or she has access. The main difference that Grid computing lends to this problem is sheer numbers. It is now easier to get access to more resources, although equally difficult to keep track of them. So we have designed Chataks to report more frequently to the Megh to keep the information updated.


This First Phase is handled by the Chataks installed on each machine. They obtain information from various sources such as the /proc file system , condor_q , condor_status command. They process this information and forward it to the Megh Daemon in the required format. These are the actual scheduler parameters to be used for the allocation of free machines to the requesting hosts .The three steps are done by the Chatak Damon as follows :


  1. Authorization filtering


Chatak runs as user root and the Condor runs as the user Condor, Jobs execute as the user nobody. This is how the various authentication filtering is handled. It is done internally via the UNIX file permissions and user authentication procedures. Condor does not require an account (login) on machines where it runs a job. Rather, it uses remote system call technology, which traps library calls for such operations as reading or writing from disk files. The calls are then transmitted over the network to be performed on the machine where the job was submitted.


  1. Job requirement definitions


Each job submitted will have to have a .cmd file which contains the nunaces of that job. You can suppy the additional requrements by using the syntax of the job submission files


ex :


Requirements = (OpSys == "IRIX65" && Arch =="SGI") ||
(OpSys == "LINUX" && Arch =="INTEL")

or


Requirements = Name=="infn-corsiLL.corsi.infn.it" || \
Name=="infn-corsiRR.corsi.infn.it"



  1. Minimum Requirement Filtering

  2. Filtering to meet minimum job requirements.


This is handled by the Condor Schedd at the lower level and the application level is handled by the CMs Software. Whence the minimum job requirements of single job are fulfilled the Condor Master on that machine will allow that job to run. Condor also does an initial matching (using ClassAds and the matchmaker) and then a feasibility evaluation upon claiming the actual resources. At the application level the Chatak will sense the job queue length (or any other set parameter ) to satisfy the minimum requirements of the job queue on that Condor Master, and enable the queue to finish more time efficiently, by providing with extra machines. We maintain that as systems grow, this stage will be an important one for continued scalability of other Grid-level schedulers.


Phase : 2 System Selection


Given a group of possible resources (or a group of possible resource sets), all of which meet the minimum requirements for the job, a single resource (or single resource set) must be selected on which to schedule the job. This selection is generally done in two steps: gathering detailed information and making a decision. We discuss these two steps separately, but they are inherently intertwined, as the decision process depends on the available information. This process is done jointly by the Megh and Chataks running on separate machines. Following are the details.



  1. Dynamic Information Gathering


In order to make the best possible job/resource match, detailed dynamic information about the resources is needed. This data is collected at configurable intervals. The basic Condor approach generates the information on each machine, sends this data to the central manager, and updates the dynamic information every five minutes. The Chatak on each system will also in a similar fashion report the status of the resource to the Megh for the updating of database. The data provided is


For the Free Hosts


For the Requesting or Overloaded Hosts



  1. System Selection


With the detailed information gathered in Step 4, the next step is to decide which resource (or set of resources) to use. Various approaches are possible. The most commonly used system for matching application requirements to resources is the Condor Matchmaker/ClassAd system. A ClassAd can contain an expression that evaluates how well it matches to the corresponding ClassAds. This approach is taken at the job level or the lower level by the Condor System. At higher level a specific policy based on the algorithm to be implemented can by specified. This will dictate the manner in which the CMs software will allocate hosts to the requesting machines.

The Megh then based on the applied scheduling policy will issue a host Internet address to the requesting machine to enable the migration of jobs to the target machine. The Chatak on the requesting machine is then responsible for the receiving of this address and the reconfiguration of the underlying Condor Master and rescheduling of the jobs on it. This should allow more flexibility and a better scaling effect as the number of local resources considered grows.


Phase 3: Job Execution


The third phase of Grid scheduling is running a job. This involves a number of steps, few of which have been defined in a uniform way between resources. The CMs system, will only issue suggestions to the requesting Condor Master via the Chatak deployed on each of them. This will be done via the changing of the FLOCK_TO field in the configuration file of the respective Condor Master. The lower level details of job scheduling and actual process migration are handled by the Condor Masters. Following is the detailed description of how the Condor and CMs goes by the matching of jobs and resources, and the execution.


  1. Advance Reservation




The Advanced Reservation step is handled jointly by the Megh Scheduler thread and the Chatak Requesting Host Reconfiguration thread with the Condor System. This is done by the Scheduler thread which gives away the address of the Free host to the requesting host. And then the respective receiving Chatak will get this address and then reconfigure the underlying Condor Master according to the data received.


  1. Job Submission


The jobs are submitted on the submitting machine by the user via the condor_submit command. The user can submit any no of jobs from the submitting host he does this by the condor_submit command. These jobs can being to the various universes defined by Condor .




  1. Preparation Tasks


Preparation Tasks are handled jointly by the Condor System and the Chatak Requesting Host Handler Thread.


The remaining three steps are carried out by the Condor System based on the suggestions given to it by the CMs software.


  1. Monitoring Progress

  2. Job Completion

  3. Cleanup Tasks


Following figure shows the intricate details of job handling by Condor.






  1. Startd sends collector ClassAd describing itself. (The Schedd does as well, but it has nothing interesting to say yet.)

  2. The user calls condor_submit to submit a job. The job is handed off to the schedd and condor_submit returns.

  3. The schedd alerts the collector that it now has a job waiting.

  4. The negotiator asks the collector for a list machines able to run jobs and schedd queues with waiting jobs.

  5. The negotiator contacts the schedd to learn about the waiting job.

  6. The negotiator matches the waiting job with the waiting machine.

  7. The negotiator alerts the schedd and the startd that there is a match.

  8. The schedd contacts the startd to claim the match.

  9. The schedd starts a shadow to monitor the job.

  10. The startd starts a starter to start the job.

  11. The starter and the shadow contact each other.

  12. The starter starts the job.

  13. If the job is using the Condor syscall library (typically through being condor_compiled), it will contact the shadow to access necessary files.


Now following is the Diagram based on the Proposed Architecture of the Grid Scheduler with the relevant parts which are being carried out by the CMs Scheduler labled so. The following diagram sums up the constituent parts of the proposed grid scheduler architecture and how CMs implements it and exactly which parts of CMs implements it.



UML Diagrams


  1. Use Case Diagram

  2. Analysis Classes

  3. Activity Diagram

  4. Sequence Diagram

  5. Collaboration Diagram

  6. Class Diagram

  7. State chart Diagram












Coding


  1. Code Specification


Softwares Used

  1. Redhat Linux CERN Version (7.3)

  2. Redhat Linux 9

  3. Debian Sarge

  4. Condor HTC Software (6.6.8)

  5. MPICH Library 1.2.5 for MPI World Program

  6. Ssh for data transfer betweens two nodes.

  7. Mpicc to compile MPI programs.

  8. condor_complie for compiling C Language Program.

  9. VIM editor to write the program.

  10. Anjuta for the compilation and managment of Project

  11. Gnuplot to plot the graph of the tests conducted.

  12. EtherApe for the display of grid activity

  13. Lex Lexical Analyzer generator.

  14. Kate Advanced Text editor.

  1. Coding Style followed


Coding Standards


We have followed Hungarian Notation for coding. These are obtained from MSDN, and different Web Sites. Coding standard can be obtained from Coding Conventions Documents for our project. Coding style followed is Linux based as most of GNU/Linux in-built programs or standard book provide.


The indentation style followed is GNU/Linux style.


The coding policy is to minimize the hard coded part and to query for information wherever possible.


Test Overview


  1. Architecture Overview for Testing

Testing and benchmarking is the most important phase in the life cycle of a project as it provides a mechanism to find if the goals of the project have been met to satisfaction. The project was subjected to testing during its construction as well as after completion.


  1. Flow of Operation

  1. Machines Boot up

  2. Condor Masters (Condor Software) on all machines is started

  3. Initial benchmarking and internal performance analysis is done by the software

  4. User submits jobs on the submit machine

  5. The Condor begins negotiation for jobs from the Condor Central Manager.

  6. The Chatak comes up on the client machine.

  7. The Megh comes up on the machine running the Megh daemon

  8. Chatak depending on some fixed switch either reports the underlying host as free or as requests the Megh for more machines by reporting the host as “Requesting Host”

  9. Megh accepts the Host report either as “Free” or “Requesting”.

  10. Megh Scheduler thread then matches the Free hosts with the Requesting hosts and then gives the Requesting hosts the addresss of Free hosts for flocking.

  11. Chatak accepts the addresses and reconfigures the underlying Condor

  12. Chatak after the reconfiguration is complete , reschedules to jobs on the underlying machine.

  13. The jobs now migrate from the Requesting host to the Free hosts and start executing there.


  1. Details of Machines and Interconnecting Network


Machine Descriptions

Machine 1:


Hostname: vishwamitra.it.mitp

IP Address: 10.1.30.99

CPU: Pentium 4 (1.4 GHz)

Memory (RAM): 256 MB

Hard disk: 30 GB

Network Card: 100 MBps

Condor Version: 6.6.8

Operating System: Red Hat Linux EL 3 Enterprise Edition

Physical Location: Internet Center control room


Machine 2:

Hostname: shandilya.it.mitp

IP Address: 10.1.30.105

CPU: Pentium 4 (1.4 GHz)

Memory (RAM): 256 MB

Hard disk: 30 GB

Network Card: 100 MBps

Condor Version: 6.6.8

Operating System: CERN Edition GNU/Linux 7.3

Physical Location: Software Lab 4


Machine 3:

Hostname: ganga.it.mitp

IP Address: 10.1.30.206

CPU: Pentium 4 (1.4 GHz)

Memory (RAM): 256 MB

Hard disk: 30 GB

Network Card: 100 MBps

Condor Version: 6.6.8

Operating System: Fedora Project GNU/Linux Core 3

Physical Location: Software Lab 4


Machine 4:

Hostname: vayu.it.mitp

IP Address: 10.1.30.4

CPU: Pentium 4 (1.4 GHz)

Memory (RAM): 256 MB

Hard disk: 30 GB

Network Card: 100 MBps

Condor Version: 6.6.8

Operating System: Red Hat GNU/Linux 9

Physical Location: Software Lab 2,3 Control Room


Machine 5:

Hostname: minix.it.mitp

IP Address: 10.1.30.25

CPU: Pentium Mobile 4 (2.13 GHz)

Memory (RAM): 256 MB

Hard disk: 30 GB

Network Card: 100 MBps

Condor Version: 6.6.8

Operating System: Debian GNU/Linux Sarge

Physical Location: Personal Laptop




Network Description

Ethernet 100Mbps, UTP CAT 5 Cabling




  1. Tests Conducted:


The tests consists of the following jobs to be submitted to Condor:


  1. host_job.sh


This is a simple shell script which will execute on the target machine. It will sleep for some amount of time (passed as command line argument), and then terminate. It will for the purpose of establishing the identity of the machine it is executing upon will print the name of the host it is executing upon and the process id it has got. It will additionally also describe the command line arguments passed to it, and the exact location of the binary on the executing machine.


  1. pi_calc.c


This program is written in C and will calculate the value of Pi to the no of places given as a command line argument. This is a typical example of long running job. This job is included in the test suite to test if the information is lost on the disturbance of the executing machine. The no of places supplied must be the power of 2. This job prints the Pi value thus calculated on to the stdout, which is redirected to a file pi_calc.out.


The rest of the programs are of Monte Carlo simulations, these form the typical problems executed on the grid for High Energy Physics experiments.


The Monte Carlo technique is a very flexible method for simulating light propagation in tissue. The simulation is based on the random walks that photons make as they travel through tissue, which are chosen by statistically sampling the probability distributions for step size and angular deflection per scattering event. After propagating many photons, the net distribution of all the photon paths yields an accurate approximation to reality.


  1. tiny_mc.c


Simulates light propagation from a point source in an infinite medium with isotropic scattering. The entire source for this program fits on a single page and is a good way to get an overview of the entire Monte Carlo process.

  1. small_mc.c


Simulates light propagation from normal irradiation of a semi-infinite medium with anisotropic scattering. It calculates the volumetric heating as a function of depth. This program takes two whole pages, but is very handy for a bunch of problems.


  1. time_mc.c


Simulates the time resolved backscattering of a semi-infinite medium with anisotropic scattering. This program is adapted from small_mc.c above, and shows how simply time resolved simulations can be done.


  1. cvh_pass.c


The last program of the suite is a an implementation of the brute force password cracking method on the grid. It uses the brute force method to guess a password. It also displays the time taken for it to complete. For benchmarking purposes, this ouput of the program can be used. This is also a typical example of long running jobs submitted to the Grid for solutions. It can also be used constructively to test the strength and quality of the password chosen.






  1. Details of tests conducted and results


The system was subject to tests wherein the queue completion time was taken in consideration. Here we are not concentrating on the completion time of a single job. We have formed queues of the jobs described above. These queues are submitted fromt eh submitting machine and their completion is monitored.


The time taken for the completion of each queue is reported. The whole bunch of queus of 6 jobs in all is submitted atleast 3 times . This is done to get a fair approximation of the completion time of the queue on the test machine.


The Scheduling algorithms are tested on identical machines and identical network environment. The same jobs are submitted in the same way for each of the scheduling algorithm to be implemented. The improvement or the degradation in the completion time of the queues give the fair idea of how the scheduling algorithm is fairing.


The following is the description of the queues used in the evaluation :


host_job: 50 Jobs in a queue


pi_calc.c: 50 Jobs in a queue


small_mc.c: 50 Jobs in a queue


tiny_mc.c: 50 Jobs in a queue


time_mc.c: 50 Jobs in a queue


cvh_pass.c: 50 Jobs in a queue


Following are the results of the test conducted to calculate the Job Queue Completion on a single host without the software installed.


Job Queues Submitted

Total Time ( %R )

CPU ( %U )

CPU ( %S )

CPU (%U + %S) / %R

host_job

375.195

253.57

54.55

81.12

pi_calc

3516.684

747.04

109.31

24.35

tiny_mc

672.941

281.29

37.81

47.51

small_mc

578.197

87.43

15.78

17.85

time_mc

1877.33

453.95

66.13

27.7

cvh_pass

229.253

45.81

9.08

23.94

host_job

775.052

547.13

88.87

82.05

pi_calc

3468.373

698.64

105.46

23.18

tiny_mc

675.215

282.15

38.1

47.42

small_mc

583.393

87.75

16.74

17.91

time_mc

1876.797

453.9

65.67

27.68

cvh_pass

227.98

45.54

8.68

23.78

host_job

385.63

258.43

56.45

81.65

pi_calc

3351.926

603.94

95.49

20.86

tiny_mc

676.318

280.47

37.34

46.99

small_mc

581.897

85.77

16.71

17.61

time_mc

1986.585

519.17

72.69

29.79

cvh_pass

226.954

43.35

8.37

22.78

host_job

771.888

540.65

90.46

81.76

pi_calc

3054.665

387.61

70.58

14.99

tiny_mc

379.045

63.64

12.79

20.16

small_mc

583.558

86.15

16.61

17.6

time_mc

1859.18

424.17

65.04

26.31

cvh_pass

231.45

48.39

9.17

24.86

host_job

705.914

489.32

84.84

81.33

pi_calc

3046.347

385.24

68.68

14.9

tiny_mc

674.19

279.09

37.88

47.01

small_mc

581.108

85.63

16.35

17.54

time_mc

1547.617

202.12

38.44

15.54

cvh_pass

254.142

56.46

10.85

27.45

host_job

678.462

472.3

80.02

81.4

pi_calc

3508.346

715.79

109.84

23.53

tiny_mc

679.241

283.33

38.49

47.37

small_mc

581.797

87.39

17.26

17.98

time_mc

1556.265

204.87

39.52

15.7

cvh_pass

233.873

47.92

9.38

24.5


  1. Predictions for further tests

Further tests are expected to show that the job queue times decreases steadily as more machines are acquired. This can be proved when there are at least 5+ machines for testing, as the scheduling algorithm implemented should favor some machines and penalize others. The job queue timings of these two machines should be always less that that of the single machine but should be different from reach other.


A scheduling algorithm will be proved to be better is it successfully demonstrates the decrease in queue times of test machines according to the scheduling algorithm specified



Application and Future enhancements


  1. The Chatak Megh Samvad’s future applications



The project holds a great promise for application in the fields of distributed computing in general and Grid Computing in particular. The CMs project in some ways tries to emulate the grid on a small scale. The scale referred here is with respect to the manageability of the grid size. The main purpose of the CMs Project is to facilitate the study of the Grid Scheduling algorithms and aid in their design and analysis. It does this without costing the Researcher much hardship.


The objective of computational grids is to coordinate shared, distributed heterogeneous resources to work as a single computational resource. The availability of such an environment opens new horizons for research in areas previously unexplored or limited for economic or impractical reasons. The existence of long distance, low cost, high performance networks is encouraging the development of applications which take advantage of geographically dispersed resources. However, relatively few applications exploit the computational power available from such environments efficiently. Due to the diversity of resources, their dynamic behavior and the instability generally encountered in grids, developing applications capable of executing efficiently in such environments is still a challenge.


Users of parallel system frequently complain of the difficulty of achieving a reasonable fraction of the theoretical peak performance from the systems they use. In environments like grids with so much variation, it is extremely difficult for users (typically scientist and engineers with little system knowledge) to decide, for each application and for each execution, which resources would be the most appropriate. The challenge is to execute applications efficiently and robustly in grid environments, without placing this burden on the programmer or the user.


There are many points in the favor of the CMs Project, the ways it scores over the existing Grid Scheduling Algorithms Testbeds and caters to the need of scientists:


  1. It is based on Condor as a base. So it is free from the low level problems to be solved.

  2. The Researcher can at his will change the scheduling algorithm.

  3. The Researcher can specify various scheduling parameters to be included or excluded from the scheduler.

  4. The Researcher can decide on exactly what basis the hosts are decided to be free or busy, i.e. he can decide the thresholds very comfortably.

  5. The Condor HTC software is Free for download and is available in various flavors for various platforms.

  6. The nuances of the platform are handled well in Condor, this is the principle advantage of the CMs software not getting involved in the intricacies of job executions.

  7. The design of the CMs system is based on the Agent Based Scheduler, so it incorporates all the positive points of this technique.


Future development of the system may lead to the development of the system such as a Tool for the Design and Evaluation of Hybrid Scheduling Algorithms for Computational Grid. We have studied a case of such implementations, which will impress the point that this system in future may be able to serve practical needs of scientists.






Screen Shots


Status of various machines


Free state of vishwamitra.it.mitp





















Free state of minix.it.mitp























Free state of ganga.it.mitp























Free state of vayu.it.mitp























Unavailable state of shandilya.it.mitp
























Example Job to be submitted (host_job.sh)























Example Job description file























Job submission























Jobs not running on shandilya.it.mitp























After reconfiguration by CMs Job run on remote Free Machines

























Monitoring of Machine Status










Conclusion


Research in Grid Computing is progressing at an enormous pace. New ideas are coming up at every hour. Various Research Labs are at cut throat competition for supremacy and total control over the field. This project was conceived with an aim of helping the researches test out the various scheduling strategies without worrying about the lower level implementation details. The main aim of the project is to bridge the gap between theory and practice.


The working environment of the Grid Computing being different from the turf of the traditional Scheduling algorithms, simulating this is a challenge. The heterogeneity and the highly dynamic behavior of the grid computing demands a new class of scheduling strategies. And for their testing development of a new breed of testbeds. The project aims at proving that the simulation for a grid and implementation of user specific scheduling policy on it is possible without delving into minute details of Job level scheduling.


This software working at the application level tries to implement a user specified scheduling policy onto the underlying Grid. There being only one or two parallels of this kind of work testifies the challenges face by the Grid Resource Management deployment and testing.


The software touching on various parts of Computer Engineering such as Computer Networks, Compiler Construction, Operating Systems and Distributed Computing was a great learning experience and a though being a single person effort involved the contributions of many. Though the software may appeal to many as wanting in many areas, but is a step in the right direction.




References


  1. A Report for the NSA LUCITE Task Order Productive Use of Distributed Reconfigurable Computing February 21, 2001

Tarek El-Ghazawi et al


  1. What’s all the Fuss About? (IEEE IT Professional March-April 2004)

Grid Computing 101



  1. SETI@Home–Massively Distributed Computing For SETI.

Eric Kaplan, Dan Werthimer, David Anderson, Jeff Cobb and Matt

Lebofsky


  1. Profiling Workstations’ Available Capacity for Remote Execution.

Matt Mutka and Miron Livny.



  1. What is Grid ? a Three Point Checklist.

Ian Foster


  1. On Death, Taxes, and the Convergence of Peer-to-Peer and Grid Computing.

Ian Foster and Adriana Iamnitchi


  1. CondorGrid Computing from Mobile Handheld Devices.

Fransisco J. , Javier V. , Miron L. , Enrique C. and Luis A.




  1. Resource Discovery in Distributed Networks.

Mor Harchol-Balter, Tom Leighton and Daniel Lewin.



  1. A General Architecture for Scheduling on the Grid(Argonne National Laboratory reprint 2001)

Jennifer M. Schopf


  1. Condor Version 6.6.8 Manual (April 15, 2004)

Condor Team, University of Wisconsin,Madison:


  1. Scheduling Remote Processing Capacity In A Workstation-Processor Bank Network

Matt Mutka and Miron Livny


  1. Automated Scheduling, Optimisation and Planning (ASAP) Research Group Report (2001 2002)

Dr Jon Garibaldi


  1. Open Issues in Grid Scheduling Report of the workshop held at the e-Science Institute (October 21-22 2003)


  1. Agent-based Resource Management for Grid Computing

Junwei Cao et al


- 146 -