ABSTRACT

Hadoop is a large-scale distributed processing infrastructure designed to handle data-intensive applications. In a commercial large-scale cluster framework, a scheduler distributes user jobs evenly among the cluster resources. Hadoop’s existing fair scheduler queues jobs for execution in a fine-grained manner using task-level scheduling only. In contrast, the proposed work enhances the fair scheduler by allowing backfilling of jobs submitted to it, so that both job-level and task-level scheduling are enabled. Jobs are scheduled fairly across users, pools, and priorities. As a result, short, narrow jobs are executed in a free slot when sufficient resources are not available for larger jobs. Shorter jobs therefore complete sooner than under the existing fair scheduling policy, which schedules tasks based on the fairness of their remaining execution time. The approach prevents the starvation of smaller jobs whenever sufficient resources are available to run them.

Keywords: Hadoop, scheduling, fair share scheduler, backfilling