- Sorting [GroupingComparator and KeyComparator] (see the comparator sketch after this list)
- Shuffle: The process of moving map outputs to the reducers is known as shuffling
- Sort: Each reduce task is responsible for reducing the values associated with several intermediate keys. The set of intermediate keys on a single node is automatically sorted by Hadoop before it is presented to the Reducer.
- Speculative Execution: As most of the tasks in a job come to a close (roughly 95% complete), the Hadoop platform schedules redundant copies of the remaining tasks across several nodes that do not have other work to perform. This process is known as speculative execution. When tasks complete, they announce this fact to the JobTracker. Whichever copy of a task finishes first becomes the definitive copy; if other copies were executing speculatively, Hadoop tells the TaskTrackers to abandon those tasks and discard their outputs. Speculative execution is enabled by default (a configuration sketch appears after this list).
- Filtering or Grepping
- Parsing, Conversion
- Counting, Summing
- Binning, Collating
- Distributed Tasks
- Simple Total Sorting
- Chained Jobs
- Secondary Sort
- Distributed Total Sort
- Distributed Cache
- Reading from HDFS programmatically
- Dimension Reduction
- Evolutionary Algorithms
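The GroupingComparator/KeyComparator pair mentioned above (and the Secondary Sort pattern) boils down to two comparators over a composite key. Below is a minimal sketch, assuming composite Text keys of the hypothetical form "naturalKey#secondaryField"; the class names are illustrative, not a fixed API.

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableComparator;

    // Key comparator: full ordering (natural key first, then the secondary field),
    // so values reach the reducer already sorted within each group.
    class KeyComparator extends WritableComparator {
        protected KeyComparator() { super(Text.class, true); }

        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            String[] k1 = a.toString().split("#", 2);   // keys are assumed to contain '#'
            String[] k2 = b.toString().split("#", 2);
            int cmp = k1[0].compareTo(k2[0]);
            return cmp != 0 ? cmp : k1[1].compareTo(k2[1]);
        }
    }

    // Grouping comparator: compares only the natural key, so every composite key
    // that shares it is delivered to a single reduce() call.
    class GroupingComparator extends WritableComparator {
        protected GroupingComparator() { super(Text.class, true); }

        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            return a.toString().split("#", 2)[0].compareTo(b.toString().split("#", 2)[0]);
        }
    }

    // Wiring with the old mapred API (JobConf conf):
    //   conf.setOutputKeyComparatorClass(KeyComparator.class);
    //   conf.setOutputValueGroupingComparator(GroupingComparator.class);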
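Speculative execution can also be toggled per job. A minimal fragment using the old mapred JobConf API (the same API style as the chained-job examples below):

    import org.apache.hadoop.mapred.JobConf;

    // Speculative execution is on by default; disable it when redundant task
    // attempts are harmful, e.g. tasks that write to an external system.
    JobConf conf = new JobConf();
    conf.setMapSpeculativeExecution(false);
    conf.setReduceSpeculativeExecution(false);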
First create the JobConf object “job1” for the first job and set all the parameters, with “input” as the input directory and “temp” as the output directory. Execute this job: JobClient.runJob(job1). Immediately below it, create the JobConf object “job2” for the second job and set all the parameters, with “temp” as the input directory and “output” as the output directory. Finally execute the second job: JobClient.runJob(job2). A sketch of this sequence is shown below.
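A minimal sketch of that sequence, using the old mapred API; ChainDriver, FirstMapper, FirstReducer, SecondMapper and SecondReducer are placeholder names, and the key/value types are only an assumption:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    // First job: reads "input", writes intermediate results to "temp".
    JobConf job1 = new JobConf(ChainDriver.class);   // ChainDriver etc. are placeholders
    job1.setJobName("first-job");
    job1.setMapperClass(FirstMapper.class);
    job1.setReducerClass(FirstReducer.class);
    job1.setOutputKeyClass(Text.class);
    job1.setOutputValueClass(IntWritable.class);
    FileInputFormat.setInputPaths(job1, new Path("input"));
    FileOutputFormat.setOutputPath(job1, new Path("temp"));
    JobClient.runJob(job1);                           // blocks until job1 completes

    // Second job: reads "temp" (the first job's output), writes final results to "output".
    JobConf job2 = new JobConf(ChainDriver.class);
    job2.setJobName("second-job");
    job2.setMapperClass(SecondMapper.class);
    job2.setReducerClass(SecondReducer.class);
    job2.setOutputKeyClass(Text.class);
    job2.setOutputValueClass(IntWritable.class);
    FileInputFormat.setInputPaths(job2, new Path("temp"));
    FileOutputFormat.setOutputPath(job2, new Path("output"));
    JobClient.runJob(job2);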
Create two JobConf objects and set all the parameters in them just as in the first approach, except that you don’t call JobClient.runJob. Then create two Job objects (org.apache.hadoop.mapred.jobcontrol.Job) with the JobConfs as constructor parameters:
    Job job1 = new Job(jobconf1);
    Job job2 = new Job(jobconf2);
    JobControl jbcntrl = new JobControl("jbcntrl");
    jbcntrl.addJob(job1);
    jbcntrl.addJob(job2);
    job2.addDependingJob(job1);   // job2 starts only after job1 succeeds
    jbcntrl.run();
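One practical note, as a hedged sketch rather than guaranteed behaviour: in many Hadoop versions JobControl.run() keeps polling until stop() is called, so it is commonly started in its own thread and watched via allFinished():

    Thread controller = new Thread(jbcntrl);
    controller.start();
    while (!jbcntrl.allFinished()) {
        Thread.sleep(500);   // wait for both jobs to finish
    }
    jbcntrl.stop();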