Setting up a Hadoop YARN (Hadoop 2) development environment on Linux


Apache Hadoop MapReduce has been completely re-architected and is now known as YARN
(sometimes described as the Hadoop operating system). The idea behind this is to split the JobTracker's role into a ResourceManager and a per-application ApplicationMaster.
Apache Hadoop can be installed by any of the following methods:
1. Using tarball
2. Using Apache Ambari (automated)
3. Using RPM or Deb packages
The following steps explain deploying Hadoop in pseudo-distributed (single-node cluster)
mode using a tarball. I used the HDP tarball for CentOS-6.4 at the time of writing this article, but these steps also work on Ubuntu.
Step [1]: Download and unpack a Hadoop-2.2.* (or later) tarball
For CentOS-6.4 :
$ wget http://public-repo-1.hortonworks.com/HDP/suse11/2.x/updates/2.0.6.0/tars/hadoop-2.2.0.2.0.6.0-101.tar.gz
For Ubuntu-12.04:
$ wget http://public-repo-1.hortonworks.com/HDP/ubuntu12/2.0.6.1/tars/hadoop-2.2.0.2.0.6.0-101.tar.gz
Define $YARN_HOME
$ export YARN_HOME=/opt/hadoop-2.2.*.x (where you unpacked the tarball)
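This export lasts only for the current shell session. To make it stick, the same lines can go in ~/.bashrc; the path below is an assumed example unpack location, so substitute your own:

```shell
# Assumed example path -- replace with the directory you actually unpacked to.
export YARN_HOME=/opt/hadoop-2.2.0.2.0.6.0-101
# Putting the Hadoop binaries on PATH is optional but convenient.
export PATH="$YARN_HOME/bin:$PATH"
```

Append the same two lines to ~/.bashrc (and run `source ~/.bashrc`) to have them in every new shell.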
Step [2]: Modify the core-site.xml
$ cd $YARN_HOME
$ vim ./etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Step [3]: Modify the hdfs-site.xml
$ cd $YARN_HOME
$ vim ./etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hikmat/yarn_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hikmat/yarn_data/hdfs/datanode</value>
</property>
</configuration>
Step [4]: Modify the yarn-site.xml
$ cd $YARN_HOME
$ vi ./etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
Step [5]: Modify the mapred-site.xml
$ cd $YARN_HOME
$ vim ./etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
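Note that some tarballs ship only a mapred-site.xml.template rather than mapred-site.xml itself (an assumption — check your unpacked tree). A defensive sketch that copies the template into place only when needed:

```shell
# No-op when mapred-site.xml already exists or no template is shipped.
# /opt/hadoop is an assumed fallback path if $YARN_HOME is unset.
conf_dir="${YARN_HOME:-/opt/hadoop}/etc/hadoop"
if [ -f "$conf_dir/mapred-site.xml.template" ] && [ ! -f "$conf_dir/mapred-site.xml" ]; then
  cp "$conf_dir/mapred-site.xml.template" "$conf_dir/mapred-site.xml"
fi
```

After the copy, edit mapred-site.xml as shown above.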
Step [6]: Modify yarn-env.sh
$ cd $YARN_HOME
$ vim ./etc/hadoop/yarn-env.sh
export HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-$YARN_HOME/etc/hadoop}"
export HADOOP_COMMON_HOME="${HADOOP_COMMON_HOME:-$YARN_HOME}"
export HADOOP_HDFS_HOME="${HADOOP_HDFS_HOME:-$YARN_HOME}"
Step [7]: Create local directories for the namenode and datanode
$ mkdir -p /home/hikmat/yarn_data/hdfs/namenode
$ mkdir /home/hikmat/yarn_data/hdfs/datanode
Step [8]: Format the namenode (first run only; reformatting destroys any existing HDFS data)
$ cd $YARN_HOME
$ bin/hdfs namenode -format
Step [9]: Start the HDFS services (each command runs in the foreground, so give each its own terminal)
$ cd $YARN_HOME
$ ./bin/hdfs namenode
$ ./bin/hdfs secondarynamenode
$ ./bin/hdfs datanode
Step [10]: Start the YARN services (again, one terminal per daemon)
$ cd $YARN_HOME
$ ./bin/yarn resourcemanager
$ ./bin/yarn nodemanager
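Starting five daemons in five terminals gets tedious. The tarball also ships helper scripts under sbin/ that start each group in the background (this assumes passphraseless ssh to localhost is configured, which the scripts use even on a single node). A guarded sketch:

```shell
# Falls back to a message where the helper scripts are not present.
# /opt/hadoop is an assumed fallback path if $YARN_HOME is unset.
yh="${YARN_HOME:-/opt/hadoop}"
if [ -x "$yh/sbin/start-dfs.sh" ]; then
  "$yh/sbin/start-dfs.sh"    # starts NameNode, SecondaryNameNode, DataNode
  "$yh/sbin/start-yarn.sh"   # starts ResourceManager, NodeManager
else
  echo "sbin helper scripts not found; start each daemon manually as above"
fi
```

The matching stop-dfs.sh and stop-yarn.sh scripts shut the daemons down again.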
If all went well, typing ‘jps’ at the command prompt will show you all services running:
$ jps
95579 ResourceManager
94607 NameNode
6815 Jps
94801 DataNode
95723 NodeManager
94950 SecondaryNameNode
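A quick way to confirm the cluster actually schedules work is to run one of the bundled example jobs. The jar path and glob below are assumptions — the exact file name varies with the build, so check share/hadoop/mapreduce in your tarball. The snippet is guarded so it is a no-op where the jar is absent:

```shell
# /opt/hadoop is an assumed fallback path if $YARN_HOME is unset.
yh="${YARN_HOME:-/opt/hadoop}"
examples=$(echo "$yh"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar)
if [ -f "$examples" ]; then
  # Estimate pi with 2 map tasks and 5 samples each -- a small end-to-end job.
  "$yh/bin/hadoop" jar "$examples" pi 2 5
else
  echo "examples jar not found under $yh; skipping smoke test"
fi
```

If the job completes and prints an estimate of pi, HDFS and YARN are both working end to end.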