Installing CDH 5 Hadoop YARN on a Single Linux Node in Pseudo-distributed Mode



For development purposes, Apache Hadoop and CDH 5 components can be deployed on a single Linux node in pseudo-distributed mode. In pseudo-distributed mode, Hadoop processing is distributed over all of the cores/processors on a single machine. Hadoop writes all files to the Hadoop Distributed File System (HDFS), and all services and daemons communicate over local TCP sockets for inter-process communication.

Prerequisites

  • Supported Operating Systems: Red Hat Enterprise Linux, CentOS, Ubuntu, Debian, SLES, and Oracle Linux (64-bit only)
  • Supported JDK Versions: JDK 1.7.0_25 or later
  • Supported Internet Protocol: CDH requires IPv4. IPv6 is not supported.
  • SSH configuration: passwordless SSH to localhost should be configured, since the start/stop scripts use SSH to launch the daemons (see the sketch below this list)
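
A minimal sketch of setting up passwordless SSH to localhost, assuming OpenSSH is already installed and sshd is running:

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa         # key pair with an empty passphrase
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys  # authorize the key for localhost logins
$ chmod 600 ~/.ssh/authorized_keys
$ ssh localhost                                    # should log in without a password prompt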

STEP-1

Download the CDH tarball from the Cloudera archive
$ wget http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.3.0-cdh5.0.1.tar.gz
$ cd /opt
$ tar -xzf ~/hadoop-2.3.0-cdh5.0.1.tar.gz
$ cd hadoop-2.3.0-cdh5.0.1
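
If /opt is not writable by your user, the extraction may need root; one possible approach (an assumption, not part of the original steps) is to extract with sudo and then take ownership:

$ sudo tar -xzf ~/hadoop-2.3.0-cdh5.0.1.tar.gz -C /opt
$ sudo chown -R $USER /opt/hadoop-2.3.0-cdh5.0.1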

Edit config files

  • core-site.xml
  • hdfs-site.xml
  • mapred-site.xml
  • yarn-site.xml
  • hadoop-env.sh
  • yarn-env.sh
  • mapred-env.sh

    OR

$ git clone https://github.com/mehikmat/hadoop-install.git
$ cp -R hadoop-install/etc/hadoop/* $HADOOP_HOME/etc/hadoop/   # HADOOP_HOME is set in STEP-2; here it is /opt/hadoop-2.3.0-cdh5.0.1
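
A rough sketch of the pseudo-distributed essentials those files need (standard Hadoop 2.x property names with common single-node values; the repository's configs may differ), assuming you are still inside the extracted hadoop-2.3.0-cdh5.0.1 directory:

$ cat > etc/hadoop/core-site.xml <<'EOF'
<configuration>
  <!-- single-node HDFS endpoint -->
  <property><name>fs.defaultFS</name><value>hdfs://localhost:8020</value></property>
</configuration>
EOF
$ cat > etc/hadoop/hdfs-site.xml <<'EOF'
<configuration>
  <!-- only one DataNode, so keep a single replica -->
  <property><name>dfs.replication</name><value>1</value></property>
</configuration>
EOF
$ cat > etc/hadoop/mapred-site.xml <<'EOF'
<configuration>
  <!-- run MapReduce jobs on YARN -->
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
</configuration>
EOF
$ cat > etc/hadoop/yarn-site.xml <<'EOF'
<configuration>
  <!-- auxiliary shuffle service required by MapReduce on YARN -->
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
</configuration>
EOF

The *-env.sh files mainly need JAVA_HOME pointed at your JDK.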

STEP-2

Create directories and users, and set JAVA_HOME for all users

$ git clone https://github.com/mehikmat/hadoop-install.git
$ cd hadoop-install/users-and-dirs

Set JAVA_HOME for all users
$ ./java.sh

# OPTIONAL: create users (only if you want to use separate users per daemon)
$ ./users.sh  # no need to create multiple users on a single node

Create the required directories
$ ./dirs.sh   # edit the HDFS_USER, YARN_USER, and MAPRED_USER variables in this file to point to the same user (see the sketch below)
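
For a single-node install, the three variables can simply point at one account; a sketch of the edit (the variable names come from the comment above, the user name "hadoop" is only an example):

# in hadoop-install/users-and-dirs/dirs.sh
HDFS_USER=hadoop     # same account runs the HDFS daemons
YARN_USER=hadoop     # ... and the YARN daemons
MAPRED_USER=hadoop   # ... and the MapReduce JobHistoryServer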

Edit ~/.bashrc [optionally also the ~/.bashrc files of the hdfs and yarn users]

export HADOOP_HOME=/opt/hadoop-2.3.0-cdh5.0.1
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
Refresh the bash profile
$ bash
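
Optionally (a convenience, not part of the original setup), also add the Hadoop bin and sbin directories to PATH in ~/.bashrc so the hadoop/hdfs commands used in the next steps resolve from any directory:

export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

$ hadoop version   # should report 2.3.0-cdh5.0.1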

STEP-3

Create HDFS dirs

Note: HDFS must already be formatted and running before these directories can be created, so run the commands below after completing STEP-4 and STEP-5. The bracketed alternatives are for the case where you did not create separate users.

Create the history directory and set its permissions and owner
$ sudo -u hdfs hdfs dfs -mkdir -p /user/log/history        OR [hdfs dfs -mkdir -p /user/log/history]
$ sudo -u hdfs hdfs dfs -chmod -R 1777 /user/log/history   OR [hdfs dfs -chmod -R 1777 /user/log/history]
$ sudo -u hdfs hdfs dfs -chown mapred:hadoop /user/log/history  OR [hdfs dfs -chown mapred:hadoop /user/log/history]

Create the /tmp directory and set its permissions
$ sudo -u hdfs hadoop fs -mkdir /tmp           OR [hadoop fs -mkdir /tmp]
$ sudo -u hdfs hadoop fs -chmod -R 1777 /tmp   OR [hadoop fs -chmod -R 1777 /tmp]
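
To verify the layout, owners, and permissions:

$ sudo -u hdfs hadoop fs -ls -R /   OR [hadoop fs -ls -R /]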

STEP-4

Format HDFS

If you have created a separate user for each daemon

$ sudo -u hdfs bin/hdfs namenode -format

else

$ bin/hdfs namenode -format

STEP-5

Start HDFS and YARN services

If you have created a separate user for each daemon

$ sudo -u hdfs sbin/start-dfs.sh
$ sudo -u yarn sbin/start-yarn.sh

else

$ sbin/start-dfs.sh
$ sbin/start-yarn.sh
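
The jps listing in STEP-6 also shows a JobHistoryServer and a WebAppProxyServer; these are not started by start-dfs.sh/start-yarn.sh. A sketch of starting them with the stock sbin scripts (use sudo -u mapred / sudo -u yarn if you created separate users; the standalone proxy only starts if yarn.web-proxy.address is configured, otherwise it runs inside the ResourceManager):

$ sbin/mr-jobhistory-daemon.sh start historyserver
$ sbin/yarn-daemon.sh start proxyserver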

Utilities:

  • $HADOOP_HOME/bin/hadoop : for basic Hadoop operations
  • $HADOOP_HOME/bin/yarn : for YARN-related operations
  • $HADOOP_HOME/bin/mapred : for MapReduce-related operations
  • $HADOOP_HOME/bin/hdfs : for HDFS-related operations

Daemon Utilities:

  • $HADOOP_HOME/sbin/start-yarn.sh / stop-yarn.sh
  • $HADOOP_HOME/sbin/start-dfs.sh / stop-dfs.sh
  • $HADOOP_HOME/sbin/start-all.sh / stop-all.sh
  • $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver

STEP-6

Check installation using jps

$ jps
20803 NameNode
22056 JobHistoryServer
22124 WebAppProxyServer
7926 Main
21817 NodeManager
21560 ResourceManager
8018 RemoteMavenServer
21373 SecondaryNameNode
21049 DataNode
25651 ElasticSearch
28730 Jps

Unrelated JVM processes (such as Main, RemoteMavenServer, and ElasticSearch above) may also show up in the jps output. If the Hadoop services are not up, check the logs in the $HADOOP_HOME/logs directory to identify the issue.

Web interfaces
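
With the default Hadoop 2.x ports (assuming they were not overridden in the config files), the main web UIs should be reachable at:

  • NameNode: http://localhost:50070
  • ResourceManager: http://localhost:8088
  • JobHistoryServer: http://localhost:19888
  • NodeManager: http://localhost:8042
  • DataNode: http://localhost:50075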

Which daemons run on which node (in pseudo-distributed mode, both roles run on the same machine):

Master Node:
 - NameNode
 - ResourceManager
 - JobHistoryServer

Slave Node:
 - NodeManager
 - DataNode
 - WebAppProxyServer
