Hello World from OSS Silicon Valley
HowToUse/Hadoop/3.0
_ Prerequisite
- CentOS installation (You can refer to HowToUse/CentOS/6.5)
- Java installation (You can refer to HowToUse/Java/1.8)
_ Install&Setup
- Step.1
- Create user account for Hadoop.
$ sudo useradd hadoop
$ sudo passwd hadoop
- Step.2
- Download the release tarball from here and unarchive it.
$ wget http://ftp.meisei-u.ac.jp/mirror/apache/dist/hadoop/common/hadoop-3.0.0-alpha1/hadoop-3.0.0-alpha1.tar.gz
$ tar xzvf hadoop-3.0.0-alpha1.tar.gz
$ sudo mv hadoop-3.0.0-alpha1 /usr/local
$ sudo chown -R hadoop:hadoop /usr/local/hadoop-3.0.0-alpha1
- Step.3
- Set up environment variables in ~/.bashrc.
$ vi ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-1.8.0
export HADOOP_INSTALL=/usr/local/hadoop-3.0.0-alpha1
export PATH=$HADOOP_INSTALL/bin:$JAVA_HOME/bin:$PATH
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
_ HowToUse
_ Run Standalone mode
- Step.1
- Prepare input files.
$ mkdir input
$ vi input/file01
Hello world!
- Step.2
- Execute a Hadoop job using the sample jar file.
$ hadoop jar /usr/local/hadoop-3.0.0-alpha1/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha1.jar grep input output 'Hello+'
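The grep example above counts how many times each string matching the regular expression occurs in the input files. As a plain-Java sketch of that counting (not the MapReduce implementation itself), assuming file01 contains the "Hello world!" line from Step 1:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GrepSketch {
    // Counts occurrences of each string matched by the regex,
    // which is what the examples-jar grep job computes over its input.
    static Map<String, Integer> grep(String[] lines, String regex) {
        Map<String, Integer> counts = new HashMap<>();
        Pattern p = Pattern.compile(regex);
        for (String line : lines) {
            Matcher m = p.matcher(line);
            while (m.find())
                counts.merge(m.group(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // input/file01 from Step 1 contains "Hello world!"
        System.out.println(grep(new String[]{ "Hello world!" }, "Hello+"));
        // → {Hello=1}
    }
}
```

The real job writes each count and its matched string to the output directory, so for this input you would find a single "1 Hello" entry there.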
_ Run Pseudo-distributed mode
- Step.1
- Edit conf files.
$ vi $HADOOP_INSTALL/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
$ vi $HADOOP_INSTALL/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
You can refer to a sample from here
- Step.2
- Set up an SSH key and check that you can access localhost without a password.
$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys
$ ssh localhost
- Step.3
- Format the file system for HDFS.
$ hdfs namenode -format
Then you can find the newly created files below.
$ ls /tmp/hadoop-hadoop/dfs/name/
- Step.4
- Start NameNode daemon and DataNode daemon.
$ $HADOOP_INSTALL/sbin/start-dfs.sh
- Step.5
- Access the web interface at http://localhost:9870/
_ Create Sample Code
- Step.1
- Create Hadoop code.
$ vi WordCount.java
You can see the sample code from here
- Step.2
- Compile the sample code and create a jar file.
$ hadoop com.sun.tools.javac.Main WordCount.java
$ jar cf wc.jar WordCount*.class
- Step.3
- Prepare input files for WordCount.
$ mkdir input
$ vi input/file01
$ vi input/file02
You can see the sample input files from here
- Step.4
- Execute the Hadoop job for WordCount.
$ hadoop jar wc.jar WordCount input output
Then you will see the output in the output directory as below.
[hadoop@localhost work]$ ls output/
_SUCCESS  part-r-00000
$ vi output/part-r-00000
Bye 1
Goodbye 1
Hadoop 2
Hello 2
World 2
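The counts above are what WordCount's map (emit a (word, 1) pair per token) and reduce (sum the pairs per word) phases compute. As a plain-Java sketch of that logic (not the MapReduce code itself), assuming the two input files contain the Apache tutorial's sample lines "Hello World Bye World" and "Hello Hadoop Goodbye Hadoop" (an assumption, but one consistent with the counts shown):

```java
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {
    // Simulates map (emit (word, 1) per token) and reduce (sum per word).
    static Map<String, Integer> count(String[] files) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted, like part-r-00000
        for (String file : files)
            for (String word : file.split("\\s+"))
                counts.merge(word, 1, Integer::sum);
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical inputs (assumption): the Apache tutorial's sample lines.
        String[] files = { "Hello World Bye World", "Hello Hadoop Goodbye Hadoop" };
        count(files).forEach((w, c) -> System.out.println(w + " " + c));
        // → Bye 1 / Goodbye 1 / Hadoop 2 / Hello 2 / World 2
    }
}
```

The TreeMap reproduces the sorted order of part-r-00000; in the real job that ordering comes from the shuffle/sort between the map and reduce phases.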
_ Author
S.Yatsuzuka
Last-modified: 2016-11-06 (Sun) 07:51:25 (3029d)