Hello World from OSS Silicon Valley
HowToUse/Hadoop/3.0
_ Prerequisite
- CentOS installation (You can refer to HowToUse/CentOS/6.5)
- Java installation (You can refer to HowToUse/Java/1.8)
_ Install&Setup
- Step.1
- Create user account for Hadoop.
$ sudo useradd hadoop
$ sudo passwd hadoop
- Step.2
- Download the release tarball from here and unarchive it.
$ wget http://ftp.meisei-u.ac.jp/mirror/apache/dist/hadoop/common/hadoop-3.0.0-alpha1/hadoop-3.0.0-alpha1.tar.gz
$ tar xzvf hadoop-3.0.0-alpha1.tar.gz
$ sudo mv hadoop-3.0.0-alpha1 /usr/local
$ sudo chown -R hadoop:hadoop /usr/local/hadoop-3.0.0-alpha1
- Step.3
- Set up environment variables in ~/.bashrc.
$ vi ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-1.8.0
export HADOOP_INSTALL=/usr/local/hadoop-3.0.0-alpha1
export PATH=$HADOOP_INSTALL/bin:$JAVA_HOME/bin:$PATH
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
_ HowToUse
_ Run Standalone mode
- Step.1
- Prepare input files.
$ mkdir input
$ vi input/file01
Hello world!
- Step.2
- Execute a Hadoop job using the sample jar file.
$ hadoop jar /usr/local/hadoop-3.0.0-alpha1/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha1.jar grep input output 'Hello+'
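The grep example above counts how many times each string matching the regular expression occurs in the input files. As a plain-Java sketch of that counting (not the MapReduce implementation itself), assuming file01 contains the "Hello world!" line from Step 1:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GrepSketch {
    // Counts occurrences of each string matched by the regex,
    // which is what the examples-jar grep job computes over its input.
    static Map<String, Integer> grep(String[] lines, String regex) {
        Map<String, Integer> counts = new HashMap<>();
        Pattern p = Pattern.compile(regex);
        for (String line : lines) {
            Matcher m = p.matcher(line);
            while (m.find())
                counts.merge(m.group(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // input/file01 from Step 1 contains "Hello world!"
        System.out.println(grep(new String[]{ "Hello world!" }, "Hello+"));
        // → {Hello=1}
    }
}
```

The real job writes each count and its matched string to the output directory, so for this input you would find a single "1 Hello" entry there.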
_ Run Pseudo-distributed mode
- Step.1
- Edit conf files.
$ vi $HADOOP_INSTALL/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
$ vi $HADOOP_INSTALL/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
You can refer to a sample from here
- Step.2
- Set up an SSH key and check that you can access localhost without a password.
$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys
$ ssh localhost
- Step.3
- Format the file system for HDFS.
$ hdfs namenode -format
Then you can find the newly created files below.
$ ls /tmp/hadoop-hadoop/dfs/name/
- Step.4
- Start NameNode daemon and DataNode daemon.
$ $HADOOP_INSTALL/sbin/start-dfs.sh
- Step.5
- Access the web interface at http://localhost:9870/
_ Create Sample Code
- Step.1
- Create Hadoop code.
$ vi WordCount.java
You can see the sample code from here
- Step.2
- Compile the sample code and create a jar file.
$ hadoop com.sun.tools.javac.Main WordCount.java
$ jar cf wc.jar WordCount*.class
- Step.3
- Prepare input files for WordCount.
$ mkdir input
$ vi input/file01
$ vi input/file02
You can see the sample input files from here
- Step.4
- Execute the Hadoop job for WordCount.
$ hadoop jar wc.jar WordCount input output
Then you will see the output in the output directory as below.
[hadoop@localhost work]$ ls output/
_SUCCESS  part-r-00000
$ vi output/part-r-00000
Bye 1
Goodbye 1
Hadoop 2
Hello 2
World 2
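The counts above are what WordCount's map (emit a (word, 1) pair per token) and reduce (sum the pairs per word) phases compute. As a plain-Java sketch of that logic (not the MapReduce code itself), assuming the two input files contain the Apache tutorial's sample lines "Hello World Bye World" and "Hello Hadoop Goodbye Hadoop" (an assumption, but one consistent with the counts shown):

```java
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {
    // Simulates map (emit (word, 1) per token) and reduce (sum per word).
    static Map<String, Integer> count(String[] files) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted, like part-r-00000
        for (String file : files)
            for (String word : file.split("\\s+"))
                counts.merge(word, 1, Integer::sum);
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical inputs (assumption): the Apache tutorial's sample lines.
        String[] files = { "Hello World Bye World", "Hello Hadoop Goodbye Hadoop" };
        count(files).forEach((w, c) -> System.out.println(w + " " + c));
        // → Bye 1 / Goodbye 1 / Hadoop 2 / Hello 2 / World 2
    }
}
```

The TreeMap reproduces the sorted order of part-r-00000; in the real job that ordering comes from the shuffle/sort between the map and reduce phases.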
_ Author
S.Yatsuzuka
Last-modified: 2016-11-06 (Sun) 07:51:25 (3029d)