Saturday, 15 November 2014

Running the Hadoop MapReduce Word Count Example Program

Preparation

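Before running a MapReduce program, install Hadoop and start its services, then prepare the data you want to run the word count on. You can create your own files or find some online; here I provide download links for three documents. Please download the Plain Text UTF-8 version of each.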

Hadoop 2.2.0 Environment
The Outline of Science, Vol. 1 (of 4) by J. Arthur Thomson
The Notebooks of Leonardo Da Vinci
Ulysses by James Joyce

Step by Step

First, put the three documents you just downloaded into a directory named input_txt.


$mkdir input_txt

$mv pg20417.txt pg4300.txt pg5000.txt input_txt
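
Before uploading, it can be worth confirming that all three files actually ended up in the directory and that none of them is empty. The following is only a minimal sketch; `check_inputs` is a hypothetical helper name of my own and is not part of Hadoop.

```shell
#!/bin/sh
# check_inputs DIR FILE...
# Hypothetical helper (not a Hadoop command).
# Succeeds only if every named file exists in DIR and is non-empty.
check_inputs() {
    dir=$1
    shift
    for f in "$@"; do
        # -s is true when the file exists and has a size greater than zero
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            return 1
        fi
    done
    echo "all $# input files present in $dir"
}

# Example call for the files used in this post:
#   check_inputs input_txt pg20417.txt pg4300.txt pg5000.txt
```

If a download was interrupted or a file name was mistyped, the helper reports the first offending path on stderr and returns a non-zero status.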

Next, copy the whole input_txt directory into the Hadoop Distributed File System (HDFS).


$hadoop fs -copyFromLocal input_txt /user/hduser/input_txt

$hadoop fs -ls input_txt

-rw-r--r--   1 hduser supergroup     710771 2014-10-03 16:13 input_txt/pg20417.txt

-rw-r--r--   1 hduser supergroup    1573150 2014-10-03 16:13 input_txt/pg4300.txt

-rw-r--r--   1 hduser supergroup    1423803 2014-10-03 16:13 input_txt/pg5000.txt
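
One way to check that the upload is complete is to compare the sizes in the listing with the sizes of the local files. The sketch below assumes the listing has the column layout shown above (size in the fifth column, path in the eighth); `listing_sizes` and `local_sizes` are hypothetical helper names, not Hadoop commands.

```shell
#!/bin/sh
# listing_sizes: hypothetical helper.
# Reads `hadoop fs -ls` style lines on stdin and prints "<file name> <size>"
# for every regular file (lines whose permission string starts with "-").
listing_sizes() {
    awk '/^-/ { n = split($8, parts, "/"); print parts[n], $5 }'
}

# local_sizes DIR: hypothetical helper.
# Prints "<file name> <size in bytes>" for every regular file in DIR.
local_sizes() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        # $(( ... )) strips any padding that wc -c may add
        echo "$(basename "$f") $(($(wc -c < "$f")))"
    done
}

# Possible usage on a machine with Hadoop installed:
#   hadoop fs -ls input_txt | listing_sizes | sort > hdfs.sizes
#   local_sizes input_txt | sort > local.sizes
#   diff hdfs.sizes local.sizes && echo "sizes match"
```

Matching sizes do not prove the contents are identical, but they catch truncated or missing uploads cheaply.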

The last step is to run Word Count.


$hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /user/hduser/input_txt /user/hduser/output_txt

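You do not need to create output_txt beforehand; Hadoop creates it itself once the MapReduce job completes (in fact, the job fails if the output directory already exists). You can check the output in the Hadoop web UI.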
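
To sanity-check the job's result on a small sample, the same count can be approximated locally with standard Unix tools. This is a sketch and not the Hadoop implementation: it assumes the example program splits words on whitespace only and writes one "word, tab, count" line per word sorted by byte order. `local_wordcount` is a hypothetical helper name.

```shell
#!/bin/sh
# local_wordcount FILE...
# Hypothetical helper for cross-checking small inputs without Hadoop.
# Splits on whitespace, counts each token, and prints "word<TAB>count"
# sorted by word in byte order.
local_wordcount() (
    # Run in a subshell with the C locale so sort and uniq compare raw bytes
    LC_ALL=C
    export LC_ALL
    cat "$@" |
        tr -s '[:space:]' '\n' |   # one token per line
        grep -v '^$' |             # drop a possible leading empty line
        sort |
        uniq -c |                  # "  <count> <word>"
        awk '{ print $2 "\t" $1 }' # reorder to "<word>\t<count>"
)

# Example call for one of the files used in this post:
#   local_wordcount input_txt/pg20417.txt | head
```

Because tokens are split on whitespace only, punctuation stays attached to words, so "science" and "science," are counted separately. On a machine with Hadoop, the job's own output can usually be read with something like `hadoop fs -cat /user/hduser/output_txt/part-r-00000`, where the part file name is the conventional reducer output name and may differ in your setup.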
