Hadoop Word Count on FutureSystems

The Word Count program is a simple example of running a MapReduce job on the Hadoop framework. This video lesson explains the basics of the MapReduce functions with a Word Count example and describes the Hadoop Distributed File System (HDFS). It also contains a tutorial for running the Word Count program on OpenStack India on FutureSystems. For more information, please check the documentation: cloudmesh.github.io/introduction_to_cloud_computing/class/lesson/cluster/wordcount.html

Tags: hadoop, wordcount, word count, futuresystems, openstack, mapreduce


Created 3 years ago

Duration 0:32:10
Slide Content
  1. Word Count using Hadoop

    Slide 1 - Word Count using Hadoop

    • A simple program which counts how often each word occurs.
    • Mapper() splits files into lines and words
    • Reducer() merges key=value pairs
    • Uses the Hadoop Distributed File System (HDFS)
    • Contents of the input files:
    • welcome.txt: Welcome Hadoop Enjoy Hadoop
    • welcome2.txt: Mapper and Reducer will be used Enjoy MapReduce
    • Mapper #1 output: Welcome 1, Hadoop 1, Enjoy 1, Hadoop 1
    • Mapper #2 output: Mapper 1, and 1, Reducer 1, … (skip) …, Enjoy 1
    • Reducer output: Hadoop 2, Enjoy 2, Welcome 1, … (skip) …
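The data flow on this slide can be sketched in plain Java without any Hadoop classes: each input line stands in for one mapper's chunk, every token is emitted as a (word, 1) pair, and the counts are summed per key just as the reducer does. The class and method names below are hypothetical, chosen only for illustration.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Minimal sketch of the word-count data flow (map: emit (word, 1);
// reduce: sum per key). No Hadoop classes are involved.
public class WordCountSketch {
    public static Map<String, Integer> countWords(String... lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {                        // one "chunk" per line
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {                  // map: emit (word, 1)
                counts.merge(itr.nextToken(), 1, Integer::sum); // reduce: sum values
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Contents of the two input files from the slide.
        System.out.println(countWords(
            "Welcome Hadoop Enjoy Hadoop",                        // welcome.txt
            "Mapper and Reducer will be used Enjoy MapReduce"));  // welcome2.txt
    }
}
```

Running this yields the same per-word totals the slide traces by hand, e.g. Hadoop appearing twice and Enjoy appearing once in each file.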
  2. WordCount.java

    Slide 2 - WordCount.java

    • map() function runs once per chunk (file)
    • Input chunk (welcome.txt): Welcome Hadoop Enjoy Hadoop

      public void map(Object key, Text value, Context context
                      ) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
          word.set(itr.nextToken());
          context.write(word, one);
        }
      }

    • Output: Welcome 1, Hadoop 1, Enjoy 1, Hadoop 1
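The map() step can be imitated outside Hadoop: tokenize one chunk with StringTokenizer and record one "(word, 1)" pair per token, in the order context.write(word, one) would emit them. The class below is a hypothetical stand-in, not Hadoop's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Sketch of the map phase alone: emit (word, 1) pairs in token order.
public class MapPhaseSketch {
    public static List<String> map(String chunk) {
        List<String> emitted = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(chunk);
        while (itr.hasMoreTokens()) {
            emitted.add(itr.nextToken() + " 1");   // mirrors context.write(word, one)
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(map("Welcome Hadoop Enjoy Hadoop"));
    }
}
```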
  3. WordCount.java

    Slide 3 - WordCount.java

    • combiner() function merges identical keys
    • job.setCombinerClass(IntSumReducer.class);
    • Before: Welcome 1, Hadoop 1, Enjoy 1, Hadoop 1
    • After: Welcome 1, Hadoop 2, Enjoy 1
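What setCombinerClass(IntSumReducer.class) achieves can be sketched as follows: one mapper's local output is pre-summed per key before it travels over the network to the reducers, so the two "Hadoop 1" pairs collapse into "Hadoop 2". The class name is hypothetical; real combiners run inside the Hadoop framework.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a combiner: merge repeated keys in one mapper's output,
// preserving first-seen order (hence LinkedHashMap).
public class CombinerSketch {
    public static Map<String, Integer> combine(String... pairs) {
        Map<String, Integer> merged = new LinkedHashMap<>();
        for (String pair : pairs) {
            String[] kv = pair.split(" ");   // "Hadoop 1" -> ["Hadoop", "1"]
            merged.merge(kv[0], Integer.parseInt(kv[1]), Integer::sum);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Mapper output from the slide: Hadoop 1 + Hadoop 1 becomes Hadoop 2.
        System.out.println(combine("Welcome 1", "Hadoop 1", "Enjoy 1", "Hadoop 1"));
    }
}
```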
  4. WordCount.java

    Slide 4 - WordCount.java

    • reduce() function sums the values for each key

      public void reduce(Text key, Iterable<IntWritable> values,
                         Context context
                         ) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
          sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
      }

    • Input from Mapper #1 (after combiner): Welcome 1, Hadoop 2, Enjoy 1
    • Input from Mapper #2: Mapper 1, and 1, Reducer 1, will 1, be 1, used 1, Enjoy 1, MapReduce 1
    • Final output: Mapper 1, and 1, Reducer 1, will 1, be 1, used 1, Enjoy 2, MapReduce 1, Welcome 1, Hadoop 2
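The reduce() loop above can be mirrored in plain Java: for one key, iterate over all of its grouped values and accumulate the sum, which is how "Enjoy 1" from each mapper becomes "Enjoy 2". This is a hypothetical stand-in using plain Integers instead of IntWritable.

```java
import java.util.List;

// Sketch of the reduce step: sum all values grouped under one key,
// mirroring the for-loop on the slide.
public class ReduceSketch {
    public static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int val : values) {   // stands in for: sum += val.get();
            sum += val;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Key "Enjoy" receives one 1 from each mapper.
        System.out.println(reduce(List.of(1, 1)));
    }
}
```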
  5. Hadoop Distributed File System (HDFS)

    Slide 5 - Hadoop Distributed File System (HDFS)

    • Data replication for: fault tolerance, data protection, compute performance
    • Diagram: an input file is split into chunks (Chunk 1, Chunk 2, Chunk 3); the name node tracks the chunks, and each chunk is replicated across the data nodes (Data node 1, Data node 2, … Data node N).
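The replication idea in the diagram can be sketched as a placement function: every chunk is assigned to several distinct data nodes, so losing one node loses no data. Real HDFS placement is rack-aware and more involved; the round-robin rule below is a hypothetical simplification for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of HDFS-style replica placement: each chunk is
// copied to `replication` distinct data nodes (round-robin, not rack-aware).
public class ReplicationSketch {
    public static List<List<Integer>> place(int chunks, int nodes, int replication) {
        List<List<Integer>> placement = new ArrayList<>();
        for (int c = 0; c < chunks; c++) {
            List<Integer> replicas = new ArrayList<>();
            for (int r = 0; r < replication; r++) {
                replicas.add((c + r) % nodes);   // spread replicas over distinct nodes
            }
            placement.add(replicas);
        }
        return placement;
    }

    public static void main(String[] args) {
        // 3 chunks across 3 data nodes with 2 replicas each, as in the diagram.
        System.out.println(place(3, 3, 2));
    }
}
```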
  6. Word Count Example

    Slide 6 - Word Count Example