Hadoop

Apache Hadoop is an open-source framework for distributed storage and processing of large data sets across clusters of machines. Storage is handled by HDFS, and processing follows the MapReduce programming model.

Key Components

  • HDFS (Hadoop Distributed File System)
  • YARN (Resource Management)
  • MapReduce

Best Practices

  • Cluster Planning
  • Data Organization
  • Job Configuration

Example

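    // MapReduce Word Count Example (mapper only)
    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class WordCount {
        // Input: (byte offset, line of text). Output: (word, 1) for each token.
        public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            @Override
            public void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String line = value.toString();
                StringTokenizer tokenizer = new StringTokenizer(line);
                while (tokenizer.hasMoreTokens()) {
                    word.set(tokenizer.nextToken());
                    context.write(word, one);
                }
            }
        }
    }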
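The mapper above only emits (word, 1) pairs; the counting happens after the framework groups those pairs by key and hands each group to a reducer. The following is a minimal sketch of that data flow in plain Java, with no Hadoop classes involved. The class name WordCountSimulation and its methods map, shuffle, reduce, and wordCount are hypothetical names chosen for illustration, and the in-memory grouping only approximates what a real cluster does across machines.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Illustration only: simulates map -> shuffle -> reduce in memory, without Hadoop.
class WordCountSimulation {

    // Map phase: emit a (word, 1) pair for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key, with keys kept in sorted order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for a single key.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Runs the three phases in sequence over a list of input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(emitted).entrySet()) {
            totals.put(group.getKey(), reduce(group.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        // Prints {be=2, not=1, or=1, to=2}
        System.out.println(wordCount(List.of("to be or not", "to be")));
    }
}
```

In a real job the reduce step would be written as a Reducer subclass and the grouping would be performed by the framework between the map and reduce tasks; the sketch keeps all three steps in one process so the flow of keys and values is easy to follow.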

Important Considerations