Sunday, April 19, 2015

Hadoop Hello World


Hello, in this tutorial we are going to create a Hadoop application. For the project, I use the central bank's (TCMB) exchange-rate data, a series that starts in the 1950s. You can get it here.

Requirements

Maven
Eclipse IDE

First, we create a Maven project, so we don't have to deal with dependencies by hand. After that, we edit the project's "pom.xml" file. I use Hadoop 2.6, so I added the libraries below; if you use a different version, change the version number accordingly. Add the following to "pom.xml":

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.6.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-yarn</artifactId>
  <version>2.6.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-shuffle</artifactId>
  <version>2.6.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs</artifactId>
  <version>2.6.0</version>
</dependency>

After that, we can create our classes. The first class is the driver class, which is the entry point of the application: it configures the job and submits it. Then we create the Mapper and Reducer classes.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TCMB {

  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.println("Usage: TCMB <input> <output>");
      System.exit(-1);
    }

    Job job = Job.getInstance();
    job.setJarByClass(TCMB.class);
    job.setJobName("TCMB");

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    
    job.setMapperClass(TCMBMapper.class);
    job.setReducerClass(TCMBReducer.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(DoubleWritable.class);
    
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The second class is the Mapper class. The Mapper parses each input record and emits key/value pairs. Our input file uses fixed-width columns: for example, characters 6 through 9 of each line (zero-indexed) hold the year.

import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TCMBMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {

    String line = value.toString();
    // The year sits in a fixed position on every line.
    String year = line.substring(6, 10);
    // The GBP rate occupies the next fixed-width field.
    double gbpA = Double.parseDouble(line.substring(13, 25));

    context.write(new Text(year), new DoubleWritable(gbpA));
  }
}
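Since the parsing relies on fixed column positions, it helps to check the substring indices against a sample line before running the job. The sketch below uses a made-up record layout (the real TCMB file format is not shown in this post) just to illustrate how substring(6, 10) and substring(13, 25) slice a line:

```java
public class SubstringDemo {
  public static void main(String[] args) {
    // Hypothetical fixed-width line: columns 6-9 hold the year,
    // columns 13-24 hold the rate (zero-indexed, end index exclusive).
    String line = "ABCDEF1995XYZ000045.12345";

    String year = line.substring(6, 10);                      // "1995"
    double rate = Double.parseDouble(line.substring(13, 25)); // 45.12345

    System.out.println(year + " " + rate);
  }
}
```

If a line is shorter than expected, substring throws StringIndexOutOfBoundsException, so malformed lines in the input will fail the map task.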

Our last class is the Reducer class. The Reducer aggregates the values emitted for each key; here, we find the maximum rate for each year.

import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class TCMBReducer
    extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {

  @Override
  public void reduce(Text key, Iterable<DoubleWritable> values,
      Context context)
      throws IOException, InterruptedException {

    // Double.MIN_VALUE is the smallest *positive* double, so start
    // from negative infinity to find the maximum correctly.
    double maxValue = Double.NEGATIVE_INFINITY;
    for (DoubleWritable value : values) {
      maxValue = Math.max(maxValue, value.get());
    }
    context.write(key, new DoubleWritable(maxValue));
  }
}
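The reducer's core is a plain fold over the values for one key, so the same logic can be exercised outside Hadoop. This is a minimal sketch (not part of the original post) showing why negative infinity is a safe starting point for a maximum:

```java
import java.util.Arrays;
import java.util.List;

public class MaxDemo {
  // Mirrors the reducer loop: fold one year's rates into their maximum.
  static double max(List<Double> values) {
    double maxValue = Double.NEGATIVE_INFINITY; // safe even if every rate were negative
    for (double v : values) {
      maxValue = Math.max(maxValue, v);
    }
    return maxValue;
  }

  public static void main(String[] args) {
    System.out.println(max(Arrays.asList(4.2, 4.8, 4.5))); // prints 4.8
  }
}
```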

To run this application from Eclipse, we need to edit the Run Configuration and add the input and output paths as program arguments. After that, you can run the application.
If you have questions, don't hesitate to ask.
