Hello, in this tutorial we are going to create a Hadoop application. For the project, I use the central bank's exchange rate data, which goes back to the 1950s. You can get it here.
Requirements
Maven
Eclipse IDE
First, we create a Maven project; using Maven means we don't have to manage the dependencies by hand. After that, we edit the project's "pom.xml" file. I use Hadoop 2.6, so I added the libraries for that release; if you use a different version, change the version numbers accordingly. Add the code below to "pom.xml":
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-yarn</artifactId>
    <version>2.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-shuffle</artifactId>
    <version>2.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.6.0</version>
</dependency>
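One caveat worth mentioning (not part of the original list): the MapReduce API classes we use below (Job, Mapper, Reducer) live in the hadoop-mapreduce-client-core artifact. They are normally pulled in transitively through hadoop-mapreduce-client-shuffle, but if they do not resolve in your setup, you can declare the artifact explicitly:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.6.0</version>
</dependency>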
After that, we can create our classes. The first class is the driver class, which is the main class of our application: it configures the job and submits it. Then we create the Mapper and Reducer classes.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TCMB {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: TCMB <input> <output>");
            System.exit(-1);
        }

        // Configure the job and locate the jar via the driver class.
        Job job = Job.getInstance();
        job.setJarByClass(TCMB.class);
        job.setJobName("TCMB");

        // Input and output paths come from the command-line arguments.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Wire up the mapper and reducer and declare the output types.
        job.setMapperClass(TCMBMapper.class);
        job.setReducerClass(TCMBReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);

        // Submit the job and wait for it to finish.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
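A note on the type declarations: setOutputKeyClass and setOutputValueClass describe the reducer's output and, unless overridden with setMapOutputKeyClass/setMapOutputValueClass, the mapper's output as well. Since both our mapper and reducer emit Text keys and DoubleWritable values, this single pair of calls is enough.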
The second class is the Mapper class. The mapper defines how each input record is turned into a key-value pair. In our data file the fields sit at fixed offsets in each line: for example, substring(6, 10) extracts the year.
import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TCMBMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();

        // The year sits at a fixed offset in every line.
        String year = line.substring(6, 10);

        // The GBP buying rate follows at another fixed offset.
        double gbpA = Double.parseDouble(line.substring(13, 25));

        // Emit (year, rate); the reducer will pick the maximum per year.
        context.write(new Text(year), new DoubleWritable(gbpA));
    }
}
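To make the fixed-offset parsing concrete, here is a minimal standalone sketch. The record layout below is hypothetical (the post does not show a raw line of the data file); it is laid out only so that the year lands at indices 6-9 and the rate falls inside indices 13-24:

public class OffsetParseDemo {
    public static void main(String[] args) {
        // Hypothetical record: year at indices 6-9, rate within indices 13-24.
        String line = "Date: 1950-01  0.0078400000";
        String year = line.substring(6, 10);                             // "1950"
        // trim() in case the fixed-width slice includes padding spaces.
        double rate = Double.parseDouble(line.substring(13, 25).trim()); // 0.00784
        System.out.println(year + " -> " + rate);                        // 1950 -> 0.00784
    }
}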
Our last class is the Reducer class. The reducer makes the final decision for each key: it scans all the values collected for a year and keeps the maximum exchange rate.
import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class TCMBReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {

    @Override
    public void reduce(Text key, Iterable<DoubleWritable> values, Context context)
            throws IOException, InterruptedException {
        // Start below any possible rate; NEGATIVE_INFINITY is safer than
        // Double.MIN_VALUE, which is the smallest *positive* double.
        double maxValue = Double.NEGATIVE_INFINITY;
        for (DoubleWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new DoubleWritable(maxValue));
    }
}
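As an optional tweak (not part of the original tutorial): because taking a maximum is associative and the reducer's input and output types match, the same class can also serve as a combiner, shrinking the data shuffled between the map and reduce phases. A single extra line in the driver enables it:

job.setCombinerClass(TCMBReducer.class);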
To run this application from Eclipse, we need to edit the Run Configuration and pass the input and output paths as program arguments. After that, you can run the application.
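For example, the Program arguments field of the Run Configuration could contain two paths like the following (hypothetical names; use wherever you saved the data and want the results written):

input/exchange-rates.txt output

Keep in mind that Hadoop requires the output directory not to exist before the job runs. Outside Eclipse, the standard way to launch the packaged job is the hadoop jar command, assuming you built the classes into a jar named tcmb.jar:

hadoop jar tcmb.jar TCMB input output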
If you have questions, don't hesitate to ask.