hadoop-mapreduce-dev mailing list archives

From Akira AJISAKA <ajisa...@oss.nttdata.co.jp>
Subject Re: question about reduce method
Date Mon, 17 Feb 2014 18:06:25 GMT
> I know the map method puts these text files into a map, like the following, right?
> <001, 35.99>
> <001, 35.99>
> <002, 12.49>
> <004, 13.42>
> <003, 499.99>
> <001 ,78.95>
> <002, 21.99>
> <002, 93.45>
> <001, 9.99>
> <001, John Allen>
> <002, Abigail Smith>
> <003, April Stevens>
> <004, Nasser Hafez>

Not quite. Each mapper also prepends a tag naming the source file, so the
correct map outputs are:

<001,sales	35.99>
<002,sales	12.49>
<004,sales	13.42>
<003,sales	499.99>
<001,sales	78.95>
<002,sales	21.99>
<002,sales	93.45>
<001,sales	9.99>
<001,accounts	John Allen>
<002,accounts	Abigail Smith>
<003,accounts	April Stevens>
<004,accounts	Nasser Hafez>
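
As a rough illustration of how one input line becomes one of the pairs above, here is a minimal sketch in plain Java with no Hadoop dependency. The class name MapStepSketch and the helper mapRecord are hypothetical and only mirror the two map methods in ReduceJoin.java. It assumes the input fields are tab-separated, as the split("\t") call in the quoted code implies.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class MapStepSketch {
    // Mirrors SalesRecordMapper / AccountRecordMapper for a single line:
    // key = first tab-separated field, value = source tag + tab + second field.
    static Map.Entry<String, String> mapRecord(String tag, String line) {
        String[] parts = line.split("\t");
        return new SimpleEntry<>(parts[0], tag + "\t" + parts[1]);
    }

    public static void main(String[] args) {
        // Sample lines taken from sales.txt and account.txt (assumed tab-separated).
        Map.Entry<String, String> sale =
            mapRecord("sales", "001\t35.99\t2012-03-15");
        Map.Entry<String, String> account =
            mapRecord("accounts", "001\tJohn Allen\tStandard\t2012-03-15");

        System.out.println("<" + sale.getKey() + "," + sale.getValue() + ">");
        System.out.println("<" + account.getKey() + "," + account.getValue() + ">");
    }
}
```

The tag is what later lets the reducer tell a sales amount from an account name, since both arrive under the same key.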

The map outputs are then grouped and sorted by key, and the reduce method
is called once per group (the order of values within a group is not
guaranteed). The inputs to the reduce method are as follows:

<key: 001,
 values: {sales 35.99, sales 78.95, sales 9.99, accounts John Allen}>
<key: 002,
 values: {sales 12.49, sales 21.99, sales 93.45, accounts Abigail Smith}>
<key: 003,
 values: {sales 499.99, accounts April Stevens}>
<key: 004,
 values: {sales 13.42, accounts Nasser Hafez}>
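
To make the remaining steps concrete, here is a minimal sketch in plain Java that simulates the grouping and then applies the same logic as ReduceJoinReducer to each group. It is not how Hadoop implements the shuffle. The class ReduceStepSketch and its methods are hypothetical, a TreeMap stands in for the sort-and-group phase, and Locale.ROOT is my addition to keep the decimal point stable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class ReduceStepSketch {
    // The tagged map outputs listed earlier in this message.
    static final String[][] MAP_OUTPUT = {
        {"001", "sales\t35.99"}, {"002", "sales\t12.49"}, {"004", "sales\t13.42"},
        {"003", "sales\t499.99"}, {"001", "sales\t78.95"}, {"002", "sales\t21.99"},
        {"002", "sales\t93.45"}, {"001", "sales\t9.99"},
        {"001", "accounts\tJohn Allen"}, {"002", "accounts\tAbigail Smith"},
        {"003", "accounts\tApril Stevens"}, {"004", "accounts\tNasser Hafez"},
    };

    // Stand-in for the shuffle: collect values per key, with keys kept sorted.
    static Map<String, List<String>> group(String[][] mapOutput) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] kv : mapOutput) {
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return grouped;
    }

    // Same logic as ReduceJoinReducer.reduce for the values of one key.
    static String reduce(List<String> values) {
        String name = "";
        double total = 0.0;
        int count = 0;
        for (String v : values) {
            String[] parts = v.split("\t");
            if (parts[0].equals("sales")) {
                count++;
                // Parsing as float is what yields 124.929998 instead of 124.93.
                total += Float.parseFloat(parts[1]);
            } else if (parts[0].equals("accounts")) {
                name = parts[1];
            }
        }
        return name + "\t" + String.format(Locale.ROOT, "%d\t%f", count, total);
    }

    // One reduce call per group, in key order.
    static List<String> run() {
        List<String> lines = new ArrayList<>();
        for (List<String> values : group(MAP_OUTPUT).values()) {
            lines.add(reduce(values));
        }
        return lines;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

For key 001 the loop counts three sales values, sums them, and picks up the name from the single accounts value, which gives the line "John Allen 3 124.929998" in your output. The trailing digits come from Float.parseFloat: each amount is rounded to the nearest float before being added to the double total.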

Regards,
Akira

(2014/02/17 1:14), EdwardKing wrote:
> Hello everyone,
>     I am a newbie to Hadoop 2.2.0 and I am puzzled by the reduce method. I have two text
> files, sales.txt and account.txt, as follows:
> sales.txt
> 001 35.99 2012-03-15
> 002 12.49 2004-07-02
> 004 13.42 2005-12-20
> 003 499.99 2010-12-20
> 001 78.95 2012-04-02
> 002 21.99 2006-11-30
> 002 93.45 2008-09-10
> 001 9.99 2012-05-17
> 
> account.txt
> 001 John Allen Standard 2012-03-15
> 002 Abigail Smith Premium 2004-07-13
> 003 April Stevens Standard 2010-12-20
> 004 Nasser Hafez Premium 2001-04-23
> 
> ReduceJoin.java is as follows:
> import java.io.* ;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.mapreduce.Mapper;
> import org.apache.hadoop.mapreduce.Reducer;
> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
> import org.apache.hadoop.mapreduce.lib.input.MultipleInputs ;
> import org.apache.hadoop.mapreduce.lib.input.TextInputFormat ;
> 
> public class ReduceJoin
> {
>      
>      public static class SalesRecordMapper
>      extends Mapper<Object, Text, Text, Text>{
>          
>          public void map(Object key, Text value, Context context
>          ) throws IOException, InterruptedException
>          {
>              String record = value.toString() ;
>              String[] parts = record.split("\t") ;
>              
>              context.write(new Text(parts[0]), new Text("sales\t"+parts[1])) ;
>          }
>      }
>      
>      public static class AccountRecordMapper
>      extends Mapper<Object, Text, Text, Text>{
>          
>          public void map(Object key, Text value, Context context
>          ) throws IOException, InterruptedException
>          {
>              String record = value.toString() ;
>              String[] parts = record.split("\t") ;
>              
>              context.write(new Text(parts[0]), new Text("accounts\t"+parts[1])) ;
>          }
>      }
>      
>      public static class ReduceJoinReducer
>      extends Reducer<Text, Text, Text, Text>
>      {
>          
>          public void reduce(Text key, Iterable<Text> values,
>              Context context
>              ) throws IOException, InterruptedException
>          {
>              String name = "" ;
>              double total = 0.0 ;
>              int count = 0 ;
>              
>              for(Text t: values)
>              {
>                  String parts[] = t.toString().split("\t") ;
>                  
>                  if (parts[0].equals("sales"))
>                  {
>                      count++ ;
>                      total+= Float.parseFloat(parts[1]) ;
>                  }
>                  else if (parts[0].equals("accounts"))
>                  {
>                      name = parts[1] ;
>                  }
>              }
>              
>              String str = String.format("%d\t%f", count, total) ;
>              context.write(new Text(name), new Text(str)) ;
>          }
>      }
>      
>      public static void main(String[] args) throws Exception {
>          Configuration conf = new Configuration();
>          Job job = new Job(conf, "Reduce-side join");
>          job.setJarByClass(ReduceJoin.class);
>          job.setReducerClass(ReduceJoinReducer.class);
>          job.setOutputKeyClass(Text.class);
>          job.setOutputValueClass(Text.class);
>          MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, SalesRecordMapper.class);
>          MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, AccountRecordMapper.class);
>          //        FileOutputFormat.setOutputPath(job, new Path(args[2]));
>          Path outputPath = new Path(args[2]);
>          FileOutputFormat.setOutputPath(job, outputPath);
>          // Remove any previous output; the one-argument delete(Path) is deprecated.
>          outputPath.getFileSystem(conf).delete(outputPath, true);
>          
>          System.exit(job.waitForCompletion(true) ? 0 : 1);
>      }
> }
> 
> I created join.jar and ran it:
> $ hadoop jar join.jar ReduceJoin sales accounts outputs
> $ hadoop fs -cat /user/garry/outputs/part-r-00000
> John Allen 3 124.929998
> Abigail Smith 3 127.929996
> April Stevens 1 499.989990
> Nasser Hafez 1 13.420000
> 
> I know the map method puts these text files into a map, like the following, right?
> <001, 35.99>
> <001, 35.99>
> <002, 12.49>
> <004, 13.42>
> <003, 499.99>
> <001 ,78.95>
> <002, 21.99>
> <002, 93.45>
> <001, 9.99>
> <001, John Allen>
> <002, Abigail Smith>
> <003, April Stevens>
> <004, Nasser Hafez>
> 
> But I don't understand the reduce method. How does it produce the following result? Could
> anyone give the detailed steps that produce it? Thanks in advance.
> John Allen 3 124.929998
> Abigail Smith 3 127.929996
> April Stevens 1 499.989990
> Nasser Hafez 1 13.420000
> 
> 
> 