hadoop-mapreduce-dev mailing list archives

From EdwardKing <zhan...@neusoft.com>
Subject question about reduce method
Date Mon, 17 Feb 2014 09:14:11 GMT
Hello everyone,
   I am a newbie to Hadoop 2.2.0 and I am puzzled by the reduce method. I have two text files,
sales.txt and account.txt, as follows:
sales.txt
001 35.99 2012-03-15
002 12.49 2004-07-02
004 13.42 2005-12-20
003 499.99 2010-12-20
001 78.95 2012-04-02
002 21.99 2006-11-30
002 93.45 2008-09-10
001 9.99 2012-05-17

account.txt
001 John Allen Standard 2012-03-15
002 Abigail Smith Premium 2004-07-13
003 April Stevens Standard 2010-12-20
004 Nasser Hafez Premium 2001-04-23

ReduceJoin.java is as follows (both mappers split records on tab characters, so the fields in the files above are tab-separated):
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ReduceJoin
{
    
    // Emits (customer id, "sales\t" + amount) for each line of sales.txt.
    public static class SalesRecordMapper
            extends Mapper<Object, Text, Text, Text> {

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String record = value.toString();
            String[] parts = record.split("\t");

            // parts[0] = customer id, parts[1] = sale amount; tag the value
            // so the reducer can tell which file it came from.
            context.write(new Text(parts[0]), new Text("sales\t" + parts[1]));
        }
    }
    
    // Emits (customer id, "accounts\t" + name) for each line of account.txt.
    public static class AccountRecordMapper
            extends Mapper<Object, Text, Text, Text> {

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String record = value.toString();
            String[] parts = record.split("\t");

            context.write(new Text(parts[0]), new Text("accounts\t" + parts[1]));
        }
    }
    
    // For each customer id, sums the values tagged "sales" and picks up the
    // name from the value tagged "accounts", then writes (name, count, total).
    public static class ReduceJoinReducer
            extends Reducer<Text, Text, Text, Text> {

        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            String name = "";
            double total = 0.0;
            int count = 0;

            for (Text t : values) {
                String[] parts = t.toString().split("\t");

                if (parts[0].equals("sales")) {
                    count++;
                    total += Float.parseFloat(parts[1]);
                } else if (parts[0].equals("accounts")) {
                    name = parts[1];
                }
            }

            String str = String.format("%d\t%f", count, total);
            context.write(new Text(name), new Text(str));
        }
    }
    
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Reduce-side join");
        job.setJarByClass(ReduceJoin.class);
        job.setReducerClass(ReduceJoinReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Route each input path to its own mapper class.
        MultipleInputs.addInputPath(job, new Path(args[0]),
                TextInputFormat.class, SalesRecordMapper.class);
        MultipleInputs.addInputPath(job, new Path(args[1]),
                TextInputFormat.class, AccountRecordMapper.class);

        // Remove any previous output so a rerun does not fail.
        Path outputPath = new Path(args[2]);
        FileOutputFormat.setOutputPath(job, outputPath);
        outputPath.getFileSystem(conf).delete(outputPath, true);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

I created join.jar and ran it:
$ hadoop jar join.jar ReduceJoin sales accounts outputs
$ hadoop fs -cat /user/garry/outputs/part-r-00000
John Allen 3 124.929998
Abigail Smith 3 127.929996
April Stevens 1 499.989990
Nasser Hafez 1 13.420000

I know the map methods turn these text files into key/value pairs like the following, right?
<001, 35.99>
<002, 12.49>
<004, 13.42>
<003, 499.99>
<001, 78.95>
<002, 21.99>
<002, 93.45>
<001, 9.99>
<001, John Allen>
<002, Abigail Smith>
<003, April Stevens>
<004, Nasser Hafez>
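
From the two map methods in the code above, each value is also tagged with its source file before it is written, so the intermediate records are really of the form (\t is a tab):
<001, sales\t35.99>
<002, sales\t12.49>
<004, sales\t13.42>
<003, sales\t499.99>
<001, sales\t78.95>
<002, sales\t21.99>
<002, sales\t93.45>
<001, sales\t9.99>
<001, accounts\tJohn Allen>
<002, accounts\tAbigail Smith>
<003, accounts\tApril Stevens>
<004, accounts\tNasser Hafez>
The shuffle then groups these by key, so each reduce call receives one customer id together with all of that customer's tagged values (in no guaranteed order).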

But I don't understand the reduce method. How does it produce the following result? Could anyone give
the detailed steps that produce it? Thanks in advance.
John Allen 3 124.929998
Abigail Smith 3 127.929996
April Stevens 1 499.989990
Nasser Hafez 1 13.420000
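
For anyone tracing this by hand, the minimal sketch below (a standalone class written only for illustration, not part of the job) replays the reducer's loop on the values grouped under key 001. The order of the values inside the list is an assumption, since Hadoop does not guarantee it, but the result is the same either way:

import java.util.Arrays;
import java.util.List;

public class ReduceTrace {
    public static void main(String[] args) {
        // The shuffled values delivered for key 001 (order assumed).
        List<String> values = Arrays.asList(
                "sales\t35.99", "sales\t78.95", "sales\t9.99",
                "accounts\tJohn Allen");

        String name = "";
        double total = 0.0;
        int count = 0;

        // Same logic as ReduceJoinReducer.reduce().
        for (String t : values) {
            String[] parts = t.split("\t");
            if (parts[0].equals("sales")) {
                count++;                             // one sale seen
                total += Float.parseFloat(parts[1]); // accumulate amount
            } else if (parts[0].equals("accounts")) {
                name = parts[1];                     // customer name
            }
        }

        // Prints "John Allen  3  124.929998"; the trailing digits come from
        // float rounding in Float.parseFloat, exactly as in the job output.
        System.out.println(name + "\t" + String.format("%d\t%f", count, total));
    }
}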


