hadoop-mapreduce-dev mailing list archives

From Jeff Zhang <zjf...@gmail.com>
Subject What is the reason for putting the output of one mapper task into one file ?
Date Thu, 17 Jun 2010 02:53:31 GMT
Hi all,

I checked the source code of the mapper task, and it seems that the output of
one mapper task is one data file plus one index file. Each reducer task
then fetches its part of the mapper's output.
I am wondering why the mapper output is not written to n files instead (where
n is the number of reducers), since the mapper task knows the Partitioner and
the logic would be much simpler. Is there a performance reason for
putting the output into a single file? Thanks.
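
To make the layout I am describing concrete, here is a minimal in-memory sketch of how I understand it. It is not the actual Hadoop code: the class name SingleFileSpillSketch, the partitionFor helper, and the newline-delimited record format are all assumptions made up for illustration. Records are grouped by partition and written back to back into one "data file", and an "index" of offsets lets each reducer read only its own segment.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names and record format are hypothetical, not Hadoop's.
class SingleFileSpillSketch {
    final byte[] data;     // stands in for the single data file
    final long[] offsets;  // stands in for the index file: segment p is [offsets[p], offsets[p + 1])

    // Hypothetical stand-in for a Partitioner: non-negative hash modulo reducer count.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    SingleFileSpillSketch(List<String> keys, int numReducers) {
        // Group the records by the partition (reducer) they belong to.
        List<List<String>> buckets = new ArrayList<>();
        for (int p = 0; p < numReducers; p++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(partitionFor(key, numReducers)).add(key);
        }

        // Write the partitions one after another and record where each one starts.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        offsets = new long[numReducers + 1];
        for (int p = 0; p < numReducers; p++) {
            offsets[p] = out.size();
            for (String key : buckets.get(p)) {
                byte[] bytes = (key + "\n").getBytes(StandardCharsets.UTF_8);
                out.write(bytes, 0, bytes.length);
            }
        }
        offsets[numReducers] = out.size();
        data = out.toByteArray();
    }

    // What a reducer would fetch: only the byte range that belongs to its partition.
    String fetch(int partition) {
        int start = (int) offsets[partition];
        int end = (int) offsets[partition + 1];
        return new String(data, start, end - start, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        SingleFileSpillSketch spill =
            new SingleFileSpillSketch(List.of("apple", "banana", "cherry", "date"), 3);
        for (int p = 0; p < 3; p++) {
            System.out.println("reducer " + p + " fetches " + spill.fetch(p).length() + " bytes");
        }
    }
}
```

The alternative I am asking about would replace the single data array and the offsets array with n separate outputs, one per reducer.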

Best Regards

Jeff Zhang