hadoop-mapreduce-dev mailing list archives

From: Daniel Templeton <dan...@cloudera.com.INVALID>
Subject: Re: detecting oversized bucket in mapreduce
Date: Tue, 27 Nov 2018 17:56:43 GMT
There are no per-key metrics provided by MapReduce, but you should be 
able to run your job with an identity reducer to see what the bucket 
sizes were.
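
A rough sketch of the offline tally follows. In the org.apache.hadoop.mapreduce API, the base Reducer class already passes records through unchanged, so it can act as the identity reducer. This sketch also assumes the output was written with the default TextOutputFormat, which produces key<TAB>value lines. The class name BucketSizes, the threshold and the HDFS path are all hypothetical. The class counts records per key and lists the keys above the threshold.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class BucketSizes {
    // Count records per key from "key<TAB>value" lines (identity-reducer output).
    // Assumes keys themselves contain no tab characters.
    static Map<String, Long> countPerKey(Stream<String> lines) {
        Map<String, Long> counts = new TreeMap<>();
        lines.forEach(line -> {
            int tab = line.indexOf('\t');
            // A line with no tab is treated as a key-only record.
            String key = tab >= 0 ? line.substring(0, tab) : line;
            counts.merge(key, 1L, Long::sum);
        });
        return counts;
    }

    // Keys whose record count exceeds the (hypothetical) threshold.
    static List<String> oversized(Map<String, Long> counts, long threshold) {
        return counts.entrySet().stream()
                .filter(e -> e.getValue() > threshold)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        long threshold = args.length > 0 ? Long.parseLong(args[0]) : 1_000_000L;
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        // Lines are streamed; only the per-key counts are held in memory.
        Map<String, Long> counts = countPerKey(in.lines());
        counts.forEach((k, v) -> System.out.println(k + "\t" + v));
        System.out.println("oversized: " + oversized(counts, threshold));
    }
}
```

You could feed it with something like `hdfs dfs -cat /path/to/output/part-r-* | java BucketSizes 1000000`, where the path is a placeholder.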

If you're talking about doing it on the fly, there's no way to do that 
today.  The job is submitted with a fixed number of reducers, which also 
fixes the number of buckets.  YARN supports adding resources to an 
existing job, e.g. adding more reducers, but MapReduce doesn't make use 
of those capabilities.
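
The sketch below shows why the reducer count fixes the buckets. Hadoop's default HashPartitioner assigns a key to a partition with `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. Each key therefore lands in exactly one of numReduceTasks partitions, and a single hot key cannot be spread across reducers by this partitioner. The stand-alone illustration uses plain String keys as a simplification. Hadoop's Text type computes its own hash over its bytes, so real partition numbers will differ. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

public class PartitionSketch {
    // Same formula as the default HashPartitioner: clear the sign bit,
    // then take the remainder modulo the fixed reducer count.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Sum per-key record counts into per-partition (bucket) sizes.
    static long[] bucketSizes(Map<String, Long> perKeyCounts, int numReduceTasks) {
        long[] sizes = new long[numReduceTasks];
        perKeyCounts.forEach((k, n) -> sizes[partitionFor(k, numReduceTasks)] += n);
        return sizes;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new TreeMap<>();
        counts.put("hot", 1_000_000L); // one heavily skewed key
        counts.put("a", 10L);
        counts.put("b", 12L);
        long[] sizes = bucketSizes(counts, 4);
        for (int p = 0; p < sizes.length; p++) {
            System.out.println("partition " + p + ": " + sizes[p]);
        }
    }
}
```

Whatever the reducer count, the partition holding "hot" gets all of its records. Changing the number of buckets means resubmitting the job with a different reducer count.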

Daniel

On 11/26/18 9:10 PM, Tianxiang Li wrote:
> Dear Hadoop community,
>
> I'm new to the Hadoop MapReduce code, and I'd like to know how I can get the number of
> records under a specific key value after the map process. I'd like to detect oversized buckets
> and perform further key division to split the records.
>
> Thanks,
> Peter
>


---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-help@hadoop.apache.org

