hadoop-mapreduce-dev mailing list archives

From Arun C Murthy <...@hortonworks.com>
Subject Re: Calculations of the InputSplits
Date Sun, 25 Sep 2011 20:14:26 GMT
The reason it isn't done in the JobTracker is to avoid running user code within the framework:
InputFormat.getSplits() is user code.
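
To make concrete what "user code" means here, below is a minimal sketch of the kind of logic a getSplits() implementation might contain. It deliberately does not use the Hadoop API: the class SplitSketch, the Range type and computeSplits() are hypothetical names for illustration only, and a real job may do far more work (listing files, looking up block locations, and so on). Whatever machine calls this method pays the CPU cost and runs whatever the job author wrote.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    /** Hypothetical stand-in for an input split: a byte range within one file. */
    static final class Range {
        final long start;
        final long length;

        Range(long start, long length) {
            this.start = start;
            this.length = length;
        }
    }

    /** Cuts a file of fileLength bytes into ranges of at most splitSize bytes. */
    static List<Range> computeSplits(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Range> splits = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += splitSize) {
            // The last range may be shorter than splitSize.
            splits.add(new Range(offset, Math.min(splitSize, fileLength - offset)));
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Assumed example sizes: a 200 MB input cut into 64 MB ranges.
        for (Range r : computeSplits(200 * mb, 64 * mb)) {
            System.out.println(r.start + "+" + r.length);
        }
    }
}
```

Because the body of such a method is arbitrary, running it inside a shared daemon would let any one job's code affect every other job.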

In MRv1 it was highly complicated - in MRv2 it's trivial to do it in the MR ApplicationMaster.
I'll get to it some weekend soon - patches welcome! :)


On Sep 25, 2011, at 9:42 AM, Praveen Sripati wrote:

> Hi,
> There was a question on Stack Overflow about high CPU usage on the client after
> submitting jobs (up to 200 jobs in a batch, with a 150MB jar file).
> Calculation of the InputSplits may be one of the reasons for the high CPU usage on
> the client. Why should the calculation of the InputSplits happen on the
> client? The JobTracker is a high-end machine, so can't the calculation happen on
> the JobTracker?
> http://stackoverflow.com/questions/7546064/hadoop-high-cpu-load-on-client-side-after-committing-jobs
> Thanks,
> Praveen
