hadoop-mapreduce-dev mailing list archives

From Akira AJISAKA <ajisa...@oss.nttdata.co.jp>
Subject Re: Update interval of default counters
Date Thu, 17 Apr 2014 05:01:22 GMT
I think the reason for hard-coding it is to protect the Hadoop cluster
from heavy network traffic. If the value is too small, there is
too much network traffic between the Map/Reduce tasks and the MRAppMaster.

Please see https://issues.apache.org/jira/browse/MAPREDUCE-4381 also.

That's why you need to be very careful if you really want to change
the value.
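To put the traffic concern in numbers, here is a back-of-envelope sketch. It assumes, as a simplification, that every running task sends exactly one status update per reporting interval and nothing else; the 1000-task job is a hypothetical figure. Under those assumptions the update rate is tasks / interval, so 3000 ms gives about 333 updates per second and 100 ms gives 10,000.

```java
/**
 * Back-of-envelope estimate of status-update load on the MRAppMaster.
 * Assumption (not from Hadoop source): every running task sends exactly one
 * progress/counter update per reporting interval, and nothing else.
 */
public class HeartbeatLoad {

    /** Updates per second the MRAppMaster would receive under the assumption above. */
    static double updatesPerSecond(int runningTasks, long intervalMs) {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return runningTasks * 1000.0 / intervalMs;
    }

    public static void main(String[] args) {
        int runningTasks = 1000;                // hypothetical job size
        long[] intervalsMs = {3000, 1000, 100}; // 3000 ms matches Task.PROGRESS_INTERVAL
        for (long interval : intervalsMs) {
            System.out.printf("%5d ms -> %.1f updates/s%n",
                    interval, updatesPerSecond(runningTasks, interval));
        }
    }
}
```

Because the rate is inversely proportional to the interval, cutting the interval from 3000 ms to 100 ms multiplies the load on the MRAppMaster by 30.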

The source code is at
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java

(lines 532-533):

   /** The number of milliseconds between progress reports. */
   public static final int PROGRESS_INTERVAL = 3000;
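The toy model below (an assumption for illustration, not Hadoop code) shows why a client polling every 1000 ms sees the same counter value several times and then a jump: it can only read the value from the most recent 3000 ms report. The steady rate of 80 records per second is hypothetical.

```java
/**
 * Toy model (assumption, not Hadoop code): a task's counter grows steadily,
 * but the task only reports it every PROGRESS_INTERVAL ms. A client polling
 * more often just re-reads the last reported value.
 */
public class StaleCounterDemo {

    static final long PROGRESS_INTERVAL = 3000; // same value as Task.PROGRESS_INTERVAL
    static final long POLL_INTERVAL = 1000;     // default client poll interval

    /** Value the client sees at time tMs if the true count grows at recordsPerSecond. */
    static long observedAt(long tMs, long reportIntervalMs, long recordsPerSecond) {
        // Time of the latest report at or before tMs.
        long lastReportMs = (tMs / reportIntervalMs) * reportIntervalMs;
        return lastReportMs * recordsPerSecond / 1000;
    }

    public static void main(String[] args) {
        long recordsPerSecond = 80; // hypothetical processing rate
        for (long t = 0; t <= 9000; t += POLL_INTERVAL) {
            System.out.println(t + " ms: " + observedAt(t, PROGRESS_INTERVAL, recordsPerSecond));
        }
    }
}
```

With these numbers the client reads 0 three times, then 240 three times, then 480 three times: the same staircase shape as the REDUCE_INPUT_RECORDS values quoted below. Lowering the poll interval only adds more repeated readings.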

Regards,
Akira

(2014/04/16 22:17), Dharmesh Kakadia wrote:
> Hi Akira,
>
> Thanks for the quick reply.
> Any particular reason for hard-coding it? Is there a workaround? I want to
> get the counters at as fine a granularity as possible. Also, can you point
> me to the relevant source code? I am willing to take on the issue and
> contribute if required.
>
> Thanks,
> Dharmesh
>
>
> On Wed, Apr 16, 2014 at 3:14 PM, Akira AJISAKA
> <ajisakaa@oss.nttdata.co.jp> wrote:
>
>> Moved mapreduce-dev@ to Bcc.
>>
>> Hi Dharmesh,
>>
>> The parameter sets the interval at which the client polls the
>> MRAppMaster for progress; it does not affect the Map/Reduce tasks. The
>> tasks send their progress (including counter information) to the
>> MRAppMaster every 3000 milliseconds, and that interval is hard-coded.
>>
>> That's why you see sudden big jumps in the counter values
>> even if the parameter is set to a small value.
>>
>> Regards,
>> Akira
>>
>>
>> (2014/04/16 15:42), Dharmesh Kakadia wrote:
>>
>>> Hi Akira,
>>>
>>> Thanks for the reply, but as I understand it, this is the interval at which
>>> counters are printed to the console. What I am trying to do is:
>>>
>>> while (!job.isComplete()) {
>>>     Counters counters = job.getCounters();
>>>     // do some processing on the counters
>>> }
>>>
>>> This runs fine, but I get the same counter values repeatedly and then
>>> suddenly a big jump in the values.
>>> For example, getCounters() for REDUCE_INPUT_RECORDS returns values like:
>>>
>>> 0
>>> 0
>>> ..
>>> 0
>>> 280
>>> 280
>>> ...
>>> 280
>>> 516
>>> 516
>>> ...
>>> 516
>>>
>>> etc.
>>>
>>> I want finer-grained values instead of a direct jump from 280 to 516.
>>> Does that make sense? Setting mapreduce.client.progressmonitor.pollinterval
>>> does not seem to affect it. Is there any workaround?
>>>
>>> Thanks,
>>> Dharmesh
>>>
>>>
>>>
>>>
>>> On Tue, Apr 15, 2014 at 7:51 PM, Akira AJISAKA
>>> <ajisakaa@oss.nttdata.co.jp> wrote:
>>>
>>>> Moved to user@hadoop.apache.org.
>>>>
>>>> You can configure the interval by setting the
>>>> "mapreduce.client.progressmonitor.pollinterval" parameter.
>>>> The default value is 1000 ms.
>>>>
>>>> For more details, please see
>>>> http://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml.
>>>>
>>>> Regards,
>>>> Akira
>>>>
>>>>
>>>> (2014/04/15 15:29), Dharmesh Kakadia wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> What is the update interval of the built-in framework counters? Is it
>>>>> configurable?
>>>>> I am trying to collect very fine-grained information about job
>>>>> execution and am using counters for that. It would be great if someone
>>>>> could point me to the documentation or code for it. Thanks in advance.
>>>>>
>>>>> Thanks,
>>>>> Dharmesh
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
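The polling loop from the 2014/04/16 15:42 message can be sketched as runnable Java. To keep it self-contained, `CounterSource` is a hypothetical interface standing in for the job handle (it is not a Hadoop API), and `replay` is a hypothetical fake that replays fixed readings. Because polls between the 3000 ms task reports return the same value, the sketch keeps a sample only when the value changes; this trims duplicates but cannot recover values the tasks never reported.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the thread's polling loop, runnable without Hadoop on the classpath. */
public class CounterPoller {

    /** Hypothetical stand-in for the job handle (not a Hadoop API). */
    interface CounterSource {
        boolean isComplete();
        long reduceInputRecords();
    }

    /** Polls until completion, keeping a sample only when the value changes. */
    static List<Long> pollDistinct(CounterSource job, long pollIntervalMs) {
        List<Long> samples = new ArrayList<>();
        while (!job.isComplete()) {
            long value = job.reduceInputRecords();
            // Polls between task reports return the last reported value; skip repeats.
            if (samples.isEmpty() || samples.get(samples.size() - 1) != value) {
                samples.add(value);
            }
            try {
                Thread.sleep(pollIntervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                break;
            }
        }
        return samples;
    }

    /** Hypothetical fake source that replays a fixed sequence of readings, then completes. */
    static CounterSource replay(long... readings) {
        return new CounterSource() {
            private int next = 0;
            public boolean isComplete() { return next >= readings.length; }
            public long reduceInputRecords() { return readings[next++]; }
        };
    }

    public static void main(String[] args) {
        // Readings shaped like the REDUCE_INPUT_RECORDS values in the thread.
        CounterSource fake = replay(0, 0, 0, 280, 280, 280, 516, 516);
        System.out.println(pollDistinct(fake, 0));
    }
}
```

Run against those readings, `pollDistinct` returns `[0, 280, 516]`.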

