hadoop-mapreduce-dev mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Automatically Documenting Apache Hadoop Configuration
Date Mon, 05 Dec 2011 17:54:44 GMT


On 05-Dec-2011, at 10:14 PM, Praveen Sripati wrote:

> Hi,
> Recently there was a query about making the Hadoop framework tolerate
> map/reduce task failures so that a job can still run to completion. The
> solution was to set the 'mapreduce.map.failures.maxpercent' and
> 'mapreduce.reduce.failures.maxpercent' properties. Although this feature
> was introduced a couple of years back, it was not documented. I had a
> similar experience with the 0.23 release.

I am not sure we recommend using config strings directly when there is an API in Job/JobConf
for setting the same thing. Just noting that javadoc was already available for this.
But of course, it would be better if the tutorial covered this too. Doc-patches welcome!

> It would be really good for Hadoop adoption to automatically dig and
> document all the existing configurable properties in Hadoop and also to
> identify newly added properties in a particular release during the build
> process. Documentation would also lead to fewer queries in the forums.
> Cloudera has done something similar [1]; though it's not 100% accurate, it
> would definitely help to some extent.

I'm +1 for this. Today we do request, and consistently add, entries to the *-default.xml files
when we find properties undocumented. I think we should also enforce it at the review level, so
that patches do not go in undocumented, at minimum for configuration changes.
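
As a rough illustration of the kind of check a build or review step could run, here is a
minimal Java sketch that lists the properties in a *-default.xml style file whose
<description> is missing or empty. The class name, method names, and the sample XML
(including its description text) are made up for illustration and are not part of Hadoop;
the sketch only assumes the usual <configuration>/<property>/<name>/<description> layout.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not an existing Hadoop class.
public class UndocumentedProperties {

    /** Returns the names of <property> entries whose <description> is missing or blank. */
    public static List<String> findUndocumented(InputStream xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        NodeList props = doc.getElementsByTagName("property");
        List<String> missing = new ArrayList<>();
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            String name = childText(prop, "name");
            String description = childText(prop, "description");
            // Skip malformed entries without a name; report named ones lacking a description.
            if (!name.isEmpty() && description.isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }

    /** Trimmed text of the first child element with the given tag, or "" if absent. */
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample input; the description text is illustrative only.
        String sample =
            "<configuration>"
          + "<property><name>mapreduce.map.failures.maxpercent</name><value>0</value>"
          + "<description>Illustrative description text.</description></property>"
          + "<property><name>mapreduce.reduce.failures.maxpercent</name><value>0</value>"
          + "</property>"
          + "</configuration>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        // Prints: [mapreduce.reduce.failures.maxpercent]
        System.out.println(findUndocumented(in));
    }
}
```

A check like this only catches properties that are already listed in the XML file; finding
properties that are read in code but never listed would need a separate scan of the source.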