hadoop-mapreduce-dev mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Automatically Documenting Apache Hadoop Configuration
Date Mon, 05 Dec 2011 19:22:21 GMT
I've seen Oozie do that same break-up of config param names and boy, it's difficult to grep
in such a code base when troubleshooting.

OTOH, we at least get a sane prefix for relevant config names (hope we do?)

On 06-Dec-2011, at 12:44 AM, Robert Evans wrote:

> From my work on YARN trying to document the configs there and to standardize them, writing
> anything that is going to automatically detect config values through static analysis is going
> to be very difficult. This is because most of the configs in YARN are now built up using
> static string concatenation:
> public static String BASE = "yarn.base.";
> public static String CONF = BASE+"config";
> I am not sure that there is a good way around this short of using a full Java parser
> to trace out all method calls and try to resolve the parameters. I know this is possible,
> just not that simple to do.
> I am +1 for anything that will clean up configs and improve the documentation of them,
> even if we have to rewire or rewrite a lot of the Configuration class to make things work.
> --Bobby Evans
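
One way to sidestep the parsing problem for constants like the BASE/CONF pair above is to let the JVM do the concatenation and read the results back through reflection. This is a runtime technique, not the static analysis discussed in the message, and it is only a minimal sketch: ExampleConf is a hypothetical stand-in for a real configuration class, and the approach cannot tell a prefix such as BASE from a complete key, nor does it see keys assembled inside method bodies.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

public class ConfigKeyDump {

    // Hypothetical stand-in for a configuration holder class; the two
    // fields are the ones from the example in the message above.
    public static class ExampleConf {
        public static String BASE = "yarn.base.";
        public static String CONF = BASE + "config";
    }

    /** Returns fieldName -> resolved value for every public static String field. */
    public static Map<String, String> resolve(Class<?> holder) {
        Map<String, String> keys = new TreeMap<>();
        for (Field f : holder.getFields()) {
            boolean isStatic = Modifier.isStatic(f.getModifiers());
            if (isStatic && f.getType() == String.class) {
                try {
                    // Reading a static field initializes the class, so any
                    // concatenation has already been evaluated by the JVM.
                    keys.put(f.getName(), (String) f.get(null));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("cannot read " + f, e);
                }
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : resolve(ExampleConf.class).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Run against ExampleConf, this lists both BASE and CONF with their resolved values; a real tool would still need some convention for filtering out the prefix-only constants.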
> On 12/5/11 11:54 AM, "Harsh J" <harsh@cloudera.com> wrote:
> Praveen,
> (Inline.)
> On 05-Dec-2011, at 10:14 PM, Praveen Sripati wrote:
>> Hi,
>> Recently there was a query about making the Hadoop framework tolerant of
>> map/reduce task failures towards job completion. The solution was to
>> set the 'mapreduce.map.failures.maxpercent' and
>> 'mapreduce.reduce.failures.maxpercent' properties. Although this feature
>> was introduced a couple of years back, it was not documented. I had a similar
>> experience with the 0.23 release as well.
> I do not know if we recommend using config strings directly when there's an API in Job/JobConf
> that supports setting the same thing. Just saying that javadoc was already available
> for this. But of course, it would be better if the tutorial covered this too. Doc-patches welcome!
>> It would be really good for Hadoop adoption to automatically dig and
>> document all the existing configurable properties in Hadoop and also to
>> identify newly added properties in a particular release during the build
>> processes. Documentation would also lead to fewer queries in the forums.
>> Cloudera has done something similar [1], though it's not 100% accurate, it
>> would definitely help to some extent.
> I'm +1 for this. We do request and consistently add entries to *-default.xml files if
> we find them undocumented today. I think we should also enforce it at the review level, so
> that patches do not go in undocumented, at minimum for the configuration tweaks.
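
The idea of identifying newly added properties during the build could be approximated by diffing the property names in two versions of a *-default.xml file. The sketch below assumes the usual configuration/property/name layout of those files and uses only the JDK's XML parser; the class name NewPropertyReport and the inline sample documents are hypothetical, and it only catches properties that someone has already added to the defaults file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class NewPropertyReport {

    /** Collects the text of every <name> element in a *-default.xml style document. */
    public static Set<String> propertyNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList names = doc.getElementsByTagName("name");
            Set<String> result = new TreeSet<>();
            for (int i = 0; i < names.getLength(); i++) {
                result.add(names.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parseable configuration file", e);
        }
    }

    /** Names present in the current file but absent from the previous one. */
    public static Set<String> addedSince(String previousXml, String currentXml) {
        Set<String> added = new TreeSet<>(propertyNames(currentXml));
        added.removeAll(propertyNames(previousXml));
        return added;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for two releases' default files.
        String previous = "<configuration>"
                + "<property><name>mapreduce.map.failures.maxpercent</name>"
                + "<value>0</value></property>"
                + "</configuration>";
        String current = "<configuration>"
                + "<property><name>mapreduce.map.failures.maxpercent</name>"
                + "<value>0</value></property>"
                + "<property><name>mapreduce.reduce.failures.maxpercent</name>"
                + "<value>0</value></property>"
                + "</configuration>";
        System.out.println(addedSince(previous, current));
    }
}
```

A build step could feed it the previous release's file and the current one, then fail or warn when the resulting set contains names that have no description.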
