nutch-dev mailing list archives

From Ferdy Galema <ferdy.gal...@kalooga.com>
Subject Re: hadoop.job.history.user.location in nutch-default with CDH rendering job history useless
Date Tue, 14 Aug 2012 08:31:00 GMT
FYI I've created a Jira for followup discussion.
https://issues.apache.org/jira/browse/NUTCH-1452

On Tue, Aug 7, 2012 at 11:21 AM, Ferdy Galema <ferdy.galema@kalooga.com> wrote:

> Hi,
>
> There still is a property in nutch-default
> 'hadoop.job.history.user.location' that redirects the creation of history
> files from job output locations to a custom location. I noticed that the
> current value does not work well with CDH, because ${hadoop.log.dir} is not
> defined. This actually causes the entire job history in the jobtracker to
> show empty info (with an 'incomplete' job status).
>
> Changing the value to /user/myname/history, for example, does work.
> However, I have done some more testing and it seems that this property can
> be set to 'none', because the job history is ALSO stored in the central
> jobtracker location anyway. The 'hadoop.job.history.user.location' property
> specifies an extra location. But if it is set to an invalid value, it
> prevents the central history location from storing the history as well.
> For more details, please see:
> http://hadoop.apache.org/common/docs/r1.0.3/cluster_setup.html
>
> Setting this value to 'none' keeps the central history but prevents the
> job from writing history to the job output location. If a user wants an
> extra copy of the history files, nothing prevents him/her from specifying
> another value in nutch-site, for example. Another option is to set it to
> 'history', which does work with CDH. (This writes all logs to 'history' in
> the user directory in the configured filesystem, usually dfs.) The final
> option is to simply remove this value and not meddle with hadoop properties
> at all. But that actually requires all jobs to correctly ignore these
> files. I am not up to date on how well this currently works with Nutch jobs.
> This question is most relevant for trunk, since trunk heavily relies on the
> filesystem for jobs.
>
> What do you think? It would be great if anyone could do some testing with
> trunk and possibly another Hadoop distro (e.g. the official 1.0.3). Then
> we would have some more input to decide what the best option is:
> A) Set property to 'none'
> B) Set property to 'history'
> C) Remove property, see what happens, possibly fix jobs
> D) ?
>
> Ferdy.
>
>
>
>
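
For illustration, option A from the message above could be expressed as the
following override in nutch-site.xml. This is only a sketch: it assumes the
standard Hadoop property file layout and that values in nutch-site.xml take
precedence over nutch-default.xml. The property name and the value 'none'
come from the message; the comment text is added here.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Option A: keep the central jobtracker history and do not write an
       extra copy of the history files to the job output location. -->
  <property>
    <name>hadoop.job.history.user.location</name>
    <value>none</value>
  </property>
</configuration>
```

For option B the value would be 'history' instead of 'none'. For option C the
property would be left out of both files.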
