nutch-dev mailing list archives

From "Markus Jelsma (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-1219) Upgrade all jobs to new MapReduce API
Date Wed, 14 Dec 2011 12:55:30 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169308#comment-13169308 ]

Markus Jelsma commented on NUTCH-1219:
--------------------------------------

Keep in mind that this does not work, because the settings are applied to conf only after the Job has been created:

{code}
    Configuration conf = getConf();
    Job job = new Job(conf, jobName);
    job.setJarByClass(DomainStatistics.class);
    conf.setInt("domain.statistics.mode", mode);
    conf.setBoolean("mapreduce.fileoutputcommitter.marksuccessfuljobs", false);
{code}

but this does:

{code}
    Configuration conf = getConf();
    conf.setInt("domain.statistics.mode", mode);
    conf.setBoolean("mapreduce.fileoutputcommitter.marksuccessfuljobs", false);
    Job job = new Job(conf, jobName);
    job.setJarByClass(DomainStatistics.class);
{code}

This is easily overlooked, since the job simply runs with the default settings.
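
The likely reason for the difference is that the new-API Job constructor takes its own copy of the Configuration, so later changes to the original conf object never reach the job. The sketch below models that behaviour in plain Java so it runs without Hadoop on the classpath. Conf and Job here are hypothetical stand-ins, not the Hadoop classes, and the method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfCopyDemo {

    /** Hypothetical stand-in for a Hadoop Configuration: a map of int settings. */
    static class Conf {
        private final Map<String, Integer> values = new HashMap<>();

        Conf() {}

        /** Copy constructor: snapshots the other object's current settings. */
        Conf(Conf other) {
            values.putAll(other.values);
        }

        void setInt(String key, int value) {
            values.put(key, value);
        }

        int getInt(String key, int defaultValue) {
            return values.getOrDefault(key, defaultValue);
        }
    }

    /** Hypothetical stand-in for a Job that copies its configuration on construction. */
    static class Job {
        private final Conf conf;

        Job(Conf conf) {
            this.conf = new Conf(conf); // the job keeps a private copy
        }

        Conf getConfiguration() {
            return conf;
        }
    }

    /** First pattern from the comment: set the value after the job exists. */
    public static int setAfterCreatingJob(int mode) {
        Conf conf = new Conf();
        Job job = new Job(conf);
        conf.setInt("domain.statistics.mode", mode); // too late, the job has its copy
        return job.getConfiguration().getInt("domain.statistics.mode", 0);
    }

    /** Second pattern from the comment: set the value before creating the job. */
    public static int setBeforeCreatingJob(int mode) {
        Conf conf = new Conf();
        conf.setInt("domain.statistics.mode", mode);
        Job job = new Job(conf);
        return job.getConfiguration().getInt("domain.statistics.mode", 0);
    }

    /** Alternative: change the job's own copy after construction. */
    public static int setViaJobConfiguration(int mode) {
        Job job = new Job(new Conf());
        job.getConfiguration().setInt("domain.statistics.mode", mode);
        return job.getConfiguration().getInt("domain.statistics.mode", 0);
    }

    public static void main(String[] args) {
        System.out.println("set after job:    " + setAfterCreatingJob(2));
        System.out.println("set before job:   " + setBeforeCreatingJob(2));
        System.out.println("set via job conf: " + setViaJobConfiguration(2));
    }
}
```

In real Hadoop code the third variant corresponds to calling job.getConfiguration().setInt(...) after constructing the Job, which should be an alternative to reordering the statements.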
                
> Upgrade all jobs to new MapReduce API
> -------------------------------------
>
>                 Key: NUTCH-1219
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1219
>             Project: Nutch
>          Issue Type: Task
>            Reporter: Markus Jelsma
>            Priority: Critical
>             Fix For: 1.5
>
>
> We should upgrade to the new Hadoop API for Nutch trunk, as has already been done for
the Nutchgora branch. If I'm not mistaken, we can already upgrade to the latest 0.20.5 version,
which still carries the legacy API, so we can port the jobs to the new API without immediately
upgrading to 0.21 or higher and without needing a separate branch to work on.
> To the committers who created/ported jobs in NutchGora, please write down your advice
and experience.
> http://www.slideshare.net/sh1mmer/upgrading-to-the-new-map-reduce-api

