nutch-dev mailing list archives

From Markus Jelsma
Subject Re: [jira] [Created] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API
Date Thu, 15 Dec 2011 12:13:45 GMT
hmm, i don't see how i can use the old mapred MapFileOutputFormat API with the new 
Job API. job.setOutputFormatClass(MapFileOutputFormat.class) expects the 
mapreduce.lib.output.MapFileOutputFormat class and won't accept the old API.
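
The mismatch can be reproduced without Hadoop on the classpath. The following is only a minimal sketch: every type in it is a hypothetical stand-in, not a real Hadoop class. It assumes the new-API OutputFormat is an abstract class, the old-API one is an unrelated interface, and the setter is bounded the way the compiler error quoted in this message reports.

```java
// All types here are hypothetical stand-ins, NOT real Hadoop classes.
class OutputFormatBoundSketch {

    // Stand-in for org.apache.hadoop.mapreduce.OutputFormat (new API).
    static abstract class NewOutputFormat {}

    // Stand-in for org.apache.hadoop.mapred.OutputFormat (old API).
    interface OldOutputFormat {}

    // Hypothetical format that has already been ported to the new API.
    static class NewApiFormat extends NewOutputFormat {}

    // Stand-in for the old-API MapFileOutputFormat discussed in this message.
    static class OldMapFileOutputFormat implements OldOutputFormat {}

    // Mirrors the bounded parameter of Job.setOutputFormatClass.
    static String setOutputFormatClass(Class<? extends NewOutputFormat> cls) {
        return cls.getSimpleName();
    }

    // Runtime equivalent of the check the compiler performs on the bound.
    static boolean acceptedByNewJob(Class<?> cls) {
        return NewOutputFormat.class.isAssignableFrom(cls);
    }

    public static void main(String[] args) {
        // Compiles: the class extends the new-API base type.
        setOutputFormatClass(NewApiFormat.class);

        // Would not compile: the old-API class is outside the bound.
        // setOutputFormatClass(OldMapFileOutputFormat.class);

        System.out.println(acceptedByNewJob(NewApiFormat.class));
        System.out.println(acceptedByNewJob(OldMapFileOutputFormat.class));
    }
}
```

Because the two hierarchies share no common supertype below Object, no cast makes the old class fit the bound. The format has to exist under the new API, or be ported to it.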

setOutputFormatClass(java.lang.Class<? extends 
org.apache.hadoop.mapreduce.OutputFormat>) in org.apache.hadoop.mapreduce.Job 
cannot be applied to 

In short, i don't know how i can migrate jobs to the new API on 0.20.x without 
having MapFileOutputFormat present in the new API. Trying to set to old 

On Thursday 15 December 2011 08:55:38 Andrzej Bialecki wrote:
> On 14/12/2011 19:14, Markus Jelsma wrote:
> > Yes, the goal is to upgrade to 0.22 or higher. The problem is that 0.22
> > doesn't have the old mapred API so we can only upgrade to 0.22 if all
> > jobs are ported.
> > 
> > I thought the entire mapred package was deprecated but it seems that
> > class is not deprecated. It feels a bit strange though, this still means
> > that if we port all jobs to the new API, we still have to move all
> > imports for this class from mapred to mapreduce before we can compile
> > with 0.22.
> > 
> > Ah well, it's better than nothing.
> IMHO upgrading to 0.21 as an interim solution is not helpful, it only
> creates more work - as you noticed yourself 0.21 is a strange animal.
> As I mentioned before, the API changes between 0.20 and 0.22 are such
> that in most cases rote replacement is enough.
> Also, we can always create a branch to do this upgrade, and then merge
> it with trunk when it's ready.

Markus Jelsma - CTO - Openindex
