nutch-dev mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject Re: [jira] [Created] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API
Date Thu, 15 Dec 2011 12:23:34 GMT
I've looked into it again. This is not going to work well if we stay on
0.20.x. Holding on to 0.20.x means doing the migration partially now and again
just before upgrading to 0.22+. This is a _lot_ of extra work!

I strongly prefer an intermediate upgrade to 0.21, where both APIs are
present.

Does anyone know how I can configure Ivy to use Apache's Maven repository for
the Hadoop dependencies? It keeps trying to load them from Maven Central, where
the 0.21 POM is not present.

On Thursday 15 December 2011 13:13:45 Markus Jelsma wrote:
> Hmm, I don't see how I can use the old mapred MapFileOutputFormat API with
> the new Job API. job.setOutputFormatClass(MapFileOutputFormat.class) expects
> the mapreduce.lib.output.MapFileOutputFormat class and won't accept the
> old API class.
> 
> setOutputFormatClass(java.lang.Class<? extends
> org.apache.hadoop.mapreduce.OutputFormat>) in
> org.apache.hadoop.mapreduce.Job cannot be applied to
> (java.lang.Class<org.apache.hadoop.mapred.MapFileOutputFormat>)
> 
> In short, I don't know how I can migrate jobs to the new API on 0.20.x
> without having MapFileOutputFormat present in the new API. Trying to set
> the old MapFileOutputFormat fails with the error above.
> 
> On Thursday 15 December 2011 08:55:38 Andrzej Bialecki wrote:
> > On 14/12/2011 19:14, Markus Jelsma wrote:
> > > Yes, the goal is to upgrade to 0.22 or higher. The problem is that 0.22
> > > doesn't have the old mapred API, so we can only upgrade to 0.22 if all
> > > jobs are ported.
> > > 
> > > I thought the entire mapred package was deprecated, but it seems that
> > > class is not deprecated. It feels a bit strange, though: this still
> > > means that if we port all jobs to the new API, we still have to move
> > > all imports for this class from mapred to mapreduce before we can
> > > compile with 0.22.
> > > 
> > > Ah well, it's better than nothing.
> > 
> > IMHO upgrading to 0.21 as an interim solution is not helpful; it only
> > creates more work. As you noticed yourself, 0.21 is a strange animal.
> > 
> > As I mentioned before, the API changes between 0.20 and 0.22 are such
> > that in most cases rote replacement is enough.
> > 
> > Also, we can always create a branch to do this upgrade, and then merge
> > it with trunk when it's ready.

-- 
Markus Jelsma - CTO - Openindex
