nutch-dev mailing list archives

From "Roman Valls (JIRA)" <j...@apache.org>
Subject [jira] Commented: (NUTCH-634) Patch - Nutch - Hadoop 0.17.1
Date Tue, 22 Jul 2008 22:33:31 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615815#action_12615815 ]

Roman Valls commented on NUTCH-634:
-----------------------------------

As promised, I've tested this patch in production (7-node cluster). The crawl halts after
these exceptions:

java.lang.AbstractMethodError: org.apache.nutch.crawl.PartitionUrlByHost.getPartition(Ljava/lang/Object;Ljava/lang/Object;I)I
	at org.apache.nutch.crawl.Generator$Selector.getPartition(Generator.java:171)
	at org.apache.nutch.crawl.Generator$Selector.getPartition(Generator.java:83)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:464)
	at org.apache.nutch.crawl.Generator$Selector.map(Generator.java:165)
	at org.apache.nutch.crawl.Generator$Selector.map(Generator.java:83)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)

java.lang.AbstractMethodError: org.apache.nutch.crawl.PartitionUrlByHost.getPartition(Ljava/lang/Object;Ljava/lang/Object;I)I
	at org.apache.nutch.crawl.Generator$Selector.getPartition(Generator.java:171)
	at org.apache.nutch.crawl.Generator$Selector.getPartition(Generator.java:83)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:464)
	at org.apache.nutch.crawl.Generator$Selector.map(Generator.java:165)
	at org.apache.nutch.crawl.Generator$Selector.map(Generator.java:83)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)

java.lang.AbstractMethodError: org.apache.nutch.crawl.PartitionUrlByHost.getPartition(Ljava/lang/Object;Ljava/lang/Object;I)I
	at org.apache.nutch.crawl.Generator$Selector.getPartition(Generator.java:171)
	at org.apache.nutch.crawl.Generator$Selector.getPartition(Generator.java:83)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:464)
	at org.apache.nutch.crawl.Generator$Selector.map(Generator.java:165)
	at org.apache.nutch.crawl.Generator$Selector.map(Generator.java:83)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)

Exception in thread "main" java.io.IOException: Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1062)
	at org.apache.nutch.crawl.Generator.generate(Generator.java:457)
	at org.apache.nutch.crawl.Generator.generate(Generator.java:394)
	at org.apache.nutch.crawl.Crawl.main(Crawl.java:116)
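
The descriptor in the error, getPartition(Ljava/lang/Object;Ljava/lang/Object;I)I, is the erased form of a generic getPartition(K, V, int) method. One plausible reading (an assumption, not confirmed in this thread) is that the PartitionUrlByHost class loaded on the cluster was compiled against an older, non-generic Partitioner interface and therefore lacks that method. The sketch below is not Nutch or Hadoop code: Partitioner here is a hypothetical local stand-in for the generic interface, and HostPartitioner is an illustrative by-host partitioner. It shows the method shape that a class compiled against a generic interface provides; the compiler also emits a bridge method with the (Object, Object, int) signature that the caller in the trace is looking for.

```java
import java.net.URI;

public class PartitionSketch {

    /** Hypothetical stand-in for a generic partitioner interface (not the Hadoop class). */
    interface Partitioner<K, V> {
        int getPartition(K key, V value, int numPartitions);
    }

    /** Illustrative partitioner that sends all URLs from one host to the same partition. */
    static class HostPartitioner implements Partitioner<String, Object> {
        @Override
        public int getPartition(String url, Object value, int numPartitions) {
            String host = null;
            try {
                host = URI.create(url).getHost();
            } catch (IllegalArgumentException e) {
                // Unparseable URL: fall through and hash the raw string instead.
            }
            if (host == null) {
                host = url;
            }
            // Mask the sign bit so the result is always in [0, numPartitions).
            return (host.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    public static void main(String[] args) {
        // Calling through the interface type is what makes the JVM look up
        // the erased (Object, Object, int) signature at run time.
        Partitioner<String, Object> p = new HostPartitioner();
        int a = p.getPartition("http://example.org/a", null, 7);
        int b = p.getPartition("http://example.org/b", null, 7);
        System.out.println("same partition for same host: " + (a == b));
    }
}
```

If that reading is right, a clean rebuild of the Nutch job jar against the Hadoop version deployed on the cluster, with no stale classes left on the task classpath, would be the first thing to check.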


> Patch - Nutch - Hadoop 0.17.1
> -----------------------------
>
>                 Key: NUTCH-634
>                 URL: https://issues.apache.org/jira/browse/NUTCH-634
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Michael Gottesman
>            Assignee: Andrzej Bialecki 
>             Fix For: 1.0.0
>
>         Attachments: diff, hadoop-0.17.patch, hadoop-0.17.patch, hadoop-0.17.patch
>
>
> This is a patch so that Nutch can be used with Hadoop 0.17.0. The patch is located at
> http://pastie.org/212001
> The patch compiles and passes all current Nutch unit tests.
> I have tested that the crawler side of Nutch (i.e. inject, generate, fetch, parse, merge
> w/crawldb) definitely works, but have not tested the Lucene indexing part. It might work, but
> it might not.
> *NOTE* - the two main bugs that had to be overcome were not noticed by any of the unit
> tests. The bugs only came up during actual testing. The bugs were:
> 1. Changes to the Hadoop Iterator
> 2. Addition of Serialization to MapReduce Framework

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

