nutch-dev mailing list archives

From: Ken Krugler <kkrugler_li...@transpac.com>
Subject: Re: IOException in dedup
Date: Tue, 02 Jun 2009 18:19:21 GMT
>On Jun 2, 2009, at 12:41 PM, Ken Krugler wrote:
>
>>>Hello,
>>>
>>>
>>>I am new to Nutch and have set up Nutch 0.9 on Easy Eclipse 
>>>for Mac OS X. When I try to start crawling I get the following 
>>>exception:
>>>
>>>Dedup: starting
>>>Dedup: adding indexes in: crawl/indexes
>>>Exception in thread "main" java.io.IOException: Job failed!
>>>	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>>>	at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
>>>	at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)
>>>
>>>Does anyone know how to solve this problem? 
>>>
>>
>>You can get an IOException reported by Hadoop when the root cause 
>>is that you've run out of memory. Normally the hadoop.log file 
>>would have the OOM exception.
>>
>>If you're running from inside of Eclipse, see 
>>http://wiki.apache.org/nutch/RunNutchInEclipse0.9 for more details.
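>>
>>For example (the values here are illustrative, not a recommendation; 
>>size them to your crawl), the heap is raised through the VM arguments 
>>of the Eclipse launch configuration:
>>
>>	-Xms128m -Xmx512m
>>
>>A quick way to confirm the setting actually took effect is to print 
>>the max heap from inside the launch:
>>
>>	public class HeapCheck {
>>	    public static void main(String[] args) {
>>	        // Should roughly match the configured -Xmx value.
>>	        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
>>	        System.out.println("Max heap: " + maxMb + " MB");
>>	    }
>>	}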
>>
>>-- Ken
>>--
>>Ken Krugler
>>+1 530-210-6378
>>
>
>Thank you for the pointers, Ken. I changed the VM memory parameters 
>as shown at http://wiki.apache.org/nutch/RunNutchInEclipse0.9. 
>However, I still get the exception, and hadoop.log shows the 
>following:
>
>2009-06-02 13:08:18,790 INFO  indexer.DeleteDuplicates - Dedup: starting
>2009-06-02 13:08:18,817 INFO  indexer.DeleteDuplicates - Dedup: adding indexes in: crawl/indexes
>2009-06-02 13:08:19,064 WARN  mapred.LocalJobRunner - job_7izmuc
>java.lang.ArrayIndexOutOfBoundsException: -1
>	at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
>	at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
>	at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
>	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
>	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:126)
>
>I am running Lucene 2.1.0. Any idea why I am getting the 
>ArrayIndexOutOfBoundsException?

Most likely the index has been corrupted. If you can, try 
opening it with Luke.
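
If Luke isn't handy, a minimal sanity check along these lines will 
often trip the same corruption. This is just a sketch against the 
Lucene 2.x API; the class name is made up, and the path argument 
should be one of the part-NNNNN directories under crawl/indexes:

import org.apache.lucene.index.IndexReader;

public class IndexSanityCheck {
    public static void main(String[] args) throws Exception {
        // args[0]: a single index directory, e.g. crawl/indexes/part-00000
        IndexReader reader = IndexReader.open(args[0]);
        try {
            System.out.println("maxDoc=" + reader.maxDoc()
                + " numDocs=" + reader.numDocs());
            // Touch every document slot; a corrupted index will
            // usually throw here, just as MultiReader.isDeleted()
            // did inside the dedup job.
            for (int i = 0; i < reader.maxDoc(); i++) {
                reader.isDeleted(i);
            }
            System.out.println("Index looks readable.");
        } finally {
            reader.close();
        }
    }
}

If that blows up with the same ArrayIndexOutOfBoundsException, 
regenerating the index (re-running the indexing step, or the whole 
crawl) is usually simpler than trying to repair it.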

-- Ken
-- 
Ken Krugler
+1 530-210-6378