nutch-dev mailing list archives

From MyD <myd.ro...@googlemail.com>
Subject Re: IOException in dedup
Date Tue, 02 Jun 2009 19:20:02 GMT
I had the same problem when I forgot to add the URL field in the
index. Maybe you have the same problem.
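
For what it's worth, a quick way to check is to open each index part
with the plain Lucene API and look for documents that have no "url"
field. This is only a sketch on my side: the crawl/indexes path and
the part-* layout are assumptions based on the default Nutch 0.9
crawl output, and it targets the Lucene 2.x API:

  import java.io.File;
  import org.apache.lucene.index.IndexReader;

  public class CheckUrlField {
    public static void main(String[] args) throws Exception {
      for (File part : new File("crawl/indexes").listFiles()) {
        if (!part.isDirectory()) continue;
        IndexReader reader = IndexReader.open(part.getPath());
        for (int i = 0; i < reader.maxDoc(); i++) {
          if (reader.isDeleted(i)) continue;  // skip deleted slots
          if (reader.document(i).get("url") == null) {
            System.out.println(part.getName() + " doc " + i
                + ": no url field");
          }
        }
        reader.close();
      }
    }
  }

If that prints anything, make sure the index-basic plugin (whose
BasicIndexingFilter adds the url field) is listed in plugin.includes
in your nutch-site.xml.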

Regards,
MyD


On Jun 3, 2009, at 1:13 AM, Nic M wrote:

>
> On Jun 2, 2009, at 12:41 PM, Ken Krugler wrote:
>
>>> Hello,
>>>
>>> I am new to Nutch and I have set up Nutch 0.9 with EasyEclipse for
>>> Mac OS X. When I try to start crawling I get the following
>>> exception:
>>>
>>> Dedup: starting
>>> Dedup: adding indexes in: crawl/indexes
>>> Exception in thread "main" java.io.IOException: Job failed!
>>>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>>>         at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
>>>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)
>>>
>>>
>>> Does anyone know how to solve this problem?
>>
>> You can get an IOException reported by Hadoop when the root cause  
>> is that you've run out of memory. Normally the hadoop.log file  
>> would have the OOM exception.
>>
>> If you're running from inside of Eclipse, see
>> http://wiki.apache.org/nutch/RunNutchInEclipse0.9 for more details.
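>>
>> In the Eclipse launch configuration, that's the VM arguments box
>> under the Arguments tab. As a rough starting point (the right heap
>> size depends on your crawl; 512m here is only a guess):
>>
>>   -Xms256m -Xmx512m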
>>
>> -- Ken
>> -- 
>> Ken Krugler
>> +1 530-210-6378
>
> Thank you for the pointers, Ken. I changed the VM memory parameters
> as shown at http://wiki.apache.org/nutch/RunNutchInEclipse0.9.
> However, I still get the exception, and in the Hadoop log I have the
> following:
>
> 2009-06-02 13:08:18,790 INFO  indexer.DeleteDuplicates - Dedup: starting
> 2009-06-02 13:08:18,817 INFO  indexer.DeleteDuplicates - Dedup: adding indexes in: crawl/indexes
> 2009-06-02 13:08:19,064 WARN  mapred.LocalJobRunner - job_7izmuc
> java.lang.ArrayIndexOutOfBoundsException: -1
> 	at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
> 	at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
> 	at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
> 	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:126)
>
> I am running Lucene 2.1.0. Any idea why I am getting the
> ArrayIndexOutOfBoundsException?
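>
> For reference, here's a minimal sketch to dump each index part's
> document counts and spot an empty or truncated part that could throw
> off the MultiReader's doc numbering. It assumes the plain Lucene 2.x
> API and the default crawl/indexes/part-* layout (both are
> assumptions on my side):
>
>   import java.io.File;
>   import org.apache.lucene.index.IndexReader;
>
>   public class InspectIndexParts {
>     public static void main(String[] args) throws Exception {
>       for (File part : new File("crawl/indexes").listFiles()) {
>         if (!part.isDirectory()) continue;
>         IndexReader r = IndexReader.open(part.getPath());
>         // maxDoc counts deleted slots too; numDocs is live docs only
>         System.out.println(part.getName() + ": maxDoc=" + r.maxDoc()
>             + " numDocs=" + r.numDocs());
>         r.close();
>       }
>     }
>   }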
>
> Nic

