nutch-dev mailing list archives

From Nic M <nicde...@gmail.com>
Subject Re: IOException in dedup
Date Thu, 04 Jun 2009 01:11:46 GMT
I used the patch and everything seems to be working fine at the moment.
Thanks, Dogacan.

Nic M

On Jun 3, 2009, at 12:07 PM, Doğacan Güney wrote:

> On Tue, Jun 2, 2009 at 20:13, Nic M <nicdevel@gmail.com> wrote:
>
> On Jun 2, 2009, at 12:41 PM, Ken Krugler wrote:
>
>>> Hello,
>>>
>>> I am new to Nutch and I have set up Nutch 0.9 in EasyEclipse for
>>> Mac OS X. When I try to start crawling I get the following exception:
>>>
>>> Dedup: starting
>>> Dedup: adding indexes in: crawl/indexes
>>> Exception in thread "main" java.io.IOException: Job failed!
>>>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>>>         at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
>>>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)
>>>
>>>
>>> Does anyone know how to solve this problem?
>>
>
>
> You may be running into this problem:
>
> https://issues.apache.org/jira/browse/NUTCH-525
>
> I suggest updating to 1.0 or applying the patch there.
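>
> The ArrayIndexOutOfBoundsException in the log below comes out of
> MultiReader.isDeleted() with a document number of -1, i.e.
> DeleteDuplicates asking the reader about a document that does not
> exist. As a rough illustration of the kind of guard involved (a
> hypothetical sketch, not the actual NUTCH-525 patch; "indexDir" is a
> placeholder for the part index being opened):
>
>     IndexReader reader = IndexReader.open(indexDir);
>     if (reader.maxDoc() == 0) {
>       // An empty part index gives the record reader nothing to
>       // iterate, so skip it rather than computing a doc id of -1.
>       reader.close();
>       return;
>     }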
>
>>
>> You can get an IOException reported by Hadoop when the root cause  
>> is that you've run out of memory. Normally the hadoop.log file  
>> would have the OOM exception.
>>
>> If you're running from inside of Eclipse, see
>> http://wiki.apache.org/nutch/RunNutchInEclipse0.9 for more details.
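>>
>> For example, the heap can be raised through the launch
>> configuration's VM arguments; the values here are illustrative only,
>> pick them to fit your machine:
>>
>>     -Xms256m -Xmx1024m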
>>
>> -- Ken
>> -- 
>> Ken Krugler
>> +1 530-210-6378
>
> Thank you for the pointers, Ken. I changed the VM memory parameters as
> shown at http://wiki.apache.org/nutch/RunNutchInEclipse0.9. However, I
> still get the exception, and the Hadoop log shows the following:
>
> 2009-06-02 13:08:18,790 INFO  indexer.DeleteDuplicates - Dedup: starting
> 2009-06-02 13:08:18,817 INFO  indexer.DeleteDuplicates - Dedup: adding indexes in: crawl/indexes
> 2009-06-02 13:08:19,064 WARN  mapred.LocalJobRunner - job_7izmuc
> java.lang.ArrayIndexOutOfBoundsException: -1
> 	at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
> 	at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
> 	at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
> 	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:126)
>
> I am running Lucene 2.1.0. Any idea why I am getting the
> ArrayIndexOutOfBoundsException?
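>
> In case it helps with diagnosis, this is how I can check whether one of
> the part indexes under crawl/indexes is empty (a minimal sketch against
> the Lucene 2.1 IndexReader API; the path comes from the log above):
>
>     import java.io.File;
>     import org.apache.lucene.index.IndexReader;
>
>     public class CheckIndexes {
>       public static void main(String[] args) throws Exception {
>         for (File part : new File("crawl/indexes").listFiles()) {
>           // maxDoc() == 0 marks an empty part index, one plausible
>           // trigger for the isDeleted(-1) call in the trace above.
>           IndexReader reader = IndexReader.open(part);
>           System.out.println(part + ": " + reader.maxDoc() + " docs");
>           reader.close();
>         }
>       }
>     }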
>
> Nic
>
> -- 
> Doğacan Güney

