nutch-dev mailing list archives

From Doğacan Güney <doga...@gmail.com>
Subject Re: IOException in dedup
Date Wed, 03 Jun 2009 16:07:43 GMT
On Tue, Jun 2, 2009 at 20:13, Nic M <nicdevel@gmail.com> wrote:

>
> On Jun 2, 2009, at 12:41 PM, Ken Krugler wrote:
>
> Hello,
>
>
> I am new to Nutch and have set up Nutch 0.9 with EasyEclipse for Mac OS
> X. When I try to start crawling, I get the following exception:
>
>
> Dedup: starting
> Dedup: adding indexes in: crawl/indexes
> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>         at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)
>
> Does anyone know how to solve this problem?
>
>
You may be running into this problem:

https://issues.apache.org/jira/browse/NUTCH-525

I suggest updating to 1.0 or applying the patch there.
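
By the way, the log you posted further down shows where it goes wrong:
DeleteDuplicates' record reader asks the Lucene MultiReader for document -1,
and MultiReader apparently indexes its sub-reader array with that value. A
minimal standalone sketch of the same failure, just as an illustration
against Lucene 2.1 (the class name is made up; this is not the Nutch code
path or the NUTCH-525 patch):

    import org.apache.lucene.analysis.SimpleAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.MultiReader;
    import org.apache.lucene.store.RAMDirectory;

    public class MultiReaderAIOOBE {
        public static void main(String[] args) throws Exception {
            // Build a one-document index in memory.
            RAMDirectory dir = new RAMDirectory();
            IndexWriter writer = new IndexWriter(dir, new SimpleAnalyzer(), true);
            Document doc = new Document();
            doc.add(new Field("url", "http://example.com/",
                              Field.Store.YES, Field.Index.UN_TOKENIZED));
            writer.addDocument(doc);
            writer.close();

            // Wrap it the way dedup wraps the part indexes under crawl/indexes.
            MultiReader multi =
                new MultiReader(new IndexReader[] { IndexReader.open(dir) });

            // Asking about a mis-computed document id of -1 makes MultiReader
            // look up sub-reader -1 and throw
            // java.lang.ArrayIndexOutOfBoundsException: -1, which the local
            // job runner then surfaces as "java.io.IOException: Job failed!".
            multi.isDeleted(-1);
        }
    }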


>
> You can get an IOException reported by Hadoop when the root cause is that
> you've run out of memory. Normally the hadoop.log file would have the OOM
> exception.
>
> If you're running from inside of Eclipse, see
> http://wiki.apache.org/nutch/RunNutchInEclipse0.9 for more details.
>
> -- Ken
>
> --
>
> Ken Krugler
> +1 530-210-6378
>
>
> Thank you for the pointers, Ken. I changed the VM memory parameters as shown
> at http://wiki.apache.org/nutch/RunNutchInEclipse0.9. However, I still get
> the exception, and the Hadoop log shows the following:
>
> 2009-06-02 13:08:18,790 INFO  indexer.DeleteDuplicates - Dedup: starting
> 2009-06-02 13:08:18,817 INFO  indexer.DeleteDuplicates - Dedup: adding indexes in: crawl/indexes
> 2009-06-02 13:08:19,064 WARN  mapred.LocalJobRunner - job_7izmuc
> java.lang.ArrayIndexOutOfBoundsException: -1
>         at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
>         at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
>         at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:126)
>
> I am running Lucene 2.1.0. Any idea why I am getting the
> ArrayIndexOutOfBoundsException?
>
> Nic
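
One more thing worth ruling out, given Ken's out-of-memory point: that the VM
arguments from the wiki page actually reach the JVM that Eclipse launches. A
throwaway class like the one below (the name is just for illustration), run
with the same launch configuration, prints the effective heap limit; if it
does not match your -Xmx setting, the memory change never took effect:

    public class HeapCheck {
        public static void main(String[] args) {
            // Maximum heap the VM will grow to, in megabytes. Compare this
            // with the -Xmx value in the Eclipse launch configuration.
            long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
            System.out.println("Max heap: " + maxMb + " MB");
        }
    }

That said, your stack trace is the NUTCH-525 symptom rather than an
out-of-memory error, so updating to 1.0 or applying the patch is what I would
try first.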


-- 
Doğacan Güney
