nutch-dev mailing list archives

From "Andrzej Bialecki (JIRA)" <j...@apache.org>
Subject [jira] Closed: (NUTCH-593) Nutch crawl problem
Date Wed, 06 Feb 2008 16:39:07 GMT

     [ https://issues.apache.org/jira/browse/NUTCH-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki closed NUTCH-593.
-----------------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 0.9.0)
                   1.0.0

> Nutch crawl problem
> -------------------
>
>                 Key: NUTCH-593
>                 URL: https://issues.apache.org/jira/browse/NUTCH-593
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 0.9.0
>         Environment: java version : jdk-6u1-linux-amd64.bin, hadoop version : hadoop-0.12.0
>            Reporter: sudarat
>             Fix For: 1.0.0
>
>
> I use nutch-0.9 and hadoop-0.12.2. When I run the command "bin/nutch crawl 
> urls -dir crawled -depth 3" I get this error: 
> - crawl started in: crawled 
> - rootUrlDir = input 
> - threads = 10 
> - depth = 3 
> - Injector: starting 
> - Injector: crawlDb: crawled/crawldb 
> - Injector: urlDir: input 
> - Injector: Converting injected urls to crawl db entries. 
> - Total input paths to process : 1 
> - Running job: job_0001 
> - map 0% reduce 0% 
> - map 100% reduce 0% 
> - map 100% reduce 100% 
> - Job complete: job_0001 
> - Counters: 6 
> - Map-Reduce Framework 
> - Map input records=3 
> - Map output records=1 
> - Map input bytes=22 
> - Map output bytes=52 
> - Reduce input records=1 
> - Reduce output records=1 
> - Injector: Merging injected urls into crawl db. 
> - Total input paths to process : 2 
> - Running job: job_0002 
> - map 0% reduce 0% 
> - map 100% reduce 0% 
> - map 100% reduce 58% 
> - map 100% reduce 100% 
> - Job complete: job_0002 
> - Counters: 6 
> - Map-Reduce Framework 
> - Map input records=3 
> - Map output records=1 
> - Map input bytes=60 
> - Map output bytes=52 
> - Reduce input records=1 
> - Reduce output records=1 
> - Injector: done 
> - Generator: Selecting best-scoring urls due for fetch. 
> - Generator: starting 
> - Generator: segment: crawled/segments/25501213164325 
> - Generator: filtering: false 
> - Generator: topN: 2147483647 
> - Total input paths to process : 2 
> - Running job: job_0003 
> - map 0% reduce 0% 
> - map 100% reduce 0% 
> - map 100% reduce 100% 
> - Job complete: job_0003 
> - Counters: 6 
> - Map-Reduce Framework 
> - Map input records=3 
> - Map output records=1 
> - Map input bytes=59 
> - Map output bytes=77 
> - Reduce input records=1 
> - Reduce output records=1 
> - Generator: 0 records selected for fetching, exiting ... 
> - Stopping at depth=0 - no more URLs to fetch. 
> - No URLs to fetch - check your seed list and URL filters. 
> - crawl finished: crawled 
> But sometimes, when I crawl certain URLs, the error happens at indexing (dedup) time instead: 
> - Indexer: done 
> - Dedup: starting 
> - Dedup: adding indexes in: crawled/indexes 
> - Total input paths to process : 2 
> - Running job: job_0025 
> - map 0% reduce 0% 
> - Task Id : task_0025_m_000001_0, Status : FAILED 
> task_0025_m_000001_0: - Error running child 
> task_0025_m_000001_0: java.lang.ArrayIndexOutOfBoundsException: -1 
> task_0025_m_000001_0: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113) 
> task_0025_m_000001_0: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176) 
> task_0025_m_000001_0: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157) 
> task_0025_m_000001_0: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46) 
> task_0025_m_000001_0: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175) 
> task_0025_m_000001_0: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445) 
> - Task Id : task_0025_m_000000_0, Status : FAILED 
> task_0025_m_000000_0: - Error running child 
> task_0025_m_000000_0: java.lang.ArrayIndexOutOfBoundsException: -1 
> task_0025_m_000000_0: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113) 
> task_0025_m_000000_0: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176) 
> task_0025_m_000000_0: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157) 
> task_0025_m_000000_0: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46) 
> task_0025_m_000000_0: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175) 
> task_0025_m_000000_0: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445) 
> - Task Id : task_0025_m_000000_1, Status : FAILED 
> task_0025_m_000000_1: - Error running child 
> task_0025_m_000000_1: java.lang.ArrayIndexOutOfBoundsException: -1 
> task_0025_m_000000_1: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113) 
> task_0025_m_000000_1: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176) 
> task_0025_m_000000_1: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157) 
> task_0025_m_000000_1: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46) 
> task_0025_m_000000_1: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175) 
> task_0025_m_000000_1: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445) 
> - Task Id : task_0025_m_000001_1, Status : FAILED 
> task_0025_m_000001_1: - Error running child 
> task_0025_m_000001_1: java.lang.ArrayIndexOutOfBoundsException: -1 
> task_0025_m_000001_1: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113) 
> task_0025_m_000001_1: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176) 
> task_0025_m_000001_1: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157) 
> task_0025_m_000001_1: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46) 
> task_0025_m_000001_1: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175) 
> task_0025_m_000001_1: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445) 
> - Task Id : task_0025_m_000001_2, Status : FAILED 
> task_0025_m_000001_2: - Error running child 
> task_0025_m_000001_2: java.lang.ArrayIndexOutOfBoundsException: -1 
> task_0025_m_000001_2: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113) 
> task_0025_m_000001_2: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176) 
> task_0025_m_000001_2: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157) 
> task_0025_m_000001_2: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46) 
> task_0025_m_000001_2: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175) 
> task_0025_m_000001_2: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445) 
> - Task Id : task_0025_m_000000_2, Status : FAILED 
> task_0025_m_000000_2: - Error running child 
> task_0025_m_000000_2: java.lang.ArrayIndexOutOfBoundsException: -1 
> task_0025_m_000000_2: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113) 
> task_0025_m_000000_2: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176) 
> task_0025_m_000000_2: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157) 
> task_0025_m_000000_2: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46) 
> task_0025_m_000000_2: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175) 
> task_0025_m_000000_2: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445) 
> - map 100% reduce 100% 
> - Task Id : task_0025_m_000001_3, Status : FAILED 
> task_0025_m_000001_3: - Error running child 
> task_0025_m_000001_3: java.lang.ArrayIndexOutOfBoundsException: -1 
> task_0025_m_000001_3: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113) 
> task_0025_m_000001_3: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176) 
> task_0025_m_000001_3: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157) 
> task_0025_m_000001_3: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46) 
> task_0025_m_000001_3: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175) 
> task_0025_m_000001_3: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445) 
> - Task Id : task_0025_m_000000_3, Status : FAILED 
> task_0025_m_000000_3: - Error running child 
> task_0025_m_000000_3: java.lang.ArrayIndexOutOfBoundsException: -1 
> task_0025_m_000000_3: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113) 
> task_0025_m_000000_3: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176) 
> task_0025_m_000000_3: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157) 
> task_0025_m_000000_3: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46) 
> task_0025_m_000000_3: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175) 
> task_0025_m_000000_3: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445) 
> Exception in thread "main" java.io.IOException: Job failed! 
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604) 
> at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439) 
> at org.apache.nutch.crawl.Crawl.main(Crawl.java:135) 
> How do I solve this? 
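
The first run above ("Generator: 0 records selected for fetching ... No URLs to fetch") is usually not a bug: it typically means the seed URLs were all rejected by the URL filters. A minimal setup that avoids it, using a hypothetical seed domain example.org, is to put the seed in urls/seed.txt and make sure conf/crawl-urlfilter.txt (the filter file the one-step "crawl" command consults in 0.9) accepts that domain:

    # urls/seed.txt
    http://www.example.org/

    # conf/crawl-urlfilter.txt -- accept the seed's domain, reject everything else
    +^http://([a-z0-9]*\.)*example.org/
    -.

If the "+" pattern does not match the seed URL, the generator selects zero records and the crawl stops at depth 0, exactly as in the log.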
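The Dedup failure (ArrayIndexOutOfBoundsException: -1 in MultiReader.isDeleted) is the bug tracked by this issue, fixed for 1.0.0. A plausible trigger, though this is an assumption and not the committed patch, is a part index under crawled/indexes that contains no documents, which throws off DeleteDuplicates' document-id arithmetic. As a diagnostic sketch one can inspect each part index with the plain Lucene 2.x API that ships with Nutch 0.9; the class name and argument handling here are illustrative only:

    // Hypothetical diagnostic, not part of Nutch: report how many documents
    // a Lucene index directory holds, so empty part indexes can be spotted
    // before running "bin/nutch dedup".
    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;

    public class NonEmptyIndexCheck {
      public static void main(String[] args) throws IOException {
        // e.g. java NonEmptyIndexCheck crawled/indexes/part-00000
        IndexReader reader = IndexReader.open(args[0]);
        try {
          // An empty part index (numDocs() == 0) is a plausible trigger for
          // the out-of-bounds failure seen in MultiReader.isDeleted().
          System.out.println(args[0] + ": " + reader.numDocs() + " docs");
        } finally {
          reader.close();
        }
      }
    }

Running this over each part-XXXXX directory shows whether any index is empty; upgrading to a release that contains the fix (1.0.0) resolves the problem properly.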

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

