nutch-dev mailing list archives

From "sudarat (JIRA)" <j...@apache.org>
Subject [jira] Created: (NUTCH-593) Nutch crawl problem
Date Wed, 19 Dec 2007 02:49:43 GMT
Nutch crawl problem
-------------------

                 Key: NUTCH-593
                 URL: https://issues.apache.org/jira/browse/NUTCH-593
             Project: Nutch
          Issue Type: Bug
    Affects Versions: 0.9.0
         Environment: java version : jdk-6u1-linux-amd64.bin, hadoop version : hadoop-0.12.0
            Reporter: sudarat
             Fix For: 0.9.0


I am using nutch-0.9 and hadoop-0.12.2. When I run the command "bin/nutch crawl
urls -dir crawled -depth 3", I get this output:

- crawl started in: crawled 
- rootUrlDir = input 
- threads = 10 
- depth = 3 
- Injector: starting 
- Injector: crawlDb: crawled/crawldb 
- Injector: urlDir: input 
- Injector: Converting injected urls to crawl db entries. 
- Total input paths to process : 1 
- Running job: job_0001 
- map 0% reduce 0% 
- map 100% reduce 0% 
- map 100% reduce 100% 
- Job complete: job_0001 
- Counters: 6 
- Map-Reduce Framework 
- Map input records=3 
- Map output records=1 
- Map input bytes=22 
- Map output bytes=52 
- Reduce input records=1 
- Reduce output records=1 
- Injector: Merging injected urls into crawl db. 
- Total input paths to process : 2 
- Running job: job_0002 
- map 0% reduce 0% 
- map 100% reduce 0% 
- map 100% reduce 58% 
- map 100% reduce 100% 
- Job complete: job_0002 
- Counters: 6 
- Map-Reduce Framework 
- Map input records=3 
- Map output records=1 
- Map input bytes=60 
- Map output bytes=52 
- Reduce input records=1 
- Reduce output records=1 
- Injector: done 
- Generator: Selecting best-scoring urls due for fetch. 
- Generator: starting 
- Generator: segment: crawled/segments/25501213164325 
- Generator: filtering: false 
- Generator: topN: 2147483647 
- Total input paths to process : 2 
- Running job: job_0003 
- map 0% reduce 0% 
- map 100% reduce 0% 
- map 100% reduce 100% 
- Job complete: job_0003 
- Counters: 6 
- Map-Reduce Framework 
- Map input records=3 
- Map output records=1 
- Map input bytes=59 
- Map output bytes=77 
- Reduce input records=1 
- Reduce output records=1 
- Generator: 0 records selected for fetching, exiting ... 
- Stopping at depth=0 - no more URLs to fetch. 
- No URLs to fetch - check your seed list and URL filters. 
- crawl finished: crawled 
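The "0 records selected for fetching" outcome above usually means the injected seed URLs were rejected by the URL filters rather than fetched. As a minimal sketch (the regex rule and sample URLs below are hypothetical examples, not taken from this report), this is how a Nutch-style crawl-urlfilter.txt accept rule decides whether a seed URL survives injection:

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: crawl-urlfilter.txt holds regexes prefixed with
// '+' (accept) or '-' (reject); a seed URL must match an accept rule
// to make it into the crawldb. This mimics that check for sample data.
public class FilterCheck {
    public static void main(String[] args) {
        // Example accept rule, as it might appear after the '+' prefix.
        Pattern accept = Pattern.compile("^http://([a-z0-9]*\\.)*example\\.com/");
        List<String> seeds = List.of(
            "http://example.com/",      // matches the rule: would be injected
            "http://other.org/page");   // no match: would be filtered out
        for (String url : seeds) {
            boolean ok = accept.matcher(url).find();
            System.out.println((ok ? "ACCEPTED " : "REJECTED ") + url);
        }
    }
}
```

If every seed is rejected this way, the generator has nothing to select, which matches the "check your seed list and URL filters" message in the log.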

But sometimes, when I crawl certain URLs, an error occurs at indexing time:
- Indexer: done 
- Dedup: starting 
- Dedup: adding indexes in: crawled/indexes 
- Total input paths to process : 2 
- Running job: job_0025 
- map 0% reduce 0% 
- Task Id : task_0025_m_000001_0, Status : FAILED 
task_0025_m_000001_0: - Error running child 
task_0025_m_000001_0: java.lang.ArrayIndexOutOfBoundsException: -1 
task_0025_m_000001_0: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113) 
task_0025_m_000001_0: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176) 
task_0025_m_000001_0: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157) 
task_0025_m_000001_0: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46) 
task_0025_m_000001_0: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175) 
task_0025_m_000001_0: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445) 
- Task Id : task_0025_m_000000_0, Status : FAILED 
task_0025_m_000000_0: - Error running child 
task_0025_m_000000_0: java.lang.ArrayIndexOutOfBoundsException: -1 
task_0025_m_000000_0: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113) 
task_0025_m_000000_0: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176) 
task_0025_m_000000_0: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157) 
task_0025_m_000000_0: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46) 
task_0025_m_000000_0: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175) 
task_0025_m_000000_0: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445) 
- Task Id : task_0025_m_000000_1, Status : FAILED 
task_0025_m_000000_1: - Error running child 
task_0025_m_000000_1: java.lang.ArrayIndexOutOfBoundsException: -1 
task_0025_m_000000_1: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113) 
task_0025_m_000000_1: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176) 
task_0025_m_000000_1: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157) 
task_0025_m_000000_1: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46) 
task_0025_m_000000_1: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175) 
task_0025_m_000000_1: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445) 
- Task Id : task_0025_m_000001_1, Status : FAILED 
task_0025_m_000001_1: - Error running child 
task_0025_m_000001_1: java.lang.ArrayIndexOutOfBoundsException: -1 
task_0025_m_000001_1: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113) 
task_0025_m_000001_1: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176) 
task_0025_m_000001_1: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157) 
task_0025_m_000001_1: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46) 
task_0025_m_000001_1: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175) 
task_0025_m_000001_1: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445) 
- Task Id : task_0025_m_000001_2, Status : FAILED 
task_0025_m_000001_2: - Error running child 
task_0025_m_000001_2: java.lang.ArrayIndexOutOfBoundsException: -1 
task_0025_m_000001_2: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113) 
task_0025_m_000001_2: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176) 
task_0025_m_000001_2: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157) 
task_0025_m_000001_2: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46) 
task_0025_m_000001_2: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175) 
task_0025_m_000001_2: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445) 
- Task Id : task_0025_m_000000_2, Status : FAILED 
task_0025_m_000000_2: - Error running child 
task_0025_m_000000_2: java.lang.ArrayIndexOutOfBoundsException: -1 
task_0025_m_000000_2: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113) 
task_0025_m_000000_2: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176) 
task_0025_m_000000_2: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157) 
task_0025_m_000000_2: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46) 
task_0025_m_000000_2: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175) 
task_0025_m_000000_2: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445) 
- map 100% reduce 100% 
- Task Id : task_0025_m_000001_3, Status : FAILED 
task_0025_m_000001_3: - Error running child 
task_0025_m_000001_3: java.lang.ArrayIndexOutOfBoundsException: -1 
task_0025_m_000001_3: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113) 
task_0025_m_000001_3: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176) 
task_0025_m_000001_3: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157) 
task_0025_m_000001_3: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46) 
task_0025_m_000001_3: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175) 
task_0025_m_000001_3: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445) 
- Task Id : task_0025_m_000000_3, Status : FAILED 
task_0025_m_000000_3: - Error running child 
task_0025_m_000000_3: java.lang.ArrayIndexOutOfBoundsException: -1 
task_0025_m_000000_3: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113) 
task_0025_m_000000_3: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176) 
task_0025_m_000000_3: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157) 
task_0025_m_000000_3: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46) 
task_0025_m_000000_3: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175) 
task_0025_m_000000_3: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445) 
Exception in thread "main" java.io.IOException: Job failed! 
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604) 
	at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439) 
	at org.apache.nutch.crawl.Crawl.main(Crawl.java:135) 
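For context on the trace above: MultiReader maps a global document number to one of its sub-indexes via the sub-readers' starting offsets, and an ArrayIndexOutOfBoundsException of -1 indicates that lookup produced an invalid sub-reader index. The sketch below is a hypothetical illustration of that failure mode, not Lucene's actual source; the offsets and doc numbers are made up:

```java
import java.util.Arrays;

// Hypothetical sketch: a MultiReader-style lookup that maps a global doc
// number to a sub-reader via each sub-index's starting doc number. A doc
// number below the first offset (e.g. coming from an empty or stale index
// part fed to the dedup job) resolves to index -1, and the subsequent
// array access would throw ArrayIndexOutOfBoundsException: -1, as in the
// trace above.
public class ReaderIndexDemo {
    static int readerIndex(int doc, int[] starts) {
        int i = Arrays.binarySearch(starts, doc);
        // Non-exact hits return -(insertionPoint) - 1; the owning reader
        // is the one whose start precedes the insertion point.
        return i >= 0 ? i : -i - 2;
    }

    public static void main(String[] args) {
        int[] starts = {0, 10, 25};                  // sub-reader start offsets
        System.out.println(readerIndex(12, starts)); // falls in sub-reader 1
        System.out.println(readerIndex(-1, starts)); // resolves to -1: invalid
    }
}
```

Under that reading, one thing worth checking is whether crawled/indexes contains an empty or partially written index part before the Dedup step runs.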

How can I solve this? 


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

