nutch-dev mailing list archives

From "taknev ivrok (JIRA)" <>
Subject [jira] Created: (NUTCH-630) Error caused by index-more plugin in the latest svn revision - 652259
Date Wed, 30 Apr 2008 15:21:56 GMT
Error caused by index-more plugin in the latest svn revision - 652259

                 Key: NUTCH-630
             Project: Nutch
          Issue Type: Bug
            Reporter: taknev ivrok

This problem was reported on the user mailing list:
Running bin/nutch crawl urls -dir crawl with the latest svn version produces the following error.

Note: This error does not happen after I remove index-more plugin from plugin.includes in
the conf/nutch-site.xml file. 
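
For reference, the workaround might look roughly like the fragment below. This is only a sketch: the plugin list in the value is an illustrative assumption and is not taken from the failing configuration. The only point is that index-more does not appear in it.

```xml
<!-- conf/nutch-site.xml: hypothetical plugin.includes override without index-more. -->
<!-- The plugin names in the value are assumed for illustration; keep whatever      -->
<!-- your own configuration lists and drop only index-more.                         -->
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)</value>
</property>
```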

Fetcher: done
CrawlDb update: starting
CrawlDb update: db: crawlfs/crawldb
CrawlDb update: segments: [crawlfs/segments/20080430051112]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: done
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: crawlfs/segments/20080430051126
Generator: filtering: true
Generator: topN: 100000
Generator: jobtracker is 'local', generating exactly one partition.
Generator: 0 records selected for fetching, exiting ...
Stopping at depth=2 - no more URLs to fetch.
LinkDb: starting
LinkDb: linkdb: crawlfs/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: adding segment:
LinkDb: adding segment:
LinkDb: done
Indexer: starting
Indexer: linkdb: crawlfs/linkdb
Indexer: adding segment:
Indexer: adding segment:
IFD [Thread-102]: setInfoStream
IW 0 [Thread-102]: setInfoStream:
ramBufferSizeMB=16.0 maxBuffereDocs=50 maxBuffereDeleteTerms=-1
maxFieldLength=10000 index=
Exception in thread "main" Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(
        at org.apache.nutch.indexer.Indexer.index(
        at org.apache.nutch.crawl.Crawl.main( 

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
