nutch-dev mailing list archives

From "Saskia Vola (JIRA)" <>
Subject [jira] [Created] (NUTCH-2009) Fetcher does not work with batchID
Date Wed, 13 May 2015 17:41:02 GMT
Saskia Vola created NUTCH-2009:

             Summary: Fetcher does not work with batchID
                 Key: NUTCH-2009
             Project: Nutch
          Issue Type: Bug
          Components: fetcher
    Affects Versions: 2.3
         Environment: Ubuntu 14.04; Nutch 2.3 configured to use MongoDB for storage with
gora-mongodb 0.5.

The same issue did NOT occur with gora-hbase (HBase 0.90.14). 
            Reporter: Saskia Vola
            Priority: Minor
             Fix For: 2.3.1

The fetcher works only with the -all option.
It fails when given a batch ID:

$home/apache-nutch-2.3/runtime/local$ bin/nutch fetch 1431538082-1014788459
FetcherJob: starting at 2015-05-13 19:28:30
FetcherJob: batchId: 1431538082-1014788459
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob : timelimit set for : -1
java.lang.IllegalArgumentException: can't serialize class org.apache.avro.util.Utf8
	at org.bson.BasicBSONEncoder._putObjectField(
	at org.bson.BasicBSONEncoder.putObject(
	at org.bson.BasicBSONEncoder.putObject(
	at com.mongodb.DefaultDBEncoder.writeObject(
	at com.mongodb.OutMessage.putObject(
	at com.mongodb.OutMessage.writeQuery(
	at com.mongodb.OutMessage.query(
	at com.mongodb.DBCollectionImpl.find(
	at com.mongodb.DBCollectionImpl.find(
	at com.mongodb.DBCursor._check(
	at com.mongodb.DBCursor._hasNext(
	at com.mongodb.DBCursor.hasNext(
	at org.apache.gora.mongodb.query.MongoDBResult.nextInner(
	at org.apache.gora.mapreduce.GoraRecordReader.nextKeyValue(
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(
	at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(
	at org.apache.hadoop.mapred.MapTask.runNewMapper(
	at org.apache.hadoop.mapred.LocalJobRunner$Job$
	at java.util.concurrent.Executors$
	at java.util.concurrent.ThreadPoolExecutor.runWorker(
	at java.util.concurrent.ThreadPoolExecutor$
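
One plausible reading of the trace, not confirmed in this report: the batch ID reaches the MongoDB query as an Avro Utf8 value, and the BSON encoder rejects it because it only knows a fixed set of value types. The sketch below reproduces that failure mode with stand-in classes only. FakeUtf8, encode, normalize and the "batchId" field name are hypothetical and are not the Gora, Avro, or MongoDB driver APIs; the point is just that converting a non-String CharSequence to a String before building the query avoids the exception.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class Utf8QuerySketch {

    /** Hypothetical stand-in for org.apache.avro.util.Utf8: a CharSequence that is not a String. */
    static final class FakeUtf8 implements CharSequence {
        private final String value;
        FakeUtf8(String value) { this.value = value; }
        @Override public int length() { return value.length(); }
        @Override public char charAt(int i) { return value.charAt(i); }
        @Override public CharSequence subSequence(int s, int e) { return value.subSequence(s, e); }
        @Override public String toString() { return value; }
    }

    /** Hypothetical stand-in for an encoder that accepts only a fixed set of value types. */
    static void encode(Map<String, Object> query) {
        for (Object v : query.values()) {
            if (!(v instanceof String || v instanceof Number || v instanceof Boolean)) {
                throw new IllegalArgumentException("can't serialize " + v.getClass());
            }
        }
    }

    /** Returns a copy of the query with every CharSequence value converted to a plain String. */
    static Map<String, Object> normalize(Map<String, Object> query) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : query.entrySet()) {
            Object v = e.getValue();
            out.put(e.getKey(), (v instanceof CharSequence) ? v.toString() : v);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> query = new LinkedHashMap<>();
        // "batchId" is an assumed field name, used here for illustration only.
        query.put("batchId", new FakeUtf8("1431538082-1014788459"));
        try {
            encode(query); // rejected: the value is a CharSequence, not a String
        } catch (IllegalArgumentException ex) {
            System.out.println("raw query rejected: " + ex.getMessage());
        }
        encode(normalize(query)); // accepted once the value is a plain String
        System.out.println("normalized query accepted");
    }
}
```

If this reading is right, it would also be consistent with the fetcher working under -all, assuming that path does not build a batch-ID filter.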

This message was sent by Atlassian JIRA