nutch-dev mailing list archives

From "Koen Smets (JIRA)" <j...@apache.org>
Subject [jira] [Created] (NUTCH-1791) Null pointer exceptions with gora-cassandra-0.4
Date Tue, 03 Jun 2014 10:22:01 GMT
Koen Smets created NUTCH-1791:
---------------------------------

             Summary: Null pointer exceptions with gora-cassandra-0.4
                 Key: NUTCH-1791
                 URL: https://issues.apache.org/jira/browse/NUTCH-1791
             Project: Nutch
          Issue Type: Bug
          Components: generator, storage
    Affects Versions: 2.3
         Environment: dsc-cassandra-2.0.2, dsc-cassandra-2.0.7
            Reporter: Koen Smets
             Fix For: 2.3


The latest nutch-2.x source checkout fails to run with Cassandra 2.0.2 (and also Cassandra 2.0.7)
as the storage backend, both in the normal Nutch cycle (inject, generate, fetch) and in the
JUnit test {{TestGoraStorage}}:

{code}
2014-06-03 11:24:23,495 INFO  connection.CassandraHostRetryService (CassandraHostRetryService.java:<init>(48)) - Downed Host Retry service started with queue size -1 and retry delay 10s
2014-06-03 11:24:23,535 INFO  service.JmxMonitor (JmxMonitor.java:registerMonitor(52)) - Registering JMX me.prettyprint.cassandra.service_Test Cluster:ServiceType=hector,MonitorType=hector
Exception in thread "main" java.lang.NullPointerException
	at org.apache.gora.cassandra.query.CassandraResult.updatePersistent(CassandraResult.java:121)
	at org.apache.gora.cassandra.query.CassandraResult.nextInner(CassandraResult.java:57)
	at org.apache.gora.query.impl.ResultBase.next(ResultBase.java:114)
	at org.apache.nutch.storage.TestGoraStorage.readWrite(TestGoraStorage.java:93)
	at org.apache.nutch.storage.TestGoraStorage.main(TestGoraStorage.java:230)
{code}

After injecting, the URL is present in the WebTable and can be read back:

{code}
ksmets@precise64 ~/l/a/r/local> ./bin/nutch inject urls
InjectorJob: starting at 2014-06-03 11:55:11
InjectorJob: Injecting urlDir: urls
InjectorJob: Using class org.apache.gora.cassandra.store.CassandraStore as the Gora storage class.
InjectorJob: total number of urls rejected by filters: 0
InjectorJob: total number of urls injected after normalization and filtering: 1
Injector: finished at 2014-06-03 11:55:13, elapsed: 00:00:02

ksmets@precise64 ~/l/a/r/local> ./bin/nutch readdb -stats
WebTable statistics start
Statistics for WebTable:
min score:	1.0
retry 0:	1
jobs:	{db_stats-job_local1403358409_0001={jobID=job_local1403358409_0001, jobName=db_stats,
counters={File Input Format Counters ={BYTES_READ=0}, Map-Reduce Framework={MAP_OUTPUT_MATERIALIZED_BYTES=97,
MAP_INPUT_RECORDS=1, REDUCE_SHUFFLE_BYTES=0, SPILLED_RECORDS=12, MAP_OUTPUT_BYTES=53, COMMITTED_HEAP_BYTES=358612992,
CPU_MILLISECONDS=0, SPLIT_RAW_BYTES=769, COMBINE_INPUT_RECORDS=4, REDUCE_INPUT_RECORDS=6,
REDUCE_INPUT_GROUPS=6, COMBINE_OUTPUT_RECORDS=6, PHYSICAL_MEMORY_BYTES=0, REDUCE_OUTPUT_RECORDS=6,
VIRTUAL_MEMORY_BYTES=0, MAP_OUTPUT_RECORDS=4}, FileSystemCounters={FILE_BYTES_READ=974145,
FILE_BYTES_WRITTEN=1144369}, File Output Format Counters ={BYTES_WRITTEN=225}}}}
max score:	1.0
TOTAL urls:	1
status 0 (null):	1
avg score:	1.0
WebTable statistics: done

ksmets@precise64 ~/l/a/r/local> ./bin/nutch readdb -url http://example.com/
key:	http://example.com/
baseUrl:	null
status:	0 (null)
fetchTime:	1401789311270
prevFetchTime:	0
fetchInterval:	2592000
retriesSinceFetch:	0
modifiedTime:	0
prevModifiedTime:	0
protocolStatus:	(null)
parseStatus:	(null)
title:	null
score:	1.0
markers:	org.apache.gora.persistency.impl.DirtyMapWrapper@eb173c
reprUrl:	null
metadata _csh_ : 	?�
{code}

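After generating, the WebTable statistics report zero URLs and reading the URL back throws the same NullPointerException: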

{code}
ksmets@precise64 ~/l/a/r/local> ./bin/nutch generate -topN 1
GeneratorJob: starting at 2014-06-03 11:55:38
GeneratorJob: Selecting best-scoring urls due for fetch.
GeneratorJob: starting
GeneratorJob: filtering: true
GeneratorJob: normalizing: true
GeneratorJob: topN: 1
GeneratorJob: finished at 2014-06-03 11:55:40, time elapsed: 00:00:02
GeneratorJob: generated batch id: 1401789338-222512082 containing 1 URLs

ksmets@precise64 ~/l/a/r/local> ./bin/nutch readdb -stats
WebTable statistics start
Statistics for WebTable:
jobs:	{db_stats-job_local73029265_0001={jobID=job_local73029265_0001, jobName=db_stats, counters={File
Input Format Counters ={BYTES_READ=0}, Map-Reduce Framework={MAP_OUTPUT_MATERIALIZED_BYTES=6,
MAP_INPUT_RECORDS=0, REDUCE_SHUFFLE_BYTES=0, SPILLED_RECORDS=0, MAP_OUTPUT_BYTES=0, COMMITTED_HEAP_BYTES=358612992,
CPU_MILLISECONDS=0, SPLIT_RAW_BYTES=769, COMBINE_INPUT_RECORDS=0, REDUCE_INPUT_RECORDS=0,
REDUCE_INPUT_GROUPS=0, COMBINE_OUTPUT_RECORDS=0, PHYSICAL_MEMORY_BYTES=0, REDUCE_OUTPUT_RECORDS=0,
VIRTUAL_MEMORY_BYTES=0, MAP_OUTPUT_RECORDS=0}, FileSystemCounters={FILE_BYTES_READ=974054,
FILE_BYTES_WRITTEN=1144028}, File Output Format Counters ={BYTES_WRITTEN=98}}}}
TOTAL urls:	0
WebTable statistics: done

ksmets@precise64 ~/l/a/r/local> ./bin/nutch readdb -url http://example.com/
WebTableReader: java.lang.NullPointerException
	at org.apache.gora.cassandra.query.CassandraResult.updatePersistent(CassandraResult.java:121)
	at org.apache.gora.cassandra.query.CassandraResult.nextInner(CassandraResult.java:57)
	at org.apache.gora.query.impl.ResultBase.next(ResultBase.java:114)
	at org.apache.nutch.crawl.WebTableReader.read(WebTableReader.java:238)
	at org.apache.nutch.crawl.WebTableReader.run(WebTableReader.java:494)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.nutch.crawl.WebTableReader.main(WebTableReader.java:430)
{code}
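
The symptom in the two transcripts above (TOTAL urls dropping from 1 after inject to 0 after generate) can be checked mechanically. The following is only a minimal sketch in Python, assuming the text printed by `./bin/nutch readdb -stats` has been captured into strings; the helper names `total_urls` and `rows_disappeared` are hypothetical and not part of Nutch or Gora, and the sample strings are abbreviated from the output shown in this report.

```python
import re


def total_urls(stats_output):
    """Return the number after 'TOTAL urls:' in readdb -stats output, or None if absent."""
    match = re.search(r"^TOTAL urls:\s*(\d+)\s*$", stats_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def rows_disappeared(before, after):
    """True if the stats show fewer URLs after a step than before it."""
    n_before, n_after = total_urls(before), total_urls(after)
    if n_before is None or n_after is None:
        raise ValueError("could not find a 'TOTAL urls:' line in the stats output")
    return n_after < n_before


# Abbreviated samples taken from the readdb -stats output in this report.
after_inject = (
    "WebTable statistics start\n"
    "Statistics for WebTable:\n"
    "max score:\t1.0\n"
    "TOTAL urls:\t1\n"
    "WebTable statistics: done\n"
)
after_generate = (
    "WebTable statistics start\n"
    "Statistics for WebTable:\n"
    "TOTAL urls:\t0\n"
    "WebTable statistics: done\n"
)

if __name__ == "__main__":
    print("URLs after inject:  ", total_urls(after_inject))
    print("URLs after generate:", total_urls(after_generate))
    print("rows disappeared:   ", rows_disappeared(after_inject, after_generate))
```

A check like this only detects the visible symptom; it says nothing about the cause of the exception in `CassandraResult.updatePersistent`.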





--
This message was sent by Atlassian JIRA
(v6.2#6252)
