nutch-dev mailing list archives

From "Koen Smets (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-1791) Null pointer exceptions with gora-cassandra-0.4
Date Thu, 12 Jun 2014 11:17:02 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029027#comment-14029027 ]

Koen Smets commented on NUTCH-1791:
-----------------------------------

The stack trace of the unit test is complete. The null pointer exception occurs in org.apache.gora.cassandra.query.CassandraResult.java
when cc, which is null at that point, is dereferenced while setting the unionField:

{code}
if (fieldType.equals(Type.UNION)) {
  // getting UNION stored type
  CassandraColumn cc = getUnionTypeColumn(fieldName + CassandraStore.UNION_COL_SUFIX,
      cassandraRow.toArray());
  // creating temporary UNION Field
  Field unionField = new Field(fieldName + CassandraStore.UNION_COL_SUFIX,
      Schema.create(Type.INT), null, null);
  // get value of UNION stored type
  cc.setField(unionField);
  Object val = cc.getValue();
  cassandraColumn.setUnionType(Integer.parseInt(val.toString()));
}
{code}
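
To make the failure mode concrete, here is a minimal, self-contained sketch of a null guard around the union-index lookup. It does not use the Gora classes: Column, findColumn, unionIndex and the "_UnionIndex" suffix value are hypothetical stand-ins chosen for illustration, and falling back to a default index is only one possible behaviour, not a proposed patch.

```java
import java.util.Arrays;
import java.util.List;

public class UnionGuardSketch {

    /** Hypothetical stand-in for a stored column: a name plus a string value. */
    public static final class Column {
        public final String name;
        public final String value;

        public Column(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    // Assumed suffix value, used here for illustration only.
    public static final String UNION_COL_SUFFIX = "_UnionIndex";

    /** Returns the column with the given name, or null if the row has none. */
    public static Column findColumn(String name, List<Column> row) {
        for (Column c : row) {
            if (c.name.equals(name)) {
                return c;
            }
        }
        return null;
    }

    /**
     * Returns the union index stored for fieldName, or defaultIndex when the
     * companion column is missing, instead of dereferencing a null column.
     */
    public static int unionIndex(String fieldName, List<Column> row, int defaultIndex) {
        Column cc = findColumn(fieldName + UNION_COL_SUFFIX, row);
        if (cc == null) {
            // This is the case that triggers the NullPointerException above.
            return defaultIndex;
        }
        return Integer.parseInt(cc.value);
    }

    public static void main(String[] args) {
        List<Column> row = Arrays.asList(
                new Column("title", "example"),
                new Column("title" + UNION_COL_SUFFIX, "1"));

        System.out.println(unionIndex("title", row, 0));   // companion column present
        System.out.println(unionIndex("baseUrl", row, 0)); // companion column missing
    }
}
```

Whether a default index, skipping the field, or a descriptive exception is the right behaviour when the companion column is missing is a question for the Gora code itself; the sketch only shows where the check would sit.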

> Null pointer exceptions with gora-cassandra-0.4
> -----------------------------------------------
>
>                 Key: NUTCH-1791
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1791
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator, storage
>    Affects Versions: 2.3
>         Environment: dsc-cassandra-2.0.2, dsc-cassandra-2.0.7
>            Reporter: Koen Smets
>             Fix For: 2.3
>
>
> The latest nutch-2.x source checkout fails to run with Cassandra 2.0.2 (and also Cassandra 2.0.7)
> as storage backend, both in the normal Nutch cycle (inject, generate, fetch)
> and in the JUnit test {{TestGoraStorage}}:
> {code}
> 2014-06-03 11:24:23,495 INFO  connection.CassandraHostRetryService (CassandraHostRetryService.java:<init>(48)) - Downed Host Retry service started with queue size -1 and retry delay 10s
> 2014-06-03 11:24:23,535 INFO  service.JmxMonitor (JmxMonitor.java:registerMonitor(52)) - Registering JMX me.prettyprint.cassandra.service_Test Cluster:ServiceType=hector,MonitorType=hector
> Exception in thread "main" java.lang.NullPointerException
> 	at org.apache.gora.cassandra.query.CassandraResult.updatePersistent(CassandraResult.java:121)
> 	at org.apache.gora.cassandra.query.CassandraResult.nextInner(CassandraResult.java:57)
> 	at org.apache.gora.query.impl.ResultBase.next(ResultBase.java:114)
> 	at org.apache.nutch.storage.TestGoraStorage.readWrite(TestGoraStorage.java:93)
> 	at org.apache.nutch.storage.TestGoraStorage.main(TestGoraStorage.java:230)
> {code}
> After injecting:
> {code}
> ksmets@precise64 ~/l/a/r/local> ./bin/nutch inject urls
> InjectorJob: starting at 2014-06-03 11:55:11
> InjectorJob: Injecting urlDir: urls
> InjectorJob: Using class org.apache.gora.cassandra.store.CassandraStore as the Gora storage class.
> InjectorJob: total number of urls rejected by filters: 0
> InjectorJob: total number of urls injected after normalization and filtering: 1
> Injector: finished at 2014-06-03 11:55:13, elapsed: 00:00:02
> ksmets@precise64 ~/l/a/r/local> ./bin/nutch readdb -stats
> WebTable statistics start
> Statistics for WebTable:
> min score:	1.0
> retry 0:	1
> jobs:	{db_stats-job_local1403358409_0001={jobID=job_local1403358409_0001, jobName=db_stats, counters={File Input Format Counters ={BYTES_READ=0}, Map-Reduce Framework={MAP_OUTPUT_MATERIALIZED_BYTES=97, MAP_INPUT_RECORDS=1, REDUCE_SHUFFLE_BYTES=0, SPILLED_RECORDS=12, MAP_OUTPUT_BYTES=53, COMMITTED_HEAP_BYTES=358612992, CPU_MILLISECONDS=0, SPLIT_RAW_BYTES=769, COMBINE_INPUT_RECORDS=4, REDUCE_INPUT_RECORDS=6, REDUCE_INPUT_GROUPS=6, COMBINE_OUTPUT_RECORDS=6, PHYSICAL_MEMORY_BYTES=0, REDUCE_OUTPUT_RECORDS=6, VIRTUAL_MEMORY_BYTES=0, MAP_OUTPUT_RECORDS=4}, FileSystemCounters={FILE_BYTES_READ=974145, FILE_BYTES_WRITTEN=1144369}, File Output Format Counters ={BYTES_WRITTEN=225}}}}
> max score:	1.0
> TOTAL urls:	1
> status 0 (null):	1
> avg score:	1.0
> WebTable statistics: done
> ksmets@precise64 ~/l/a/r/local> ./bin/nutch readdb -url http://example.com/
> key:	http://example.com/
> baseUrl:	null
> status:	0 (null)
> fetchTime:	1401789311270
> prevFetchTime:	0
> fetchInterval:	2592000
> retriesSinceFetch:	0
> modifiedTime:	0
> prevModifiedTime:	0
> protocolStatus:	(null)
> parseStatus:	(null)
> title:	null
> score:	1.0
> markers:	org.apache.gora.persistency.impl.DirtyMapWrapper@eb173c
> reprUrl:	null
> metadata _csh_ : 	?�
> {code}
> After generating:
> {code}
> ksmets@precise64 ~/l/a/r/local> ./bin/nutch generate -topN 1
> GeneratorJob: starting at 2014-06-03 11:55:38
> GeneratorJob: Selecting best-scoring urls due for fetch.
> GeneratorJob: starting
> GeneratorJob: filtering: true
> GeneratorJob: normalizing: true
> GeneratorJob: topN: 1
> GeneratorJob: finished at 2014-06-03 11:55:40, time elapsed: 00:00:02
> GeneratorJob: generated batch id: 1401789338-222512082 containing 1 URLs
> ksmets@precise64 ~/l/a/r/local> ./bin/nutch readdb -stats
> WebTable statistics start
> Statistics for WebTable:
> jobs:	{db_stats-job_local73029265_0001={jobID=job_local73029265_0001, jobName=db_stats, counters={File Input Format Counters ={BYTES_READ=0}, Map-Reduce Framework={MAP_OUTPUT_MATERIALIZED_BYTES=6, MAP_INPUT_RECORDS=0, REDUCE_SHUFFLE_BYTES=0, SPILLED_RECORDS=0, MAP_OUTPUT_BYTES=0, COMMITTED_HEAP_BYTES=358612992, CPU_MILLISECONDS=0, SPLIT_RAW_BYTES=769, COMBINE_INPUT_RECORDS=0, REDUCE_INPUT_RECORDS=0, REDUCE_INPUT_GROUPS=0, COMBINE_OUTPUT_RECORDS=0, PHYSICAL_MEMORY_BYTES=0, REDUCE_OUTPUT_RECORDS=0, VIRTUAL_MEMORY_BYTES=0, MAP_OUTPUT_RECORDS=0}, FileSystemCounters={FILE_BYTES_READ=974054, FILE_BYTES_WRITTEN=1144028}, File Output Format Counters ={BYTES_WRITTEN=98}}}}
> TOTAL urls:	0
> WebTable statistics: done
> ksmets@precise64 ~/l/a/r/local> ./bin/nutch readdb -url http://example.com/
> WebTableReader: java.lang.NullPointerException
> 	at org.apache.gora.cassandra.query.CassandraResult.updatePersistent(CassandraResult.java:121)
> 	at org.apache.gora.cassandra.query.CassandraResult.nextInner(CassandraResult.java:57)
> 	at org.apache.gora.query.impl.ResultBase.next(ResultBase.java:114)
> 	at org.apache.nutch.crawl.WebTableReader.read(WebTableReader.java:238)
> 	at org.apache.nutch.crawl.WebTableReader.run(WebTableReader.java:494)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> 	at org.apache.nutch.crawl.WebTableReader.main(WebTableReader.java:430)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
