gora-dev mailing list archives

From "Roland (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GORA-211) thread safety: java.lang.NullPointerException
Date Thu, 28 Feb 2013 19:37:12 GMT

    [ https://issues.apache.org/jira/browse/GORA-211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589847#comment-13589847 ]

Roland commented on GORA-211:
-----------------------------

Since I found the Hector thing, I'm no longer sure whether my first guess was useful at all.
Still, I think putting objects into a queue without cloning them is a risk.
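
To make that risk concrete, here is a minimal single-threaded sketch in plain Java. It does not use Gora or Hector: `Page`, `QueueAliasingDemo`, and the `status` field are hypothetical names invented for illustration. It only shows that an object placed into a buffer without a copy can still be changed by whoever holds the original reference; with several fetcher threads the same aliasing could show up as a race.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical stand-in for a mutable record; NOT a Gora class.
class Page {
    String status;

    Page(String status) {
        this.status = status;
    }

    // Independent copy of this record.
    Page copy() {
        return new Page(status);
    }
}

class QueueAliasingDemo {
    // The buffer stores the caller's instance, so a later change by the
    // caller is visible to whoever drains the buffer.
    static String enqueueWithoutClone() {
        Queue<Page> buffer = new ArrayDeque<>();
        Page page = new Page("fetched");
        buffer.add(page);
        page.status = null; // caller clears or reuses the object
        return buffer.remove().status;
    }

    // The buffer stores a copy, so the caller's later change is not seen.
    static String enqueueWithClone() {
        Queue<Page> buffer = new ArrayDeque<>();
        Page page = new Page("fetched");
        buffer.add(page.copy());
        page.status = null;
        return buffer.remove().status;
    }

    public static void main(String[] args) {
        System.out.println("without clone: " + enqueueWithoutClone());
        System.out.println("with clone:    " + enqueueWithClone());
    }
}
```

Whether this is what actually happens inside CassandraStore is not established here; the sketch only illustrates the general hazard.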

I'm running the Nutch inject -> generate -> fetch cycle. The exceptions happen during fetching.
I inject about 100-400k URLs per cycle.
The fetch runs with 30 threads, 2 threads per queue. Nutch generates about 50 to 100 queues.
The 16-core AMD can run a lot of things simultaneously, and I think this is why I hit this
problem.

Hope this helps :)
                
> thread safety: java.lang.NullPointerException
> ---------------------------------------------
>
>                 Key: GORA-211
>                 URL: https://issues.apache.org/jira/browse/GORA-211
>             Project: Apache Gora
>          Issue Type: Bug
>          Components: storage-cassandra
>    Affects Versions: 0.2
>         Environment: nutch 2.1 / cassandra 1.2.1 / gora-cassandra 0.2 / gora-core 0.2.1
> running fetch with parse=true
> fetcher.threads.per.queue=2
> nutch on a 16 core AMD Opteron 2GHz
> Cassandra on 8 core Intel Xeon 3.3 GHz
>            Reporter: Roland
>            Priority: Critical
>
> This is the result of debugging one of my issues described in NUTCH-1534. 
> example trace:
> java.lang.NullPointerException
>         at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:243)
>         at me.prettyprint.cassandra.model.MutatorImpl.insert(MutatorImpl.java:71)
>         at org.apache.gora.cassandra.store.CassandraClient.addColumn(CassandraClient.java:139)
>         at org.apache.gora.cassandra.store.CassandraStore.addOrUpdateField(CassandraStore.java:307)
>         at org.apache.gora.cassandra.store.CassandraStore.flush(CassandraStore.java:212)
>         at org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:65)
>         at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:587)
>         at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>         at org.apache.nutch.fetcher.FetcherReducer$FetcherThread.output(FetcherReducer.java:664)
>         at org.apache.nutch.fetcher.FetcherReducer$FetcherThread.run(FetcherReducer.java:534)
> I suspect CassandraStore.put() does not take enough precautions to copy all objects safely to its buffer.
> {code}
>         switch(type) {
>           case RECORD:
>             Persistent persistent = (Persistent) fieldValue;
>             Persistent newRecord = persistent.newInstance(new StateManagerImpl());
>             for (Field member: fieldSchema.getFields()) {
>               newRecord.put(member.pos(), persistent.get(member.pos()));
>             }
>             fieldValue = newRecord;
>             break;
>           case MAP:
>             StatefulHashMap<?, ?> map = (StatefulHashMap<?, ?>) fieldValue;
>             StatefulHashMap<?, ?> newMap = new StatefulHashMap(map);
>             fieldValue = newMap;
>             break;
>         }
> {code}
> case RECORD - don't we need to duplicate the object returned by "persistent.get(member.pos())"?
>   newRecord.put(member.pos(), persistent.get(member.pos()))
> case MAP - don't we need to duplicate all value objects of the map?
> I had no time to write a patch or test this, so please comment :)
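
The MAP question can be illustrated with the standard library alone. The sketch below assumes that StatefulHashMap's copy constructor behaves like the one in `java.util.HashMap`, which is an assumption and has not been checked against the Gora source. `MapCopyDemo`, `shallowCopy`, and `deepCopy` are hypothetical names, and `StringBuilder` merely stands in for any mutable value type.

```java
import java.util.HashMap;
import java.util.Map;

class MapCopyDemo {
    // Copy constructor: a new map, but the SAME value objects.
    static Map<String, StringBuilder> shallowCopy(Map<String, StringBuilder> src) {
        return new HashMap<>(src);
    }

    // New map AND a new value object for every entry.
    static Map<String, StringBuilder> deepCopy(Map<String, StringBuilder> src) {
        Map<String, StringBuilder> copy = new HashMap<>();
        for (Map.Entry<String, StringBuilder> e : src.entrySet()) {
            copy.put(e.getKey(), new StringBuilder(e.getValue()));
        }
        return copy;
    }

    public static void main(String[] args) {
        Map<String, StringBuilder> original = new HashMap<>();
        original.put("k", new StringBuilder("a"));

        Map<String, StringBuilder> shallow = shallowCopy(original);
        Map<String, StringBuilder> deep = deepCopy(original);

        // Mutate the value through the original map after copying.
        original.get("k").append("b");

        System.out.println("shallow sees: " + shallow.get("k")); // change is visible
        System.out.println("deep sees:    " + deep.get("k"));    // change is not visible
    }
}
```

If the values stored in the map are mutable and can be touched by another thread after put() returns, only the second variant isolates the buffered copy. Whether that applies to the values Nutch stores is a separate question.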

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
