gora-dev mailing list archives

From Alexis <alexis.detregl...@gmail.com>
Subject Re: Gora CassandraStore is not thread safe?
Date Tue, 30 Aug 2011 08:24:06 GMT
Hi Tom,

Thanks for testing Nutch 2.0 with Cassandra and reporting this obvious
bug. I must admit that development and testing on Gora and Nutch are
not very active, but at least there is some.

1. Regarding your ConcurrentModificationException, it looks like it
happens when flushing the store. From your exception stack trace:
(Line 192 in org.apache.gora.cassandra.store.CassandraStore)
    for (K key: this.buffer.keySet()) {

while there are other threads adding new keys to the HashMap:

(Line 266)
    this.buffer.put(key, p);

As the Javadoc for ConcurrentModificationException puts it:
"it is not generally permissible for one thread to modify a Collection
while another thread is iterating over it."
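
To make the failure mode concrete, here is a minimal sketch that is not
taken from CassandraStore: the class and method names are made up for
illustration. It triggers the same fail-fast check from a single thread by
calling put() inside a keySet() loop. In the fetcher the put() comes from
another thread, so there it is a race rather than a guaranteed failure.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class, not part of Gora.
class IterateWhilePutDemo {

    // Returns true if adding a key during keySet() iteration makes the
    // HashMap iterator throw ConcurrentModificationException.
    static boolean failsFast() {
        Map<String, Integer> buffer = new HashMap<>();
        buffer.put("a", 1);
        buffer.put("b", 2);
        try {
            for (String key : buffer.keySet()) {
                // Structural modification while the iterator is live.
                buffer.put(key + "-new", 0);
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("fail-fast check triggered: " + failsFast());
    }
}
```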

Let me try to reproduce the bug and fix it with this in mind:
How about introducing a mutex / lock mechanism with
java.util.concurrent.locks.Lock or, more simply, using a thread-safe
implementation such as java.util.concurrent.ConcurrentHashMap?
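
A rough sketch of the ConcurrentHashMap option, under stated assumptions:
ConcurrentBufferSketch, put(), flush() and run() are hypothetical names,
the buffer holds plain strings rather than Gora persistent objects, and
the actual write to Cassandra is left out. It only shows that iterating
while other threads add keys no longer throws, and that remove(key, value)
lets exactly one flusher claim each entry.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a store with a write buffer, not Gora code.
class ConcurrentBufferSketch {
    private final Map<String, String> buffer = new ConcurrentHashMap<>();

    void put(String key, String value) {
        buffer.put(key, value);
    }

    // Iterators of ConcurrentHashMap are weakly consistent, so concurrent
    // put() calls do not cause ConcurrentModificationException here.
    int flush() {
        int flushed = 0;
        for (Map.Entry<String, String> entry : buffer.entrySet()) {
            // remove(key, value) succeeds for only one thread per entry.
            if (buffer.remove(entry.getKey(), entry.getValue())) {
                // A real store would write the claimed entry to the backend here.
                flushed++;
            }
        }
        return flushed;
    }

    // Starts several writer threads that also flush now and then, and
    // returns the total number of entries flushed.
    static int run(int writers, int putsPerWriter) {
        ConcurrentBufferSketch store = new ConcurrentBufferSketch();
        AtomicInteger flushed = new AtomicInteger();
        Thread[] threads = new Thread[writers];
        for (int w = 0; w < writers; w++) {
            final int id = w;
            threads[w] = new Thread(() -> {
                for (int i = 0; i < putsPerWriter; i++) {
                    store.put("w" + id + "-" + i, "page");
                    if (i % 100 == 0) {
                        flushed.addAndGet(store.flush());
                    }
                }
            });
            threads[w].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        flushed.addAndGet(store.flush()); // drain whatever is left
        return flushed.get();
    }

    public static void main(String[] args) {
        System.out.println("entries flushed: " + run(4, 1000));
    }
}
```

This does not cover everything a real fix would need: if the buffered
objects themselves are mutated by several threads, or if a flush must see
a consistent snapshot, an explicit Lock around put and flush may still be
the better choice.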

2. Regarding the OutOfMemory error, maybe decreasing the flushing
frequency as described here?

I like to use the jvisualvm utility from the JDK, which monitors
memory usage and shows how it evolves during the execution of
the class...


On Mon, Aug 29, 2011 at 1:50 PM, Tom Davidson <tdavidson@covario.com> wrote:
> Hi Lewis,
> I was running Nutch deployed with a dedicated Cassandra cluster. Frankly, I
> have given up on using Nutch 2 at this time as it seems highly unstable and
> not really in active development. Your effort to address this is
> encouraging. Because Nutch uses multithreading in the fetchers, I was
> getting ConcurrentModification errors and OutOfMemory errors on a regular
> basis in the CassandraStore. As far as I recall, the caching/flushing
> implementation is just not thread safe. If the CassandraStore caching was
> completely removed it may work, but would probably not be very efficient.
> If I were to fix this class, I would try to rewrite it to use Hector
> batched mutations instead.
> Tom
> -----Original Message-----
> From: lewis john mcgibbney [mailto:lewis.mcgibbney@gmail.com]
> Sent: Monday, August 29, 2011 1:41 PM
> To: gora-dev@incubator.apache.org; dev@nutch.apache.org
> Subject: Re: Gora CassandraStore is not thread safe?
> Hi Tom,
> Apologies for cross posting, this would not usually be the case but I'm
> hoping that if any results come from the thread then both communities can
> benefit.
> I'm in the process of getting Cassandra 0.8.4 working with Nutch 2.0 and
> Gora 0.2 myself and seem to be having some nasty problems.
> Some questions for you
> 1) How are you running Nutch, local or deploy?
> 2) How are you running Cassandra, local or deployed in a cluster?
> The obvious thoughts are that this is a bug and that there are
> method(s)/object(s) which are not safe.
> Have you gotten any further with this?
> Lewis
> On Wed, Aug 10, 2011 at 8:43 PM, Tom Davidson <tdavidson@covario.com> wrote:
>> Has anyone tested the CassandraStore in gora 0.2 using multiple threads?
>>  The nutch 2 fetcher architecture has many threads writing to one
>> GoraRecordWriter and I am getting concurrent modification errors like below.
>> Caused by: java.util.ConcurrentModificationException
>>               at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
>>               at java.util.HashMap$KeyIterator.next(HashMap.java:828)
>>               at org.apache.gora.cassandra.store.CassandraStore.flush(CassandraStore.java:192)
>>               at org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:65)
> --
> *Lewis*
