gora-dev mailing list archives

From Alexis <alexis.detregl...@gmail.com>
Subject Re: Gora CassandraStore is not thread safe?
Date Sat, 01 Oct 2011 11:07:07 GMT
The latest revision, 1177960, should now fix the thread-safety issue:

http://svn.apache.org/viewvc/incubator/gora/trunk/gora-cassandra/src/main/java/org/apache/gora/cassandra/store/CassandraStore.java?r1=1177960&r2=1177959&pathrev=1177960

Please comment on https://issues.apache.org/jira/browse/GORA-22 if
there is anything else.

Alexis

On Sun, Sep 4, 2011 at 10:43 AM, Alexis <alexis.detreglode@gmail.com> wrote:
> Hi,
>
> I submitted the patch for peer review by just attaching it to the
> issue: https://issues.apache.org/jira/browse/GORA-22
>
> See this article for background on concurrency and HashMap:
> http://www.ibm.com/developerworks/java/library/j-jtp07233/index.html
>
> I ended up calling toArray on the key set to get around the
> ConcurrentModificationException that java.util.HashMap's fail-fast
> iterator throws when the map is modified while iterating over its keys.
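A minimal sketch of the snapshot idea, assuming a HashMap-backed write buffer; it is not the GORA-22 patch, which is not reproduced in this thread, and the class and method names are hypothetical. One caveat worth making explicit: `keySet().toArray()` on a plain HashMap still walks the map, so the snapshot (and every later access to the buffer) has to be taken under the same lock that guards `put`.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Gora's API): a HashMap write buffer whose flush
 * iterates over a toArray() snapshot of the keys. All buffer accesses share
 * one lock, because toArray() itself is not safe against concurrent puts.
 */
public class SnapshotFlushSketch {
    private final Map<String, String> buffer = new HashMap<>();

    public synchronized void put(String key, String row) {
        buffer.put(key, row);
    }

    public synchronized int pending() {
        return buffer.size();
    }

    /** Writes the rows whose keys were present at snapshot time; returns how many. */
    public int flush() {
        Object[] keys;
        synchronized (this) {
            // Copy the keys under the lock; later puts do not touch this array.
            keys = buffer.keySet().toArray();
        }
        int written = 0;
        for (Object key : keys) {
            String row;
            synchronized (this) {
                row = buffer.remove(key);
            }
            if (row != null) {
                // A real store would send (key, row) to Cassandra here.
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) {
        SnapshotFlushSketch store = new SnapshotFlushSketch();
        store.put("row1", "a");
        store.put("row2", "b");
        store.put("row3", "c");
        System.out.println("written=" + store.flush() + " pending=" + store.pending());
    }
}
```

Keys added by other threads after the snapshot simply stay in the buffer until the next flush, and because each key is removed under the lock, two concurrent flushes never write the same row twice.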
>
> Occasionally I ran into Cassandra crashes and Hector exceptions
> (possibly caused by garbage collection in the Cassandra daemon?) on
> my poor 5-year-old laptop while running the Nutch parse command, which is
> very CPU- and IO-intensive. With the settings in mapred-site.xml (see the
> attached config), it worked out once I kept the read batch at a reasonable
> size (400 rows at a time) and made it different from the write batch (for
> example, 843 rows written per batch) so that they don't happen simultaneously.
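One way to read the advice about keeping the read and write batches apart, offered as an interpretation and not as the author's stated reasoning: assuming each row read in the parse job is also written once, a read batch ends every 400 rows and a write batch every 843 rows, so both end on the same row only at multiples of lcm(400, 843). The attached config is not reproduced in this archive, so the sketch uses plain numbers and no property names.

```java
/**
 * Hypothetical illustration: with one write per row read, read-batch and
 * write-batch boundaries coincide only at multiples of lcm(readBatch, writeBatch).
 */
public class BatchBoundarySketch {
    public static long gcd(long a, long b) {
        // Euclid's algorithm.
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    public static long lcm(long a, long b) {
        return a / gcd(a, b) * b;
    }

    /** Number of rows in [1, totalRows] at which a read batch and a write batch both end. */
    public static long coincidingBoundaries(long readBatch, long writeBatch, long totalRows) {
        return totalRows / lcm(readBatch, writeBatch);
    }

    public static void main(String[] args) {
        System.out.println("lcm(400, 843) = " + lcm(400, 843));
        System.out.println("400/843 over 1,000,000 rows: "
                + coincidingBoundaries(400, 843, 1_000_000));
        System.out.println("400/800 over 1,000,000 rows: "
                + coincidingBoundaries(400, 800, 1_000_000));
    }
}
```

Because 400 and 843 share no common factor, their boundaries line up only every 337,200 rows (twice in a million rows), whereas a 400/800 pairing would line up on every write batch, 1,250 times.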
>
>
> Alexis
>
> On Tue, Aug 30, 2011 at 1:24 AM, Alexis <alexis.detreglode@gmail.com> wrote:
>> Hi Tom,
>>
>> Thanks for testing Nutch 2.0 & Cassandra and reporting the obvious
>> bug. I must say that development and testing on Gora & Nutch are not
>> very active, but at least there is some.
>>
>>
>> 1. As regards your ConcurrentModificationException, it looks like it
>> happens when flushing the store. From your exception stacktrace:
>> (Line 192 in org.apache.gora.cassandra.store.CassandraStore)
>>    for (K key: this.buffer.keySet()) {
>>
>> while there are other threads adding new keys to the HashMap:
>>
>> (Line 266)
>>    this.buffer.put(key, p);
>>
>> "it is not generally permissible for one thread to modify a Collection
>> while another thread is iterating over it."
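The quoted rule is enforced by HashMap's fail-fast iterator: a structural change such as adding a new key makes the iterator's next call to `next()` throw. A single thread triggers it deterministically, as in this small sketch (the class name is hypothetical); with several fetcher threads the same check fires only intermittently, since the HashMap documentation describes fail-fast detection as best-effort.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

/** Demonstrates HashMap's fail-fast iterator with a single thread. */
public class FailFastDemo {
    /** Returns true if adding a key while iterating keySet() raised ConcurrentModificationException. */
    public static boolean putDuringIterationThrows() {
        Map<String, String> buffer = new HashMap<>();
        buffer.put("row1", "a");
        buffer.put("row2", "b");
        try {
            for (String key : buffer.keySet()) {
                // Simulates another writer calling buffer.put(key, p) in the middle of flush().
                buffer.put(key + "-new", "c");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("put during iteration threw CME: " + putDuringIterationThrows());
    }
}
```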
>>
>> Let me try to reproduce the bug and fix it with this in mind:
>> How about introducing some mutex / lock mechanism with
>> java.util.concurrent.locks.Lock or, more simply, using a thread-safe
>> implementation such as java.util.concurrent.ConcurrentHashMap?
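A minimal sketch of the ConcurrentHashMap option, assuming many writer threads call `put` while any of them may trigger a flush; the class, methods, and flush interval are hypothetical and not Gora's API. ConcurrentHashMap iterators are weakly consistent, so `flush()` never throws ConcurrentModificationException, and the two-argument `remove(key, value)` keeps a row that was replaced mid-flush buffered for the next flush.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical write buffer shared by several writer threads. */
public class ConcurrentBufferSketch {
    private final ConcurrentHashMap<String, String> buffer = new ConcurrentHashMap<>();
    private final AtomicInteger flushed = new AtomicInteger();

    public void put(String key, String row) {
        buffer.put(key, row);
    }

    public void flush() {
        // Weakly consistent iteration: concurrent puts are tolerated, never a CME.
        for (Map.Entry<String, String> e : buffer.entrySet()) {
            // Removes only if the value is unchanged, so a newer row stays buffered
            // and two concurrent flushes never write the same entry twice.
            if (buffer.remove(e.getKey(), e.getValue())) {
                // A real store would send the row to Cassandra here.
                flushed.incrementAndGet();
            }
        }
    }

    public int flushedCount() {
        return flushed.get();
    }

    public int pending() {
        return buffer.size();
    }

    /** Runs writer threads that each put unique keys and flush every 100 rows. */
    public static ConcurrentBufferSketch runWorkload(int threads, int rowsPerThread) {
        ConcurrentBufferSketch store = new ConcurrentBufferSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.execute(() -> {
                for (int i = 0; i < rowsPerThread; i++) {
                    store.put("t" + id + "-row" + i, "payload");
                    if (i % 100 == 99) {
                        store.flush();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("writers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        store.flush(); // final flush once every writer has finished
        return store;
    }

    public static void main(String[] args) {
        ConcurrentBufferSketch store = runWorkload(4, 1000);
        System.out.println("flushed=" + store.flushedCount() + " pending=" + store.pending());
    }
}
```

Compared with an explicit Lock, this keeps writers from blocking each other during a flush; the trade-off is that a flush may or may not include rows added while it is running.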
>>
>>
>> 2. Regarding the OutOfMemoryError, how about decreasing the flushing
>> frequency as described here?
>> http://techvineyard.blogspot.com/2011/02/gora-orm-framework-for-hadoop-jobs.html#I_O_Frequency
>>
>> I like to use the jvisualvm utility from the JDK, which monitors
>> memory usage and shows how it evolves while the class runs...
>>
>> Alexis
>>
>> On Mon, Aug 29, 2011 at 1:50 PM, Tom Davidson <tdavidson@covario.com> wrote:
>>> Hi Lewis,
>>>
>>> I was running Nutch deployed with a dedicated Cassandra cluster. Frankly, I have
>>> given up on using Nutch 2 at this time, as it seems highly unstable and not really
>>> in active development. Your effort to address this is encouraging. Because Nutch
>>> uses multithreading in the fetchers, I was getting ConcurrentModification errors
>>> and OutOfMemory errors on a regular basis in the CassandraStore. As far as I recall,
>>> the caching/flushing implementation is just not thread safe. If the CassandraStore
>>> caching were removed completely it might work, but it would probably not be very
>>> efficient. If I were to fix this class, I would try to rewrite it to use Hector
>>> batched mutations instead.
>>>
>>> Tom
>>>
>>> -----Original Message-----
>>> From: lewis john mcgibbney [mailto:lewis.mcgibbney@gmail.com]
>>> Sent: Monday, August 29, 2011 1:41 PM
>>> To: gora-dev@incubator.apache.org; dev@nutch.apache.org
>>> Subject: Re: Gora CassandraStore is not thread safe?
>>>
>>> Hi Tom,
>>>
>>> Apologies for cross-posting; I would not usually do this, but I'm
>>> hoping that if any results come from the thread, both communities can
>>> benefit.
>>>
>>> I'm in the process of getting Cassandra 0.8.4 working with Nutch 2.0 and
>>> Gora 0.2 myself and seem to be having some nasty problems.
>>>
>>> Some questions for you:
>>>
>>> 1) How are you running Nutch, local or deployed?
>>> 2) How are you running Cassandra, local or deployed in a cluster?
>>>
>>> The obvious thought is that this is a bug and that some
>>> method(s)/object(s) are not thread safe.
>>>
>>> Have you gotten any further with this?
>>>
>>> Lewis
>>>
>>>
>>> On Wed, Aug 10, 2011 at 8:43 PM, Tom Davidson <tdavidson@covario.com> wrote:
>>>
>>>> Has anyone tested the CassandraStore in Gora 0.2 using multiple threads?
>>>> The Nutch 2 fetcher architecture has many threads writing to one
>>>> GoraRecordWriter, and I am getting concurrent modification errors like the one below.
>>>>
>>>> Caused by: java.util.ConcurrentModificationException
>>>>               at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
>>>>               at java.util.HashMap$KeyIterator.next(HashMap.java:828)
>>>>               at org.apache.gora.cassandra.store.CassandraStore.flush(CassandraStore.java:192)
>>>>               at org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:65)
>>>>
>>>
>>>
>>> --
>>> *Lewis*
>>>
>>
>
