gora-dev mailing list archives

From Tom Davidson <tdavid...@covario.com>
Subject RE: Gora CassandraStore is not thread safe?
Date Mon, 29 Aug 2011 20:50:05 GMT
Hi Lewis,

I was running Nutch deployed with a dedicated Cassandra cluster. Frankly, I have given up
on using Nutch 2 at this time as it seems highly unstable and not really in active development.
Your effort to address this is encouraging. Because Nutch uses multithreading in the fetchers,
I was getting ConcurrentModificationException and OutOfMemoryError failures on a regular basis in the
CassandraStore. As far as I recall, the caching/flushing implementation is just not thread
safe. If the CassandraStore caching were completely removed it might work, but it would probably
not be very efficient. If I were to fix this class, I would try to rewrite it to use Hector
batched mutations instead.
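
To make the idea concrete, here is a minimal sketch in plain Java of a write buffer that many fetcher threads could share safely. It is not Gora or Hector code: the class BufferedStoreSketch, the batchSink callback, and flushThreshold are hypothetical names made up for illustration, and batchSink only stands in for wherever a single batched write (for example a Hector batched mutation) would be issued. The point is that the shared HashMap is only touched under a lock, and a flush swaps in a fresh map so the full one can be iterated without another thread modifying it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical sketch, not the real CassandraStore: a buffer shared by many writer threads.
class BufferedStoreSketch {
    private final Object lock = new Object();
    private final int flushThreshold;
    // Stand-in for "send one batch to the store"; may be called from several threads.
    private final Consumer<Map<String, String>> batchSink;
    private Map<String, String> buffer = new HashMap<>();

    BufferedStoreSketch(int flushThreshold, Consumer<Map<String, String>> batchSink) {
        this.flushThreshold = flushThreshold;
        this.batchSink = batchSink;
    }

    // Buffer one write; hand off a full batch once the threshold is reached.
    void put(String key, String value) {
        Map<String, String> full = null;
        synchronized (lock) {
            buffer.put(key, value);
            if (buffer.size() >= flushThreshold) {
                full = swapBuffer();
            }
        }
        // The detached map is no longer shared, so it can be iterated outside the lock.
        if (full != null) {
            batchSink.accept(full);
        }
    }

    // Push out whatever is still buffered.
    void flush() {
        Map<String, String> full;
        synchronized (lock) {
            full = swapBuffer();
        }
        if (!full.isEmpty()) {
            batchSink.accept(full);
        }
    }

    // Caller must hold the lock: detach the current map and start a fresh one.
    private Map<String, String> swapBuffer() {
        Map<String, String> full = buffer;
        buffer = new HashMap<>();
        return full;
    }

    // Runs several writer threads against one store and returns the number of rows flushed.
    static int demo(int threads, int writesPerThread) {
        AtomicInteger written = new AtomicInteger();
        BufferedStoreSketch store = new BufferedStoreSketch(100, batch -> {
            // Iterating the batch is the step that fails when the map is still shared.
            for (String ignoredKey : batch.keySet()) {
                written.incrementAndGet();
            }
        });
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.submit(() -> {
                for (int i = 0; i < writesPerThread; i++) {
                    store.put("thread" + id + "-row" + i, "value");
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("writers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        store.flush();
        return written.get();
    }

    public static void main(String[] args) {
        System.out.println("rows flushed: " + demo(8, 1000));
    }
}
```

Swapping the map rather than clearing it in place keeps the lock held only briefly, so writers are not blocked while a batch is being sent. A real store would also need to decide how to handle a failed batch, which this sketch ignores.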


-----Original Message-----
From: lewis john mcgibbney [mailto:lewis.mcgibbney@gmail.com] 
Sent: Monday, August 29, 2011 1:41 PM
To: gora-dev@incubator.apache.org; dev@nutch.apache.org
Subject: Re: Gora CassandraStore is not thread safe?

Hi Tom,

Apologies for cross posting, this would not usually be the case but I'm
hoping that if any results come from the thread then both communities can

I'm in the process of getting Cassandra 0.8.4 working with Nutch 2.0 and
Gora 0.2 myself and seem to be having some nasty problems.

Some questions for you

1) How are you running Nutch, local or deploy?
2) How are you running Cassandra, local or deployed in a cluster?

The obvious thoughts are that this is a bug and that there are
method(s)/object(s) which are not thread safe.

Have you gotten any further with this?


On Wed, Aug 10, 2011 at 8:43 PM, Tom Davidson <tdavidson@covario.com> wrote:

> Has anyone tested the CassandraStore in gora 0.2 using multiple threads?
>  The nutch 2 fetcher architecture has many threads writing to one
> GoraRecordWriter and I am getting concurrent modification errors like below.
> Caused by: java.util.ConcurrentModificationException
>     at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
>     at java.util.HashMap$KeyIterator.next(HashMap.java:828)
>     at org.apache.gora.cassandra.store.CassandraStore.flush(CassandraStore.java:192)
>     at org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:65)

