gora-dev mailing list archives

From Ferdy Galema <ferdy.gal...@kalooga.com>
Subject Re: Libthrift library in gora-cassandra
Date Thu, 26 Jul 2012 07:48:50 GMT
Hi,

You are correct about HBase. HTable indeed uses the user thread to maintain
a buffer for Put operations. Delete operations are not buffered,
unfortunately. That's what makes deletes tremendously slow. There is a
batchdelete, but there are still some issues that make it slower than a
batchput.
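
To make the cost difference concrete, here is a toy model. It is not HBase code: ToyTable and its methods are hypothetical names, and it only counts simulated round trips. It assumes puts are collected in a client-side buffer and sent from the calling thread when the buffer fills, while every delete is sent on its own. The buffer limit is a simple count here; as far as I know the real client sizes its write buffer in bytes.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: counts simulated round trips; it does not talk to HBase. */
class ToyTable {
    private final int bufferLimit;                       // buffered puts before an automatic flush
    private final List<String> putBuffer = new ArrayList<>();
    private int roundTrips = 0;                          // simulated RPC count

    ToyTable(int bufferLimit) {
        this.bufferLimit = bufferLimit;
    }

    /** Buffered: the calling thread only pays for a round trip when the buffer fills. */
    void put(String row) {
        putBuffer.add(row);
        if (putBuffer.size() >= bufferLimit) {
            flush();
        }
    }

    /** Unbuffered: every delete is its own round trip. */
    void delete(String row) {
        roundTrips++;
    }

    /** Sends whatever is buffered as one batch, on the calling thread. */
    void flush() {
        if (!putBuffer.isEmpty()) {
            roundTrips++;
            putBuffer.clear();
        }
    }

    int roundTrips() {
        return roundTrips;
    }
}
```

With a limit of 100, writing 1000 rows through put() followed by flush() costs 10 simulated round trips in this model, while 1000 calls to delete() cost 1000.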

Ferdy.

On Wed, Jul 25, 2012 at 8:23 PM, Keith Turner <keith@deenlo.com> wrote:

> On Mon, Jul 23, 2012 at 3:19 PM, Kazuomi Kashii <kazuomi@kashii.net>
> wrote:
> > Hi Lewis,
> >
> > I used a Mac with a Core2Quad and 8GB of memory yesterday.
> > A single-node Cassandra server was running, and Goraci/GORA/Cassandra
> > used that server.
> > "goraci.sh Generator 1 25000000" took about 4 hours to complete.
> >
> > I saw the message for every 1M nodes written (flushed).
> > Since gora-cassandra does not support delete() yet, "goraci.sh Delete"
> > did nothing.
> > "goraci.sh Verify" took a few dozen minutes.
> >
> > In my understanding, gora-cassandra flushes its buffer only when flush()
> > or close() is explicitly called.
> > I have not checked the details of gora-hbase or gora-accumulo,
> > but if they flush the buffer more intelligently, we may want
> > gora-cassandra to support such a feature.
>
> gora-accumulo uses the Accumulo BatchWriter.  When the user creates a
> BatchWriter to write to Accumulo they specify how much memory and how
> many threads it should use.  As the user adds mutations to the batch
> writer it buffers them.  Once the buffered mutations have used half of
> the user-specified memory, the mutations are dumped into the background
> to be written by a thread pool.  If the user-specified memory completely
> fills up, then writes are held.  When a user calls flush, it does not
> return until all buffered mutations are written.
>
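
The buffering described above can be sketched roughly as follows. This is a toy model using only the Java standard library, not the Accumulo BatchWriter API: ToyBatchWriter and all of its members are hypothetical names, and the "write" is just a counter. It assumes a memory budget in bytes, hands the buffer to a thread pool once it reaches half the budget, holds the caller when buffered plus in-flight bytes would exceed the budget, and makes flush() wait until everything has been written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

/** Toy model of the buffering described above; not the Accumulo API. */
class ToyBatchWriter {
    private final long maxMemory;                        // user-specified memory budget in bytes
    private final ExecutorService pool;                  // background writer threads
    private final AtomicLong written = new AtomicLong(); // total bytes "written" so far
    private List<byte[]> buffer = new ArrayList<>();
    private long bufferedBytes = 0;                      // bytes waiting in the current buffer
    private long inFlightBytes = 0;                      // bytes handed to the pool, not yet written

    ToyBatchWriter(long maxMemory, int threads) {
        this.maxMemory = maxMemory;
        this.pool = Executors.newFixedThreadPool(threads, r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);                           // do not keep the JVM alive
            return t;
        });
    }

    synchronized void addMutation(byte[] mutation) {
        // Hold the caller while the budget is completely used.
        while (bufferedBytes + inFlightBytes + mutation.length > maxMemory) {
            if (bufferedBytes > 0) {
                dumpToBackground();
            } else if (inFlightBytes > 0) {
                await();
            } else {
                break;                                   // one mutation larger than the budget: accept it
            }
        }
        buffer.add(mutation);
        bufferedBytes += mutation.length;
        if (bufferedBytes >= maxMemory / 2) {            // half the budget used: write in the background
            dumpToBackground();
        }
    }

    /** Does not return until all buffered mutations are written. */
    synchronized void flush() {
        if (bufferedBytes > 0) {
            dumpToBackground();
        }
        while (inFlightBytes > 0) {
            await();
        }
    }

    void close() {
        flush();
        pool.shutdown();
    }

    long bytesWritten() {
        return written.get();
    }

    // Must be called while holding the lock on this object.
    private void dumpToBackground() {
        final List<byte[]> batch = buffer;
        final long bytes = bufferedBytes;
        buffer = new ArrayList<>();
        bufferedBytes = 0;
        inFlightBytes += bytes;
        pool.execute(() -> {
            for (byte[] m : batch) {
                written.addAndGet(m.length);             // stand-in for a network write
            }
            finished(bytes);
        });
    }

    private synchronized void finished(long bytes) {
        inFlightBytes -= bytes;
        notifyAll();                                     // wake callers held in addMutation() or flush()
    }

    // Must be called while holding the lock on this object.
    private void await() {
        try {
            wait();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```

The point of the sketch is that the calling thread keeps adding mutations while earlier batches are written in parallel, and only stalls when the whole budget is in use.
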
> I am not positive, but I think HBase does something similar.
> However, I think it does not dump mutations into the background to be
> written by a thread pool in parallel.  I think HBase uses the user
> thread to write to region servers serially when its buffer fills up.
> I could be completely wrong, this is all hearsay.  I had a discussion
> with Todd Lipcon about goraci and the difference in write speed
> between HBase and Accumulo.
>
> >
> > Thanks,
> > -Kaz
> >
> >
> >
> > On 7/23/12 11:40 AM, Lewis John Mcgibbney wrote:
> >>
> >> Hi Kaz,
> >>
> >> On Mon, Jul 23, 2012 at 5:47 PM, Kazuomi Kashii <kazuomi@kashii.net>
> >> wrote:
> >>>
> >>> I tried Goraci last night, and I had some dependency problems.
> >>
> >> How did you get on with gora-cassandra and the goraci suite? I've
> >> shared some of my early experiences with Keith [0]. Unfortunately the
> >> hardware I'm running the test on is pretty primitive, to say the least
> >> (a small notebook), so I fear this is limiting the execution of
> >> the tests, and Hadoop jobs are timing out and being killed. Also I have
> >> a few questions which I would like to reach out on.
> >>
> >> 1) When we use this test suite, is the Cassandra system swapping? How
> >> can I even find this out? Having spoken to Keith he clarified to me
> >> that the test writes in multiples of 1M nodes so if this is done in
> >> swap there will be problems.
> >>
> >> 2) How does gora-cassandra handle buffering? Keith also mentioned that
> >> Goraci will write 1000000 nodes and then call flush.  Accumulo and
> >> HBase handle this ok.  If gora-cassandra actually buffered all
> >> 1000000 in memory until flush was called, then this could be bad
> >> with my small amount of memory.
> >>
> >> I'm keen to get some documentation on the execution of gora-cassandra
> >> with this test suite to understand more about the internals and, of
> >> course, the limitations of gora-cassandra.
> >>
> >> Any comments you have at this stage would be excellent.
> >>
> >>> For my case, I added some dependencies to Goraci's pom.xml, and it
> >>> worked,
> >>> but I am not sure that it is the same or similar issue to yours.
> >>> I used a standalone Cassandra server, not an embedded one, so I did not
> >>> include cassandra-all.
> >>
> >> I'm the same as you here. I suppose this dep can maybe be dropped from
> >> the goraci pom.xml in this instance then.
> >>
> >> Best
> >> Lewis
> >>
> >> [0] https://github.com/keith-turner/goraci/pull/7
> >
> >
>
