hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: Cassandra vs HBase
Date Tue, 01 Sep 2009 19:23:32 GMT
They have aspects in common -- Java, datastores, Apache -- but the
differences are pretty acute:

+ Cassandra does eventual consistency.  HBase does strong consistency.  See
http://devblog.streamy.com/2009/08/24/cap-theorem/ for more on this.
+ Cassandra does not do BigTable cell versions.  It only keeps the latest.
In HBase you can have as many versions as you want.
+ Cassandra's underpinnings are based on Amazon's Dynamo (keys are
distributed/replicated in buckets spread around a consistent-hashing
circle, etc.  Apparently there is a means of ordering keys around the circle,
but I don't know much about this).  HBase's design tries to follow the one
described in the BigTable paper.
+ Because Cassandra has the above underpinnings, it purportedly can span
data centers.  HBase currently has no such facility (in 0.21, HBase will
have a replication facility).
+ Cassandra does not have a natural sharding notion like HBase's Regions,
so hooking Cassandra up to MapReduce is awkward.
+ The Cassandra fellas point out that their app is a single ball of code,
whereas with HBase there is HDFS, ZooKeeper, and then HBase itself
(apparently Cassandra has fewer lines of code too).
+ Cassandra has an "extra" dimension in its data model called supercolumns
(serialize a List to a cell in HBase if your application requires this extra
dimension).
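For a rough feel of what eventual consistency means in practice, here is a toy Python model. It is not Cassandra's code; the Replica class and the anti_entropy helper are hypothetical. Each replica accepts writes on its own and resolves conflicts by last-write-wins on a timestamp, so a read can return stale data until the replicas exchange state and converge. In HBase a given region is served by one region server at a time, so there is no such stale-read window between copies.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica: stores (timestamp, value) per key, written independently."""
    name: str
    data: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        # Last-write-wins: keep whichever (timestamp, value) pair is larger,
        # so ties on the timestamp still resolve the same way on every replica.
        current = self.data.get(key)
        if current is None or (ts, value) > current:
            self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a, b):
    """Hypothetical sync step: push every entry both ways so a and b converge."""
    for key, (ts, value) in list(a.data.items()):
        b.write(key, value, ts)
    for key, (ts, value) in list(b.data.items()):
        a.write(key, value, ts)


if __name__ == "__main__":
    r1, r2 = Replica("r1"), Replica("r2")
    r1.write("user:1", "alice", ts=1)      # write reaches r1 only
    print(r2.read("user:1"))               # None: r2 is stale for now
    r2.write("user:1", "bob", ts=2)        # later write lands on r2
    anti_entropy(r1, r2)
    print(r1.read("user:1"), r2.read("user:1"))  # bob bob
```

The tie-break on the value is only there so the toy replicas always agree after syncing; real systems have their own rules for this.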
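The cell-versions point can be pictured with a small in-memory model. Here max_versions stands in for HBase's per-column-family limit on retained versions, and a limit of 1 mimics the latest-only behaviour described for Cassandra. VersionedCellStore and its methods are illustrative names only, not the HBase client API.

```python
from collections import defaultdict


class VersionedCellStore:
    """Toy BigTable/HBase-style cell store: each (row, column) keeps up to
    max_versions timestamped values, newest first."""

    def __init__(self, max_versions):
        self.max_versions = max_versions
        self.cells = defaultdict(list)  # (row, column) -> [(ts, value), ...]

    def put(self, row, column, value, ts):
        versions = self.cells[(row, column)]
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest first
        del versions[self.max_versions:]                   # enforce the limit

    def get(self, row, column, versions=1):
        # .get() on the defaultdict avoids creating empty cells on reads.
        return self.cells.get((row, column), [])[:versions]


if __name__ == "__main__":
    many = VersionedCellStore(max_versions=3)    # HBase-like: several versions
    latest = VersionedCellStore(max_versions=1)  # latest value only
    for ts, v in enumerate(["v1", "v2", "v3", "v4"], start=1):
        many.put("row1", "cf:col", v, ts)
        latest.put("row1", "cf:col", v, ts)
    print(many.get("row1", "cf:col", versions=3))    # [(4, 'v4'), (3, 'v3'), (2, 'v2')]
    print(latest.get("row1", "cf:col", versions=3))  # [(4, 'v4')]
```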
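The Dynamo-style placement can be sketched as a small consistent-hashing ring. This is an illustration under assumptions, not Cassandra's implementation: each node gets a single position from an MD5 hash of its name (real deployments assign tokens and may use several points per node), and a key is stored on the first node clockwise from the key's hash plus the next n - 1 nodes. HashRing and ring_position are hypothetical names.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Place a string on the ring: first 8 bytes of its MD5 digest as an int."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Toy consistent-hashing ring with one point per node."""

    def __init__(self, nodes):
        self.points = sorted((ring_position(n), n) for n in nodes)
        self.positions = [p for p, _ in self.points]

    def replicas(self, key: str, n: int):
        """Return up to n distinct nodes for key: the first node clockwise
        from the key's position, followed by its successors on the ring."""
        n = min(n, len(self.points))
        # bisect_right finds the first node strictly past the key; the modulo
        # wraps around the end of the ring back to the start.
        start = bisect.bisect_right(self.positions, ring_position(key))
        return [self.points[(start + i) % len(self.points)][1] for i in range(n)]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
    # Which three nodes come back depends on the hash values.
    print(ring.replicas("user:42", n=3))
```

The appeal of the ring is that removing a node only moves the keys that node owned; every other key keeps the same first replica.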
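By contrast, HBase splits a table's sorted row keys into contiguous ranges (Regions), which gives MapReduce a natural unit of work: roughly one input split per region. A toy model of that range lookup, assuming string row keys (HBase actually compares raw bytes) and a hypothetical RegionMap class:

```python
import bisect


class RegionMap:
    """Toy HBase-style range partitioning: region i holds row keys in
    [start_keys[i], start_keys[i + 1]), and the last region is unbounded."""

    def __init__(self, split_keys):
        # The first region starts at the empty key, so every row key has a home.
        self.start_keys = [""] + sorted(split_keys)

    def region_for(self, row_key: str) -> int:
        # The owning region is the last one whose start key is <= row_key.
        return bisect.bisect_right(self.start_keys, row_key) - 1

    def input_splits(self):
        """One (start, end) range per region; None means 'to the end of the table'."""
        ends = self.start_keys[1:] + [None]
        return list(zip(self.start_keys, ends))


if __name__ == "__main__":
    regions = RegionMap(["g", "p"])
    print(regions.region_for("apple"), regions.region_for("g"), regions.region_for("zebra"))  # 0 1 2
    print(regions.input_splits())  # [('', 'g'), ('g', 'p'), ('p', None)]
```

Because neighbouring keys land in the same region, each split is a contiguous key range that one map task can scan; a random hash placement scatters adjacent keys across nodes, so there is no equally obvious split.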
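The supercolumn workaround amounts to packing a list of subcolumns into one cell value. A minimal sketch, assuming JSON as the encoding (any serialization would do, since HBase stores cell values as plain bytes); encode_subcolumns and decode_subcolumns are hypothetical helpers, and writing the bytes to HBase is left to whatever client you use:

```python
import json


def encode_subcolumns(subcolumns):
    """Pack a list of (name, value) subcolumns into bytes for one cell value."""
    return json.dumps([[name, value] for name, value in subcolumns]).encode("utf-8")


def decode_subcolumns(cell_value: bytes):
    """Unpack bytes written by encode_subcolumns back into (name, value) pairs."""
    return [(name, value) for name, value in json.loads(cell_value.decode("utf-8"))]


if __name__ == "__main__":
    address = [("street", "1 Main St"), ("city", "Springfield")]
    cell = encode_subcolumns(address)
    print(cell)  # b'[["street", "1 Main St"], ["city", "Springfield"]]'
    print(decode_subcolumns(cell) == address)  # True
```

Keeping the subcolumns in a list preserves their order; the trade-off is that HBase sees the whole list as one opaque value, so reading or updating a single subcolumn means rewriting the cell.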

Less tangible differences -- or differences that can be addressed through
application and development -- would include community, maturity, number and
variety of production installs, and features (monitoring, shells, UIs, admin
tools, etc.).  On these latter dimensions, HBase would seem to do better, but
do the research and make your own call.

Hope this helps,

On Tue, Sep 1, 2009 at 11:45 AM, charles du <taiping.du@gmail.com> wrote:

> Hi:
> Does anyone have experience with both Cassandra and HBase? To me, they
> target a similar problem. I am wondering what the main differences are
> between these two, e.g. reliability/performance/features?
> Thanks.
> --
> tp
