Hi Dan,

"What advantage do you see in storing lobs in HBase?" (or HDFS directly?)

The advantages could be:
1) Storage volume: the non-opaque part of a data model tends to be fairly finite in size, even for a site like Facebook. By "non-opaque" I mean fields that we would expect to query on: "username", "age", "gender", etc. The *opaque* part tends to be virtually infinite in size. For example, adding a single photo to a user's account can take 1000 times more storage than the user profile itself -- and that's just one photo! So the first advantage is that HBase accommodates virtually infinite storage, since it sits on top of HDFS. Once the data is in HDFS, it's going to be replicated and have many other access benefits, including locality of data to the HBase partition requesting the data.

2) Ubiquity: HBase is hot. You're going to find it present in a lot of enterprises and sites. At the same time, these sites are struggling to reconcile relational vs. non-relational. They generally believe "relational cannot scale". If a relational database came along and said "hey, we *collaborate* with the non-relational model" (see point (1)), then I think it would be able to leverage the ubiquity of HBase/HDFS.

3) High availability/reliability: HDFS is replicated. If I want to put a ton of important user data, like photos, into a storage system, I want to do it the way Google does it, and that's the model HDFS/HBase follow.
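
To make the split in (1) concrete, here is a rough Java sketch, not a real integration. BlobStore, InMemoryBlobStore, and UserRow are names I made up for illustration; the in-memory map merely stands in for HBase/HDFS, and nothing here is an actual Derby or HBase API. The idea is that the relational row keeps only the queryable fields plus a locator, while the opaque bytes live in the external store.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical abstraction over an external LOB store (HBase or HDFS in the idea above). */
interface BlobStore {
    /** Stores the bytes and returns a locator to keep in the relational row. */
    String put(byte[] data);

    /** Returns the stored bytes, or null if the locator is unknown. */
    byte[] get(String locator);
}

/** In-memory stand-in so the sketch runs without a cluster; NOT a real HBase client. */
final class InMemoryBlobStore implements BlobStore {
    private final Map<String, byte[]> blobs = new HashMap<>();

    @Override
    public String put(byte[] data) {
        String locator = UUID.randomUUID().toString();
        blobs.put(locator, data.clone()); // defensive copy
        return locator;
    }

    @Override
    public byte[] get(String locator) {
        byte[] data = blobs.get(locator);
        return data == null ? null : data.clone();
    }
}

/** The queryable ("non-opaque") part of a user, plus a locator for the opaque photo. */
final class UserRow {
    final String username;
    final int age;
    final String photoLocator; // small, fixed-size reference instead of the photo bytes

    UserRow(String username, int age, String photoLocator) {
        this.username = username;
        this.age = age;
        this.photoLocator = photoLocator;
    }
}

final class LobSketch {
    public static void main(String[] args) {
        BlobStore store = new InMemoryBlobStore();
        byte[] photo = new byte[1_000_000]; // pretend this is a 1 MB photo

        // Only the locator goes into the relational row; the bytes go to the blob store.
        UserRow row = new UserRow("alice", 30, store.put(photo));

        byte[] fetched = store.get(row.photoLocator);
        System.out.println(row.username + " -> " + fetched.length + " bytes");
    }
}
```

With something like this, the relational side stays small and queryable no matter how many photos a user uploads, and the replication and capacity questions are left to whatever sits behind the store.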


http://nextdb.net - RESTful Relational Database

--- On Sat, 3/20/10, Dag H. Wanvik <Dag.Wanvik@Sun.COM> wrote:

From: Dag H. Wanvik <Dag.Wanvik@Sun.COM>
Subject: Re: HDFS?
To: "Derby Discussion" <derby-user@db.apache.org>
Date: Saturday, March 20, 2010, 1:54 AM

geoffrey hendrey <geoff.hendrey@gmail.com> writes:

> It could be interesting to have the roll-forward replication logs written to
> HDFS where they will be automatically replicated, then the Derby slave can
> read the roll-forward log out of HDFS. Storage of backups is another place

Data point: the current Derby replication works by sending the log
records to a slave already, but currently only to one
slave. Unfortunately, the Derby slave cannot yet even field read-only
queries (since technically it is in a recovery phase in slave mode)
which would allow interesting scaling capabilities when combined
with HDFS.  (And not symmetrical replication, of course, i.e. dual

What advantage do you see in storing lobs in HBase?

These are interesting topics to investigate.


> where HDFS might play well. Another idea would be to provide blob storage in
> HBase. I don't see any conflict between wanting to use HBase and Derby,
> since they really do different things. And in fact, it would be really cool
> to see Derby *using* HBase for things like Blob Storage.
> It's just a random thought; but HDFS is really an amazing piece of
> engineering and it would be interesting to see it leveraged by Derby. In
> many "web scale" computing environments, it seems like Hadoop and HDFS are
> becoming quite ubiquitous.
> --
> http://nextdb.net - RESTful Relational Database
> http://www.nextdb.net/wiki/en/REST

Dag H. Wanvik, staff engineer
Sun Microsystems, Java Core and Desktop - Java DB/Derby
Haakon VII gt. 7b, N-7485 Trondheim, Norway
Tel: x43496/+47 73842196, Fax:  +47 73842101
Sun IM: dw136674, Yahoo IM: dag_h_wanvik