hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject RE: Porting SQL DB into HBASE
Date Wed, 14 Apr 2010 12:30:01 GMT

> Date: Wed, 14 Apr 2010 12:03:56 +0600
> Subject: Re: Porting SQL DB into HBASE
> From: imyousuf@gmail.com
> To: hbase-user@hadoop.apache.org
> On Mon, Apr 12, 2010 at 2:55 PM, kranthi reddy <kranthili2020@gmail.com> wrote:
> >
> > <snip />
> > The problem is denormalising these 20% tables is also extremely difficult
> > and we are planning to port them directly into hbase. And also denormalising
> > these tables would lead to a lot of redundant data.
> >
> When denormalisation is mentioned, redundant data is implied. The idea
> is that, since there are no joins, instead of doing N lookups (to
> replace N joins), keeping redundant data lets you do a single
> lookup. Furthermore, HBase is great at scaling to huge data sets.

From reading his last post, I suspect it's less an issue of denormalization than one
of poor database design.

Paraphrasing his example, he has one table for users who access his system by phone. He has
one table for users who access the system by van. 

Without looking at his table structures, it's hard to see why he can't combine the two and
then have a single field to denote access type (phone, van, etc.). Even if there are fields
that are unique to phone and fields that are unique to van, it doesn't mean that they can't
be null.
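
To make the combined-table idea concrete, here is a minimal sketch in Python. It is an in-memory stand-in, not HBase client code, and every row key, column name, and value in it is a hypothetical illustration rather than anything from the OP's schema. A row simply omits the columns that don't apply to it, which plays the role of a NULL.

```python
# In-memory stand-in for one combined "users" table.
# Layout: row key -> {"family:qualifier": value}.
# All names and values below are hypothetical examples.
users = {
    "user1": {
        "info:name": "Alice",
        "info:access_type": "phone",
        "phone:number": "555-0100",   # only meaningful for phone users
    },
    "user2": {
        "info:name": "Bob",
        "info:access_type": "van",
        "van:route": "R7",            # only meaningful for van users
    },
}


def users_by_access_type(table, access_type):
    """Return the sorted row keys whose access_type column matches."""
    return sorted(
        key for key, row in table.items()
        if row.get("info:access_type") == access_type
    )


def get_column(table, row_key, column):
    """Read one column; a column the row never stored reads as None."""
    return table.get(row_key, {}).get(column)
```

Here `get_column(users, "user2", "phone:number")` returns `None`: the van user carries no phone columns at all, so the type-specific fields cost nothing for rows that don't use them.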

Again, sometimes you have to look at alternatives to how you achieve your physical model of
your database.
If you have a parent/child relationship between data, you can easily use a hierarchical model
like Pick (U2, Revelation, etc.). Not that I'm really a fan of Dick Pick (RIP), but this model
would fit within HBase and work well. (I should add a caveat on column width and table size,
but that's a different issue.)
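
As a rough sketch of the parent/child idea, the Python below keeps each child as one column inside its parent's row, so a single row lookup returns the parent together with all of its children. It is an in-memory illustration only, not HBase client code; the `child:` prefix, the helper names, and the sample data are all assumptions made up for the example.

```python
# In-memory stand-in for a table holding parents with nested children.
# Layout: parent row key -> {"family:qualifier": value}.
# Each child is stored under a "child:<child_id>" column of its parent.
CHILD_PREFIX = "child:"


def add_child(table, parent_key, child_id, value):
    """Store one child record as a column in the parent's row."""
    row = table.setdefault(parent_key, {})
    row[CHILD_PREFIX + child_id] = value


def children_of(table, parent_key):
    """Return {child_id: value} for one parent from a single row read."""
    row = table.get(parent_key, {})
    return {
        column[len(CHILD_PREFIX):]: value
        for column, value in row.items()
        if column.startswith(CHILD_PREFIX)
    }


# Hypothetical usage: an order (parent) with its line items (children).
orders = {"order42": {"info:customer": "user1"}}
add_child(orders, "order42", "item1", "widget x2")
add_child(orders, "order42", "item2", "gadget x1")
```

Because every child adds a column to the parent's row, a parent with very many children produces a very wide row, which is where the caveat on column width comes in.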

Going back to the problem the OP is having, he really needs to rethink his design. 

IMHO, I think one important issue that doesn't get addressed is thinking of your database
as something more than a way to persist your objects. ;-) [And that is one thing that you
debate at a bar, over beers (or your favorite beverage) :-) ]


