hbase-user mailing list archives

From kranthi reddy <kranthili2...@gmail.com>
Subject Re: Porting SQL DB into HBASE
Date Wed, 14 Apr 2010 05:43:25 GMT
Hi Amandeep,

I get your point, but the situation is a bit more complex. I have tried to
explain it better below.

We have around 10 databases (each with 20-500 tables) that hold
information about the people of a state. Each database holds the
information for a different kind of service (for example, the VAN DB holds
information about users who availed the facility through parked vans, and
the TELECOMMUNICATION DB holds information about users who availed the
facility by telephone).

Since a user can reach the facility through several services, he ends up
with a different ID in each database. We now plan to combine all these
databases into a single database with one master table, based on a few
heuristics such as username and date of birth (if the username and date of
birth of a person match across databases, we treat him as a single user,
and all his information from the different databases can be stored as one
entry).
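
As a rough illustration of the matching heuristic above, a minimal Python
sketch might look like the following. The record fields (`source`, `id`,
`username`, `dob`), the `normalize` helper and the sample values are all
hypothetical; the real schemas are not shown in this thread.

```python
from collections import defaultdict


def normalize(name):
    # Hypothetical normalisation: collapse whitespace and lowercase the name.
    return " ".join(name.split()).lower()


def merge_users(records):
    """Group per-service records into one master entry per person.

    Two records are assumed to describe the same person when their
    normalised username and date of birth both match.
    """
    master = defaultdict(dict)
    for rec in records:
        key = (normalize(rec["username"]), rec["dob"])
        # Keep every per-service ID so the original rows can still be found.
        master[key].setdefault(rec["source"], []).append(rec["id"])
    return dict(master)


# Made-up sample data: two services, three records, two distinct people.
records = [
    {"source": "VAN", "id": "V-17", "username": "Ravi Kumar", "dob": "1980-05-02"},
    {"source": "TELECOM", "id": "T-903", "username": "ravi  kumar", "dob": "1980-05-02"},
    {"source": "VAN", "id": "V-42", "username": "Ravi Kumar", "dob": "1975-11-30"},
]
master = merge_users(records)
```

Each key of `master` is one candidate person, and the value maps each source
database to the IDs that person had there. An exact match on name and date of
birth will both miss some duplicates and merge some distinct people, so this
is only a starting point.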

The problem at hand is that, with so many separate databases and with the
data growing daily, it will be very difficult to maintain and improve the
system in the future. We might also end up losing track of the databases
and of the information about a particular user. This is why we were
planning to use HBase.

Hope I am a bit clearer now :) .
Regards,
kranthi

On Tue, Apr 13, 2010 at 11:01 AM, Amandeep Khurana <amansk@gmail.com> wrote:

> You are mentioning 2 different reasons:
>
> Open source... Well, get MySQL..
>
> Large datasets? The table sizes that you reported in the earlier mails don't
> seem to justify a move to HBase. Keep in mind - to run HBase stably in
> production you would ideally want to have at least 10 nodes. And you will
> have no SQL available. Make sure you are aware of the trade-offs between
> HBase and an RDBMS before you decide... Even 100 million rows can be handled
> by a relational database if it is tuned properly.
>
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
>
>
> On Mon, Apr 12, 2010 at 10:17 PM, kranthi reddy <kranthili2020@gmail.com> wrote:
>
> > Hi all,
> >
> >
> > @Amandeep : The main reason for porting to HBase is that it is open
> > source. Currently the NGO is paying a high licensing fee for Microsoft SQL
> > Server. So in order to save money we planned to port to HBase, also because
> > of its scalability for large datasets.
> >
> > @Jonathan : The problem is that these static tables can't be combined.
> > Each table describes a different entity. For example, one static table might
> > contain information about all the counties in a country, and another table
> > might contain information about all the doctors in the country.
> >
> > That is why I don't think it is possible to combine these static
> > tables: they don't have any primary/foreign keys referencing each other.
> >
> > The dynamic tables are pretty huge (though small compared to what HBase can
> > support). But these tables will grow and might contain up to 100
> > million rows in the near future.
> >
> > Thank you,
> > kranthi
> >
> > On Tue, Apr 13, 2010 at 12:17 AM, Michael Segel <michael_segel@hotmail.com> wrote:
> >
> > >
> > >
> > > Just an idea, take a look at a hierarchical design like Pick.
> > > I know it's doable, but I don't know how well it will perform.
> > >
> > >
> > > > Date: Mon, 12 Apr 2010 14:25:48 +0530
> > > > Subject: Re: Porting SQL DB into HBASE
> > > > From: kranthili2020@gmail.com
> > > > To: hbase-user@hadoop.apache.org
> > > >
> > > > Hi Jonathan,
> > > >
> > > > Sorry for the late response. Missed your reply.
> > > >
> > > > The problem is, around 80% (400) of the tables are static tables and the
> > > > remaining 20% (100) are dynamic tables that are updated on a daily basis.
> > > > Denormalising these 20% of tables is also extremely difficult, so we are
> > > > planning to port them directly into HBase. Denormalising these tables
> > > > would also lead to a lot of redundant data.
> > > >
> > > > Static tables have entries numbering in the hundreds, mostly fewer
> > > > than 1000 entries (rows), whereas the dynamic tables have more than 20,000
> > > > entries, and each entry might be updated/modified at least once a week.
> > > >
> > > > Regards,
> > > > kranthi
> > > >
> > > >
> > > > On Wed, Mar 31, 2010 at 10:23 PM, Jonathan Gray <jgray@facebook.com> wrote:
> > > >
> > > > > Kranthi,
> > > > >
> > > > > HBase can handle a good number of tables, but tens or maybe a hundred. If
> > > > > you have 500 tables you should definitely be rethinking your schema design.
> > > > > The issue is less about HBase being able to handle lots of tables, and much
> > > > > more about whether scattering your data across lots of tables will be
> > > > > performant at read time.
> > > > >
> > > > >
> > > > > 1)  Impossible to answer that question without knowing the schemas of the
> > > > > existing tables.
> > > > >
> > > > > 2)  Not really any relation between fault tolerance and the number of
> > > > > tables, except potentially for recovery time, but this would be the same
> > > > > with few, very large tables.
> > > > >
> > > > > 3)  No difference in write performance.  Read performance for simple
> > > > > key lookups would not be impacted, but most likely having data spread out
> > > > > like this will mean you'll need joins of some sort.
> > > > >
> > > > > Can you tell more about your data and queries?
> > > > >
> > > > > JG
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: kranthi reddy [mailto:kranthili2020@gmail.com]
> > > > > > Sent: Wednesday, March 31, 2010 3:05 AM
> > > > > > To: hbase-user@hadoop.apache.org
> > > > > > Subject: Porting SQL DB into HBASE
> > > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I have run into some trouble while trying to port a SQL DB to HBase.
> > > > > > The problem is that my SQL DB has around 500 tables and is very badly
> > > > > > designed. Around 45-50 tables could be denormalised into a single table,
> > > > > > and the remaining tables are static tables. My questions are:
> > > > > >
> > > > > > 1) Is it possible to port this DB (tables) to HBase? If so, how?
> > > > > > 2) How many tables can HBase support while remaining tolerant of failure?
> > > > > > 3) When so many tables are inserted, how will performance be
> > > > > > affected? Will it remain the same or degrade?
> > > > > >
> > > > > > One possible solution, I think, is to use a column family for each table.
> > > > > > But as far as I know, and from previous experiments, HBase isn't stable
> > > > > > when there are more than 5 column families.
> > > > > >
> > > > > > Since large quantities of data are ported into the database every day,
> > > > > > stability and a fail-proof system are the highest priority.
> > > > > >
> > > > > > Hoping for a positive response.
> > > > > >
> > > > > > Thank you,
> > > > > > kranthi
> > > > >
> > > >
> > > >
> > > >
> > >
> > >
> >
> >
> >
> >
>



-- 
Kranthi Reddy. B
Room No : 98
Old Boys Hostel
IIIT-HYD

-----------

I don't know the key to success, but the key to failure is trying to impress
others.
