hbase-user mailing list archives

From Amit Gupta <dlgami...@gmail.com>
Subject Re: HBase schema question
Date Sat, 21 Jan 2012 07:32:09 GMT
I am not sure how I can do joins using HBase, which is essentially what I am
trying to do. Based on what I have read, it looks like HBase is really good
for scans or row key lookups. Please correct me if I am wrong.

I can have an HBase table for users with {userid + timestamp} as the row key.
With this, a lookup for a single user over a given time range will be fast.
However, I need to do lookups for millions of users, each with a different
time range. Will that also be fast?
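
To illustrate the {userid + timestamp} row key idea, here is a minimal sketch
in plain Python. It does not use the HBase client API: a sorted list stands in
for a table whose rows are kept in key order, and bisect stands in for a range
scan. The names make_row_key and scan, the 10-digit user id padding, the
YYYYMMDD date format, and the sample rows are all assumptions made up for this
example.

```python
import bisect

def make_row_key(user_id: int, day: str) -> bytes:
    """Build a composite key: zero-padded user id followed by a YYYYMMDD day.

    Fixed-width fields make byte-wise (lexicographic) order agree with
    (user id, date) order, which is what a range scan relies on.
    """
    return f"{user_id:010d}{day}".encode("ascii")

# Hypothetical sample rows: (user id, day, total clicks).
sample = [
    (1, "20120101", 4), (1, "20120102", 2), (1, "20120103", 30),
    (2, "20120101", 23), (2, "20120102", 56),
]

# A sorted list of (row key, value) pairs stands in for the table.
rows = sorted((make_row_key(uid, day), clicks) for uid, day, clicks in sample)
keys = [key for key, _ in rows]

def scan(user_id: int, start_day: str, end_day: str) -> list:
    """Return values for one user with start_day <= day <= end_day."""
    lo = bisect.bisect_left(keys, make_row_key(user_id, start_day))
    hi = bisect.bisect_right(keys, make_row_key(user_id, end_day))
    return [value for _, value in rows[lo:hi]]
```

With this layout, each user's date range is one contiguous slice of the key
space, so every {start, end} pair in the subset becomes its own range scan.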

Lookups are also not the only thing I am trying to do. I need to compute
statistics like sum, min, max, etc. for each data point for a user. How can
I do that efficiently using HBase?
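
As a rough illustration of the statistics in question, the sketch below
computes sum, min, max, mean, and standard deviation in a single pass over the
values returned for one user and one data point. It is plain Python, not HBase
or MapReduce code; the class name RunningStats is hypothetical, and the
variance update uses Welford's method, which is a choice made for this
example. The standard deviation here is the population form (divide by n).

```python
import math

class RunningStats:
    """Single-pass sum, min, max, mean, and population standard deviation."""

    def __init__(self):
        self.n = 0
        self.total = 0.0
        self.min = math.inf
        self.max = -math.inf
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x: float) -> None:
        self.n += 1
        self.total += x
        self.min = min(self.min, x)
        self.max = max(self.max, x)
        # Welford's update: numerically stable, needs no second pass.
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def stddev(self) -> float:
        """Population standard deviation; 0.0 when no values were added."""
        return math.sqrt(self._m2 / self.n) if self.n else 0.0

# Usage with made-up values for one user over one date range.
stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.add(value)
```

Because each value is seen only once and the state is a handful of numbers,
one such accumulator per (user, data point) pair can be fed directly from a
scan without holding the rows in memory.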


On Fri, Jan 20, 2012 at 2:20 PM, T Vinod Gupta <tvinod@readypulse.com> wrote:

> from the little i have used hbase for, it is really good for the below use
> case you mentioned. hbase takes care of scale and you can use map reduce to
> do the kind of task you mentioned below.
> but please remember that it is super important how you design the schema.
> the schema should allow for your use case and allow for an efficient map
> reduce.
> if you decide to go with hbase, read the hbase book before deployment or
> schema design/implementation.
> thanks
>
> On Fri, Jan 20, 2012 at 2:10 PM, Amit Gupta <dlgamit16@gmail.com> wrote:
>
> > Hi,
> >
> >
> >
> > I am trying to figure out if HBase is the right candidate for my use
> > case, which is as follows:
> >
> > I have a users table containing millions of users, and for each user I
> > have a bunch of data points for each day in the past 2 years. Some of
> > these data points are the number of clicks in different parts of a web
> > page, total # of clicks, total searches, # of unique searches, etc. So
> > the data is in this form:
> >
> >
> >
> > User Id | Date   | X1 (Total Clicks) | X2 (Total Searches) | X3 | ….. | Xn
> > --------+--------+-------------------+---------------------+----+-----+---
> > 1       | D1-730 | 4                 | 0.8                 |    |     | 90
> > 1       | D1-729 | 2                 | 0.5                 |    |     | 50
> > …       |        |                   |                     |    |     |
> > 1       | D1     | 30                | 0.9                 |    |     | 20
> > 2       | D1-730 | 23                | 1.2                 |    |     | 85
> > 2       | D1-729 | 56                | 2.3                 |    |     | 56
> > ….      |        |                   |                     |    |     |
> >
> > My application has the following predominant query pattern: for a subset
> > of users (the subset being quite large, on the order of 1-5 million), I
> > want to compute the sum, min, max, mean, and standard deviation of data
> > points over different date ranges for the users. For example, user1 may
> > have a start and end date of {sd1, ed1}, user2 may have {sd2, ed2}, and
> > so on. I want to compute sum, min, max, etc. for data points X1, X2, … Xn
> > over date ranges {sd1, ed1}, {sd2, ed2}, {sd3, ed3} for each user in the
> > subset.
> >
> >
> >
> > Currently we do this in a database by creating a table for the subset of
> > users with their start and end days and joining it against the users
> > table. The query, however, is extremely slow and takes hours to execute.
> >
> >
> >
> > I am trying to figure out the following:
> >
> >   1. Can I do the above query efficiently using HBase? (I want to reduce
> >   the query time. Space is not that big of a concern for me.)
> >
> >   2. Can someone please suggest alternative solutions if HBase is not the
> >   right solution for such a use case?
> >
> >
> >
> > Thanks,
> >
> > dlg
> >
>
