hbase-user mailing list archives

From yonghu <yongyong...@gmail.com>
Subject Re: multiple data versions vs. multiple rows?
Date Mon, 19 Jan 2015 20:17:34 GMT

Thanks for your suggestion. I had already considered the first issue, that
a row cannot be split between 2 regions.

However, I ran a small scan test with MapReduce. I first created a
table t1 with 1 million rows and allowed each column to store 10 data
versions. Then I converted t1 into t2, in which the multiple data versions
in t1 became multiple rows in t2. I wrote two MapReduce programs to scan
t1 and t2 individually. The result was that scanning t1 took less time
than scanning t2. So, I think that for performance reasons, multiple data
versions may be a better option than multiple rows.
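
To make the t1 to t2 conversion concrete, here is a minimal sketch in plain
Python. It is an in-memory model only, not the HBase client API and not the
actual MapReduce job. The function name versions_to_rows, the "#" separator
and the zero-padded timestamp suffix in the new row key are all assumptions
made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory model, NOT the HBase client API.
# Versioned layout (like t1): row key -> column -> list of (timestamp, value).
VersionedTable = Dict[str, Dict[str, List[Tuple[int, str]]]]
# Flattened layout (like t2): composite row key -> column -> single value.
FlatTable = Dict[str, Dict[str, str]]


def versions_to_rows(t1: VersionedTable) -> FlatTable:
    """Turn every (row, timestamp) pair of t1 into its own row in t2."""
    t2: FlatTable = {}
    for row_key, columns in t1.items():
        for column, versions in columns.items():
            for ts, value in versions:
                # Assumed key scheme: original key + "#" + zero-padded timestamp,
                # so the rows of one user stay adjacent and sort by time.
                composite_key = f"{row_key}#{ts:019d}"
                t2.setdefault(composite_key, {})[column] = value
    return t2


if __name__ == "__main__":
    t1 = {"user1": {"cf:event": [(1, "login"), (2, "click"), (3, "logout")]}}
    for key, cols in sorted(versions_to_rows(t1).items()):
        print(key, cols)
```

In this model t2 ends up with one row per distinct (row, timestamp) pair of
t1, so a full scan of t2 has to visit more row keys than a scan of t1 for
the same data.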

But just as you said, which approach to use depends on how many historical
events you want to keep.



On Mon, Jan 19, 2015 at 8:37 PM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> Hi Yong,
> A row will not be split between 2 regions. If you plan on having thousands
> of versions, then depending on the size of your data, you might end up
> with a row bigger than your preferred region size.
> If you plan to keep just a few versions of the history to have a look at
> it, I would say go with versions. If you plan to have one million versions
> because you want to keep the whole event history, go with the row approach.
> You can also consider going with the Column Qualifier approach. This has
> the same constraint as the versions regarding the split in 2 regions, but
> it might be easier to manage and still give you the consistency of being
> within a row.
> JM
> 2015-01-19 14:28 GMT-05:00 yonghu <yongyong313@gmail.com>:
> > Dear all,
> >
> > I want to record user history data. I know there exist two options:
> > one is to store user events in a single row with multiple data versions,
> > and the other is to use multiple rows. I wonder which one is better for
> > performance?
> >
> > Thanks!
> >
> > Yong
> >
