hbase-user mailing list archives

From ramkrishna vasudevan <ramkrishna.s.vasude...@gmail.com>
Subject Re: Multiple column families - scan performance
Date Fri, 18 Aug 2017 05:52:43 GMT
bq. a scan test on (any) single
column family in the 2nd table takes 4x the time to scan the single column
family from the 1st table
So that means your scan targets one specific family only, I believe?

Are you seeing a lot of cache misses for the 4-column-family table, whereas the
1-column-family table does not have a heavy cache miss rate?


On Fri, Aug 18, 2017 at 7:18 AM, Anoop John <anoop.hbase@gmail.com> wrote:

> So on the 2nd table, even though there are 4 CFs, while scanning you need
> data from only a single CF.  And the CF under test is similar to what you
> have in the 1st table?  I mean the same encoding and compression scheme
> and data size?   How do you create the scan for the 2nd table?  I hope
> you do
> Scan s = new Scan();
> s.setStartRow(startRow);
> s.setStopRow(stopRow);
> s.addFamily(cf);
> Correct?
> -Anoop-
> On Thu, Aug 17, 2017 at 4:42 PM, Partha <parthaemails@gmail.com> wrote:
> > I have 2 HBase tables - one with a single column family, and the other with 4
> > column families. Both tables are keyed by same rowkey, and the column
> > families all have a single column qualifier each, with a json string as
> > value (each json payload is about 10-20K in size). All column families use
> > fast-diff encoding and gzip compression.
> >
> > After loading about 60MM rows to each table, a scan test on (any) single
> > column family in the 2nd table takes 4x the time to scan the single column
> > family from the 1st table. In both cases, the scanner is bounded by a start
> > and stop key to scan 1MM rows. Performance did not change much even after
> > running a major compaction on both tables.
> >
> > Though HBase doc and other tech forums recommend not using more than 1
> > column family per table, nothing I have read so far suggests scan
> > performance will linearly degrade based on number of column families. Has
> > anyone else experienced this, and is there a simple explanation for this?
> >
> > To note, the reason the second table has 4 column families is that, even
> > though I only scan one column family at a time now, there are requirements
> > to scan multiple column families from that table given a set of rowkeys.
> >
> > Thanks for any insight into the performance question.
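
For a sense of scale, the figures in the original post (1MM rows scanned, one 10-20K JSON value per row in the scanned family) put the uncompressed value data for a single-family scan at roughly 10 to 20 GB in either table. The sketch below only works through that arithmetic. It assumes "K" means 1024 bytes, ignores rowkey and cell overhead, and does not estimate the gzip-compressed size on disk. The class and method names are hypothetical, chosen for illustration.

```java
public class ScanVolumeEstimate {
    // Assumption: "K" in the post means 1024 bytes.
    static final long BYTES_PER_K = 1024L;

    /**
     * Uncompressed bytes of cell values read when scanning `rows` rows
     * of one column family holding a single value of `valueSizeK` K per row.
     */
    static long uncompressedBytes(long rows, long valueSizeK) {
        return rows * valueSizeK * BYTES_PER_K;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;                  // "1MM rows" from the post
        long low = uncompressedBytes(rows, 10);  // 10K payloads
        long high = uncompressedBytes(rows, 20); // 20K payloads
        System.out.printf("low=%d bytes, high=%d bytes%n", low, high);
    }
}
```

Because the estimate depends only on the row count and payload size of the scanned family, it comes out the same for both tables, which is why the 4x difference reported above is surprising.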
