hbase-user mailing list archives

From Anoop Sam John <anoo...@huawei.com>
Subject RE: Scan addFamily vs FamilyFilter(EQUAL, ...)
Date Thu, 31 May 2012 06:18:53 GMT
As per my understanding of the Scan code, in your scenario, where you want to scan only
some CFs (not all), you should go with Scan#addFamily.
The FamilyFilter achieves the same result, but there is a difference in performance.
When one specifies the CFs in the Scan, scanners are created only for those Stores.
For the other CFs there won't be any scanners, so those stores are not scanned (the HFile
data is not fetched).
Instead, when one uses the FamilyFilter and does not specify any column families (using Scan#addFamily),
all the stores get scanned and data gets fetched from the HFiles. Afterwards, only the KVs
you need (as per your FamilyFilter) are included in the Result, and the others
are discarded. So I feel there will be a performance difference. Correct me if I am wrong.
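
To make the difference concrete, here is a rough toy model in plain Java. It is not HBase code: StoreScanModel, its methods, and the cellsRead counter are hypothetical names invented for illustration, and the model only mirrors the behaviour as I understand it (one store per CF, scanners opened per store).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical class, NOT part of the HBase API.
final class StoreScanModel {
    // One "store" per column family, each holding a list of cell values.
    private final Map<String, List<String>> stores = new LinkedHashMap<>();
    // Number of cells read from any store; stands in for HFile reads.
    int cellsRead = 0;

    void put(String family, String value) {
        stores.computeIfAbsent(family, f -> new ArrayList<>()).add(value);
    }

    // Models Scan#addFamily: a scanner is opened only for the requested stores.
    List<String> scanWithFamilies(String... families) {
        List<String> result = new ArrayList<>();
        for (String family : families) {
            List<String> store = stores.get(family);
            if (store == null) {
                continue; // unknown family: nothing to scan
            }
            for (String value : store) {
                cellsRead++;
                result.add(value);
            }
        }
        return result;
    }

    // Models a FamilyFilter with no families set on the Scan:
    // every store is read, and unwanted cells are dropped afterwards.
    List<String> scanWithFamilyFilter(String wantedFamily) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : stores.entrySet()) {
            for (String value : entry.getValue()) {
                cellsRead++;
                if (entry.getKey().equals(wantedFamily)) {
                    result.add(value);
                }
            }
        }
        return result;
    }

    static StoreScanModel sample() {
        StoreScanModel model = new StoreScanModel();
        model.put("cf1", "a");
        model.put("cf1", "b");
        model.put("cf2", "x");
        model.put("cf2", "y");
        model.put("cf2", "z");
        return model;
    }

    public static void main(String[] args) {
        StoreScanModel byFamily = sample();
        List<String> r1 = byFamily.scanWithFamilies("cf1");
        System.out.println("addFamily model:    result=" + r1 + " cellsRead=" + byFamily.cellsRead);

        StoreScanModel byFilter = sample();
        List<String> r2 = byFilter.scanWithFamilyFilter("cf1");
        System.out.println("FamilyFilter model: result=" + r2 + " cellsRead=" + byFilter.cellsRead);
    }
}
```

Both calls return the same cells in this model; only the amount of data touched differs, which is the point being made above.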

>One thing I ran into when using the Scan.addFamily / Scan.addColumn is that those two
>methods overwrite each other.
The Scan#addColumn javadoc clearly mentions this overwriting, so it seems to be
done intentionally, correct?
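
For illustration, the overwriting can be sketched with a small stand-in for the Scan family map. This is an assumption-laden model, not the HBase Scan class: ScanFamilyMapModel is a hypothetical name, families and qualifiers are plain Strings rather than byte arrays, and a null entry is taken to mean "all columns of this family".

```java
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model only: hypothetical class, NOT the HBase Scan class.
final class ScanFamilyMapModel {
    // family -> selected qualifiers; a null value means "whole family".
    final TreeMap<String, NavigableSet<String>> familyMap = new TreeMap<>();

    // Selecting the whole family discards any qualifiers chosen earlier.
    ScanFamilyMapModel addFamily(String family) {
        familyMap.remove(family);
        familyMap.put(family, null);
        return this;
    }

    // Selecting one column replaces an earlier whole-family selection
    // with a qualifier set containing just the requested columns.
    ScanFamilyMapModel addColumn(String family, String qualifier) {
        NavigableSet<String> qualifiers = familyMap.get(family);
        if (qualifiers == null) {
            qualifiers = new TreeSet<>();
        }
        qualifiers.add(qualifier);
        familyMap.put(family, qualifiers);
        return this;
    }

    public static void main(String[] args) {
        // addColumn then addFamily: the column restriction is lost.
        System.out.println(new ScanFamilyMapModel()
                .addColumn("cf", "q1").addFamily("cf").familyMap);
        // addFamily then addColumn: the scan is narrowed to one column.
        System.out.println(new ScanFamilyMapModel()
                .addFamily("cf").addColumn("cf", "q1").familyMap);
    }
}
```

In this model the last call for a given family wins, so the order of addFamily and addColumn calls matters.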

From: saint.ack@gmail.com [saint.ack@gmail.com] on behalf of Stack [stack@duboce.net]
Sent: Wednesday, May 30, 2012 11:13 PM
To: user@hbase.apache.org
Subject: Re: Scan addFamily vs FamilyFilter(EQUAL, ...)

On Wed, May 30, 2012 at 9:59 AM, Kevin <kevin.macksamie@gmail.com> wrote:
> I am curious and trying to learn which method is best when wanting to limit
> a scan to a particular column or column family. The Scan class carries a
> Filter instance and a TreeMap of the family map and I am unsure how they
> get carried through to the server-side functionality. In terms of
> performance is there any difference between doing Scan.addFamily(x) and
> Scan.setFilter(new FamilyFilter(CompareFilter.CompareOp.EQUAL, x))?

There is probably no noticeable difference in performance, but
Scan#addFamily is the more natural way of expressing column family
