phoenix-dev mailing list archives

From "Samarth Jain (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-3560) Aggregate query performance is worse with encoded columns for schema with large number of columns
Date Wed, 11 Jan 2017 02:29:58 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-3560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816901#comment-15816901 ]

Samarth Jain commented on PHOENIX-3560:
---------------------------------------

We use a SINGLE_KEYVALUE_COLUMN_QUALIFIER "1", which sorts after our empty key value column
0 (I should probably change it to use the Integer representation of 1).

[~mujtabachohan] and I tested this out offline. It turned out that increasing the block
cache size sped up the query. It now runs 2x faster than against the non-encoded
immutable table.

[~lhofhansl] pointed out that HBase automatically increases the block size to fit
a key value, with the default block size being 64K. He mentioned that what is likely happening
in this case is that the "empty" key value and the packed key value both end up in the same block,
whose size is much larger than 64K. As a result, we cannot really take advantage
of the first key only filter, since we always have to read this entire large block before we
can skip to the next row.
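
To make the block-size argument concrete, here is a rough sketch in Python. It is a simplified model, not HBase's actual writer: it assumes a block is closed only once it has reached the configured size, so a key value is never split across blocks. The 50-byte size for the "empty" key value is a made-up figure, and the packed key value is approximated as 5000 columns x 200 chars from the issue description, ignoring key and encoding overhead.

```python
BLOCK_SIZE = 64 * 1024      # default block size mentioned above (64K)
NUM_COLUMNS = 5000          # from the schema in the issue description
CHARS_PER_COLUMN = 200      # from the issue description

EMPTY_KV = 50                                # hypothetical size of the "empty" key value, in bytes
PACKED_KV = NUM_COLUMNS * CHARS_PER_COLUMN   # roughly 1MB, ignoring key and encoding overhead


def pack_blocks(cell_sizes, block_size=BLOCK_SIZE):
    """Group cells into blocks under the simplified model described above.

    A block is closed only after it has reached block_size, so an
    oversized cell simply makes its block larger than block_size.
    """
    blocks, current = [], []
    for size in cell_sizes:
        if sum(current) >= block_size:
            blocks.append(current)
            current = []
        current.append(size)
    if current:
        blocks.append(current)
    return blocks


# Three rows, each written as the empty key value followed by the packed one.
cells = [EMPTY_KV, PACKED_KV] * 3
blocks = pack_blocks(cells)

for i, block in enumerate(blocks):
    print(f"block {i}: {len(block)} cells, {sum(block)} bytes")
```

Under these assumptions every block holds exactly one row's empty key value plus its packed key value, so reading just the first key value of a row still means loading about 1MB.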

> Aggregate query performance is worse with encoded columns for schema with large number of columns
> -------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-3560
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3560
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Mujtaba Chohan
>            Assignee: Thomas D'Silva
>             Fix For: 4.10.0
>
>         Attachments: DataGenerator.java, PHOENIX-3565.patch
>
>
> Schema with 5K columns
> {noformat}
> create table (k1 integer, k2 integer, c1 varchar ... c5000 varchar CONSTRAINT PK PRIMARY KEY (K1, K2))
> VERSIONS=1, MULTI_TENANT=true, IMMUTABLE_ROWS=true
> {noformat}
> In this test, there are no null columns and each column contains 200 chars, i.e. 1MB of data per row.
> Count * aggregation is about 5X slower with encoded columns when compared to a table with non-encoded columns using the same schema.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
