phoenix-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "James Taylor (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-1015) Support joining back to data table row from local index when query condition involves leading columns in local index
Date Mon, 02 Jun 2014 19:27:02 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015772#comment-14015772
] 

James Taylor commented on PHOENIX-1015:
---------------------------------------

Thanks for the patch, [~rajesh23]. Great work - we're getting very close! See below for some
feedback. I think we're close enough to get this feature into our 4.1 release. What do you
think? How about after you incorporate these items, you submit a pull request to master (it's
easier to review that way) to our soon-to-be-up https://github.com/apache/phoenix (we should
have our TLP mirror up by tomorrow I suspect)?

1) Pass in dataColumns to getScanProjector instead of deserializing again:
+        ColumnReference[] dataColumns = deserializeDataTableColumnsToJoin(scan);
+        if (dataColumns != null) {
+            scanProjector = getScanProjector(scan);


2) These TODOs need to get done. They'd handle the case where a GROUP BY references a non
indexed KV data column or you're aggregating over a non indexed KV data column. Basically
the code you added for ScanRegionObserver needs to be shared among ScanRegionObserver, GroupedAggregateRegionObserver,
and UngroupedAggregateRegionObserver. We need to add tests for this:
a) GROUP BY over non indexed KV data column.
b) GROUP BY over non indexed PK data column 
c) Aggregate (i.e. SUM(xxx)) over non indexed KV data column (with no GROUP BY clause).
d) Aggregate over non indexed KV data column (with no GROUP BY clause).
e) ORDER BY non indexed KV data column.
f) ORDER BY non indexed PK data column.
g) Use non indexed KV data column in WHERE clause.
h) Use non indexed PK data column in WHERE clause.
i) Same as above for join queries (see HashJoinIT).
A good way to improve coverage would be to add another parameterized test to HashJoinIT and
QueryIT where a local index is created.

===================================================================
--- phoenix-core/src/main/java/org/apache/phoenix/coprocessor/GroupedAggregateRegionObserver.java
(revision 786)
+++ phoenix-core/src/main/java/org/apache/phoenix/coprocessor/GroupedAggregateRegionObserver.java
(working copy)
@@ -400,6 +400,7 @@
                     hasMore = s.nextRaw(results);
                     if (!results.isEmpty()) {
                         result.setKeyValues(results);
+                        // TODO: join back to data row here if scan attribute set
                         ImmutableBytesWritable key =
                                 TupleUtil.getConcatenatedValue(result, expressions);
                         Aggregator[] rowAggregators = groupByCache.cache(key);
@@ -477,6 +478,7 @@
                         hasMore = s.nextRaw(kvs);
                         if (!kvs.isEmpty()) {
                             result.setKeyValues(kvs);
+                            // TODO: join back to data row here if scan attribute set
                             key = TupleUtil.getConcatenatedValue(result, expressions);
                             aggBoundary = currentKey != null && currentKey.compareTo(key)
!= 0;
                             if (!aggBoundary) {

3) Also, this TODO will handle when a SELECT on a non indexed data column is done. Need to
add tests for this.

Index: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionObserver.java
===================================================================
--- phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionObserver.java
(revision 786)
+++ phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionObserver.java
(working copy)
@@ -213,6 +213,7 @@
                     result.setKeyValues(results);
                     try {
                         if (indexMaintainers != null) {
+                            // TODO: join back to data row here if scan attribute set
                             for (IndexMaintainer maintainer : indexMaintainers) {
                                 ImmutableBytesPtr emptyKeyValueFamily = maintainer.getEmptyKeyValueFamily();
                                 Iterator<Cell> iterator = results.iterator();



> Support joining back to data table row from local index when query condition involves
leading columns in local index
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-1015
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1015
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: rajeshbabu
>            Assignee: rajeshbabu
>         Attachments: PHOENIX-1015.patch
>
>
> When a query involves more columns to project than columns in index and query condition
involves leading columns in local index then first we can get matching rowkeys from local
index table and then get the required columns from data table. In local index both data region
and index region co-reside in the same RS, we can call get on data region to get the missing
columns in the index, without any n/w overhead. So it's efficient. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message