phoenix-dev mailing list archives

From "James Taylor (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (PHOENIX-10) Push projection of a single ARRAY element to the server
Date Wed, 29 Jan 2014 09:04:09 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-10?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885153#comment-13885153 ]

James Taylor edited comment on PHOENIX-10 at 1/29/14 9:02 AM:
--------------------------------------------------------------

Wanted to provide a bit more detail on how this could be implemented. It's fairly similar
to the way we manage multiple aggregate functions in a single query and return all the values
in a single KV:

- At compile time, while parsing the SELECT clause, keep a LinkedHashSet<KeyValueColumnExpression>
of all the unique array index expressions in the query (that is, each case where an ArrayIndexExpression
contains a KeyValueColumnExpression).
- Remove from this set any KeyValueColumnExpression that the SELECT also references as an
entire array (in that case, you do need to return the entire array).
- Replace the remaining ArrayIndexExpressions in the LinkedHashSet with a new expression,
called something like ArrayIndexPositionalExpression, which is constructed with the position
of the KeyValueColumnExpression. It will use this position as an index to look up the array
element value in a known KeyValue (see below).
- Push this information through a Scan attribute for the ScanRegionObserver
- Post-filter the List<KeyValue> that will be returned by removing any KeyValueColumnExpression
(based on its cf:cq values). Add a single new KeyValue with a constant name that appends (in
order of the LinkedHashSet) the value of each element being looked up. We support this through
our KeyValueSchema (which is used for both aggregation and hash joins).
- The expression you created to replace the original ArrayIndexExpression will look up the
value by positional index using the KeyValueSchema.iterator method.

That gives a somewhat more detailed way to implement this through our coprocessor as opposed
to a filter.
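
To make the steps above a bit more concrete, here is a rough, self-contained Java sketch.
It does not use the real Phoenix or HBase classes: ElementRef, COMBINED_KEY, collect,
positionOf and postFilter are all hypothetical names, a row is modeled as a map from cf:cq
to a list of strings, and array indexes are assumed to be 1-based. Treat it as an
illustration of the idea, not as the actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-ins only; none of these types exist in Phoenix or HBase.
final class ArrayElementProjectionSketch {
    // Assumed name for the single combined KeyValue added on the server.
    static final String COMBINED_KEY = "_array_elements";

    /** One array element reference from the SELECT, e.g. cf:tags[2]. */
    static final class ElementRef {
        final String column; // "cf:cq" of the array column
        final int index;     // assumed 1-based
        ElementRef(String column, int index) { this.column = column; this.index = index; }
        @Override public boolean equals(Object o) {
            return o instanceof ElementRef
                && ((ElementRef) o).column.equals(column)
                && ((ElementRef) o).index == index;
        }
        @Override public int hashCode() { return Objects.hash(column, index); }
    }

    /** Compile time: unique element references in first-seen order, skipping
     *  columns that the SELECT also returns as a whole array. */
    static LinkedHashSet<ElementRef> collect(List<ElementRef> selectRefs,
                                             Set<String> wholeArrayColumns) {
        LinkedHashSet<ElementRef> refs = new LinkedHashSet<>();
        for (ElementRef ref : selectRefs) {
            if (!wholeArrayColumns.contains(ref.column)) {
                refs.add(ref);
            }
        }
        return refs;
    }

    /** Position the replacement expression would be constructed with (-1 if absent). */
    static int positionOf(LinkedHashSet<ElementRef> refs, ElementRef target) {
        int pos = 0;
        for (ElementRef ref : refs) {
            if (ref.equals(target)) return pos;
            pos++;
        }
        return -1;
    }

    /** Server side: drop the referenced array columns and append one combined
     *  entry holding the element values in the order of the set. */
    static Map<String, List<String>> postFilter(Map<String, List<String>> row,
                                                LinkedHashSet<ElementRef> refs) {
        Map<String, List<String>> out = new LinkedHashMap<>(row);
        List<String> combined = new ArrayList<>();
        for (ElementRef ref : refs) {
            List<String> array = row.get(ref.column);
            boolean inRange = array != null && ref.index >= 1 && ref.index <= array.size();
            combined.add(inRange ? array.get(ref.index - 1) : null);
            out.remove(ref.column);
        }
        if (!refs.isEmpty()) out.put(COMBINED_KEY, combined);
        return out;
    }

    public static void main(String[] args) {
        // Roughly: SELECT tags[2], tags[1], tags[2], scores[1], scores FROM t
        List<ElementRef> select = Arrays.asList(
            new ElementRef("cf:tags", 2), new ElementRef("cf:tags", 1),
            new ElementRef("cf:tags", 2), new ElementRef("cf:scores", 1));
        LinkedHashSet<ElementRef> refs = collect(select, Collections.singleton("cf:scores"));

        Map<String, List<String>> row = new LinkedHashMap<>();
        row.put("cf:tags", Arrays.asList("x", "y", "z"));
        row.put("cf:scores", Arrays.asList("10", "20"));
        row.put("cf:name", Arrays.asList("bob"));

        System.out.println(positionOf(refs, new ElementRef("cf:tags", 1)));
        System.out.println(postFilter(row, refs));
    }
}
```

In this sketch the client-side expression only needs its position in the set, since the
server writes the element values into the combined entry in that same order. The scores
column stays untouched because the query also selects it as a whole array.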


> Push projection of a single ARRAY element to the server
> -------------------------------------------------------
>
>                 Key: PHOENIX-10
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-10
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>
> If only a single array element is selected, we'll still return the entire array back
> to the client. Instead, we should push this to the server and only return the single array
> element. The same goes for the reference to an ARRAY in the WHERE clause. There's a general
> HBase fix for this (i.e. the ability to define a separate set of key values that will be returned
> versus key values available to filters) that has a patch here, but is deemed not possible
> to pull into the 0.94 branch by @lhofhansl.
> My thought is that we can add a Filter at the end of our filter chain that filters out
> any KeyValues that aren't in the SELECT expressions (i.e. filter out a column if it is referenced
> in the WHERE clause, but not in the SELECT expressions). This same Filter could handle returning
> only the elements of the array that are referenced in the SELECT expression rather than the
> entire array.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
