drill-dev mailing list archives

From "Aditya Kishore (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-683) Qualify HBase scan with specified columns even if row_key is required.
Date Sat, 10 May 2014 22:05:41 GMT

     [ https://issues.apache.org/jira/browse/DRILL-683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aditya Kishore updated DRILL-683:

    Attachment: DRILL-683-Qualify-HBase-scan-with-specified-columns-.patch

> Qualify HBase scan with specified columns even if row_key is required.
> ----------------------------------------------------------------------
>                 Key: DRILL-683
>                 URL: https://issues.apache.org/jira/browse/DRILL-683
>             Project: Apache Drill
>          Issue Type: Task
>            Reporter: Aditya Kishore
>            Assignee: Aditya Kishore
>              Labels: documentaion, hbase
>         Attachments: DRILL-683-Qualify-HBase-scan-with-specified-columns-.patch, DRILL-683-Qualify-HBase-scan-with-specified-columns-.patch
> Currently (as of https://github.com/apache/incubator-drill/commit/612527bd22c27aa92363d2297a9c2b4a05475fd0),
if row_key is specified as one of the projected columns in a query, we do not qualify the HBase
scan with the specified cf\[:column qualifier].
> This is done because, if we qualify the scan with the columns and ALL of these columns
are absent from some row (while other columns exist, which means the row and hence the row_key
exists), HBase will not return even the row key for that row.
> For example, for the sample query:
> {{SELECT row_key, f\['c1'], f\['c7'] from hbase.MyTable;}}
> if there exists a row like the following:
> {noformat}
> row_key     f['c2']     f['c3']     f['c6']
> ---------------------------------------------
>   row1       val1        val2        val3
> {noformat}
> if we qualify the HBase scan with {{f\['c1'], f\['c7']}}, then the row => {{row1}}
will get dropped from the scan result.
> However, not qualifying the scan would have a severe impact on scan performance.
> Hence we propose the behavior that if NONE of the specified columns in the query are
present in a row, the entire row will be omitted from the scan result.
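
The proposed behavior can be illustrated with a small sketch. The Python below is a toy in-memory model, not Drill or HBase code: the `scan` function, the dict-based table layout, and the extra row `row2` are assumptions made purely for illustration. It only mimics the semantics described above, where a scan qualified with columns returns a row only if at least one of those columns is present in it.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical layout: row_key -> {column qualifier -> value} for one column family "f".
Row = Dict[str, str]


def scan(table: Dict[str, Row],
         columns: Optional[Iterable[str]] = None) -> Iterator[Tuple[str, Row]]:
    """Toy model of a scan, optionally qualified with column qualifiers.

    Unqualified: every row is returned with all of its cells.
    Qualified: only the requested cells are returned, and a row that has
    none of the requested columns is omitted entirely (row key included).
    """
    wanted = None if columns is None else set(columns)
    for row_key in sorted(table):
        cells = table[row_key]
        if wanted is None:
            yield row_key, dict(cells)
            continue
        matched = {q: v for q, v in cells.items() if q in wanted}
        if matched:  # rows with no matching column are dropped
            yield row_key, matched


my_table = {
    # The row from the example in the description.
    "row1": {"c2": "val1", "c3": "val2", "c6": "val3"},
    # Hypothetical second row that does contain one requested column.
    "row2": {"c1": "val4", "c2": "val5"},
}

# Unqualified scan: both row keys come back, but every cell is read.
unqualified = [key for key, _ in scan(my_table)]

# Scan qualified with f['c1'], f['c7']: row1 is omitted from the result.
qualified = list(scan(my_table, ["c1", "c7"]))

print(unqualified)
print(qualified)
```

In this model the query `SELECT row_key, f['c1'], f['c7']` would see only `row2`, which is the trade-off the proposal accepts in exchange for reading fewer cells per scan.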

This message was sent by Atlassian JIRA
