drill-dev mailing list archives

From "Aditya Kishore (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-683) Qualify HBase scan with specified columns even if row_key is required.
Date Sat, 10 May 2014 22:05:05 GMT
Aditya Kishore created DRILL-683:

             Summary: Qualify HBase scan with specified columns even if row_key is required.
                 Key: DRILL-683
                 URL: https://issues.apache.org/jira/browse/DRILL-683
             Project: Apache Drill
          Issue Type: Task
            Reporter: Aditya Kishore
            Assignee: Aditya Kishore
         Attachments: DRILL-683-Qualify-HBase-scan-with-specified-columns-.patch, DRILL-683-Qualify-HBase-scan-with-specified-columns-.patch

Currently (as of https://github.com/apache/incubator-drill/commit/612527bd22c27aa92363d2297a9c2b4a05475fd0),
if row_key is specified as one of the projected columns in a query, we do not qualify the HBase
scan with the specified cf\[:column qualifier].
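The scan-qualification rule can be sketched as a small planning helper. This is a hypothetical illustration, not Drill's planner code: {{scan_qualifiers}} and its {{proposed}} flag are invented names, columns are written in HBase {{family:qualifier}} form, and {{None}} stands for an unqualified scan (in the HBase Java client, qualifying a scan corresponds to calling {{Scan.addColumn(family, qualifier)}}).

```python
from typing import List, Optional


def scan_qualifiers(projected: List[str], proposed: bool = False) -> Optional[List[str]]:
    """Hypothetical helper: which columns to qualify the HBase scan with.

    Returns None for an unqualified scan (every column is fetched).
    Current rule: projecting row_key disables qualification.
    Proposed rule (DRILL-683): qualify with the projected columns anyway.
    """
    columns = [c for c in projected if c != "row_key"]
    if not columns:
        # Only row_key projected: nothing to qualify with (assumption: scan all).
        return None
    if "row_key" in projected and not proposed:
        # Current behaviour: fetch every column so rows are never dropped.
        return None
    return columns


print(scan_qualifiers(["row_key", "f:c1", "f:c7"]))                 # None
print(scan_qualifiers(["row_key", "f:c1", "f:c7"], proposed=True))  # ['f:c1', 'f:c7']
```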

This is because if we qualify the scan with these columns and, for some row, NONE of them exist
(although other columns do, which means the row, and hence its row_key, exists), HBase will not
return even the row key.

For example, consider the sample query:
{{SELECT row_key, f\['c1'], f\['c7'] from hbase.MyTable;}}
against a table that contains the following row:

row_key     f['c2']     f['c3']     f['c6']
  row1       val1        val2        val3

If we qualify the HBase scan with {{f\['c1'], f\['c7']}}, the row {{row1}} will be dropped
from the scan result even though it exists in the table.
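A minimal sketch of why {{row1}} disappears, assuming a toy in-memory table (a dict from row key to {{family:qualifier}} cells) rather than the real HBase client. {{qualified_scan}} is a hypothetical helper that mimics only the row-filtering effect of a column-qualified scan: a row holding none of the requested cells yields nothing, not even its key.

```python
from typing import Dict, Iterator, List, Tuple

# Toy model of an HBase table: row key -> {"family:qualifier": value}.
# Hypothetical; this is not the HBase client API.
Table = Dict[str, Dict[str, str]]


def qualified_scan(table: Table, columns: List[str]) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (row_key, cells) restricted to `columns`, in row-key order.

    A row containing none of the requested columns yields nothing at all,
    so its row key is lost as well.
    """
    for row_key in sorted(table):
        cells = {c: v for c, v in table[row_key].items() if c in columns}
        if cells:
            yield row_key, cells


# The example row from the issue.
table: Table = {"row1": {"f:c2": "val1", "f:c3": "val2", "f:c6": "val3"}}

print(list(qualified_scan(table, ["f:c1", "f:c7"])))  # []
print(list(qualified_scan(table, ["f:c2"])))          # [('row1', {'f:c2': 'val1'})]
```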

However, not qualifying the scan has a severe impact on scan performance, because HBase then
returns every cell of every row rather than only the projected columns.
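To give a rough sense of the cost, here is a sketch that counts the cells a scan would return with and without qualification. The 1,000-row by 50-qualifier table and the {{cells_transferred}} helper are hypothetical; real costs also depend on cell sizes, caching and network transfer.

```python
from typing import Dict, List, Optional

Table = Dict[str, Dict[str, str]]


def cells_transferred(table: Table, columns: Optional[List[str]]) -> int:
    """Count the cells a scan would ship back to the client.

    columns=None models an unqualified scan (every cell of every row);
    otherwise only cells whose "family:qualifier" is in `columns` count.
    """
    total = 0
    for cells in table.values():
        if columns is None:
            total += len(cells)
        else:
            total += sum(1 for c in cells if c in columns)
    return total


# Hypothetical wide table: 1,000 rows, each with qualifiers f:c0 .. f:c49.
wide: Table = {f"row{i}": {f"f:c{j}": "v" for j in range(50)} for i in range(1000)}

print(cells_transferred(wide, None))              # 50000
print(cells_transferred(wide, ["f:c1", "f:c7"]))  # 2000
```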

Hence we propose the following behavior: if NONE of the specified columns in the query are present
in a row, the entire row will be omitted from the scan result.
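The proposed semantics can be sketched over a hypothetical in-memory table. {{project_rows}} is an illustrative name, {{row2}} is an invented extra row, and reporting {{None}} (SQL NULL) for a requested column missing from a surviving row is an assumption, not something this issue specifies.

```python
from typing import Dict, Iterator, List, Optional

Table = Dict[str, Dict[str, str]]


def project_rows(table: Table, columns: List[str]) -> Iterator[Dict[str, Optional[str]]]:
    """Sketch of the proposed result of SELECT row_key, <columns>.

    The scan is qualified with `columns`: a row that has NONE of them is
    omitted entirely; in a surviving row, a missing column becomes None.
    """
    for row_key in sorted(table):
        cells = table[row_key]
        if not any(c in cells for c in columns):
            continue  # proposed: drop rows with none of the specified columns
        record: Dict[str, Optional[str]] = {"row_key": row_key}
        for c in columns:
            record[c] = cells.get(c)  # assumption: missing column -> NULL
        yield record


table: Table = {
    "row1": {"f:c2": "val1", "f:c3": "val2", "f:c6": "val3"},  # example row from the issue
    "row2": {"f:c1": "val4", "f:c3": "val5"},                   # hypothetical extra row
}

for rec in project_rows(table, ["f:c1", "f:c7"]):
    print(rec)  # {'row_key': 'row2', 'f:c1': 'val4', 'f:c7': None}
```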

This message was sent by Atlassian JIRA