drill-dev mailing list archives

From "Aditya Kishore" <adityakish...@gmail.com>
Subject Re: Review Request 21165: DRILL-626: Project push down into HBase scan
Date Thu, 08 May 2014 23:57:44 GMT


> On May 7, 2014, 6:49 p.m., Jinfeng Ni wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushProjIntoScan.java, line 85
> > <https://reviews.apache.org/r/21165/diff/1/?file=576119#file576119line85>
> >
> >     Since the new Scan has the same row type as the original scan node, and we get the column count from the row type when computing the cost of this scan operator, I wonder if this may lead to the same cost for the new and the old scan operators. As a result, the new scan may not be chosen by the Optiq optimizer.
> >     
> >     We probably need to modify the code that gets the column count in DrillScanRel.computeSelfCost().

Agreed, will address that in the updated patch.
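To illustrate the concern, here is a minimal, hypothetical cost model. It is not Drill's actual `DrillScanRel.computeSelfCost()`, and the class, method names and the "rows times columns" formula are assumptions made for this sketch. If the column count comes from the row type, the original scan and the pushed-down scan get the same cost. If it comes from the projected column list, the narrower scan is cheaper.

```java
import java.util.List;

class ScanCostSketch {

    // Hypothetical cost model: rows scanned times columns read.
    static double scanCost(double rowCount, int columnCount) {
        return rowCount * columnCount;
    }

    // Column count taken from the row type. The pushed-down scan keeps the
    // original row type, so this value is identical for both scans.
    static int columnCountFromRowType(List<String> rowTypeFields) {
        return rowTypeFields.size();
    }

    // Column count taken from the projected columns. An empty projection is
    // treated as "all columns" (an assumption of this sketch).
    static int columnCountFromProjection(List<String> projected, List<String> rowTypeFields) {
        return projected.isEmpty() ? rowTypeFields.size() : projected.size();
    }

    public static void main(String[] args) {
        List<String> rowType = List.of("row_key", "f", "g", "h");
        List<String> originalScan = List.of();            // no projection pushed
        List<String> pushedScan = List.of("row_key", "g"); // after push-down
        double rows = 1000;

        // Row-type based: both scans cost 4000.0, so there is no reason to switch.
        System.out.println(scanCost(rows, columnCountFromRowType(rowType)));
        // Projection based: 4000.0 for the original scan vs 2000.0 for the pushed one.
        System.out.println(scanCost(rows, columnCountFromProjection(originalScan, rowType)));
        System.out.println(scanCost(rows, columnCountFromProjection(pushedScan, rowType)));
    }
}
```

The point of the design choice is only that the cost must depend on something that actually differs between the two plans; any real cost formula in Drill will weigh rows, CPU and I/O differently.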


> On May 7, 2014, 6:49 p.m., Jinfeng Ni wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PrelUtil.java, line 165
> > <https://reviews.apache.org/r/21165/diff/1/?file=576120#file576120line165>
> >
> >     For RexLiteral, I think we had better use literal.getType().getSqlTypeName().
> >     
> >     literal.getTypeName() returns a broad type for the literal. For example, all exact numerics, including integers, have the typeName "Decimal". If we use getTypeName(), wouldn't f[12.3] also be pushed down into the scan?
> >

f[12.3] will not pass SQL validation.

>>>>>>>>>>>>>>>>>>>>>
SEVERE: org.eigenbase.sql.validate.SqlValidatorException: Cannot apply 'ITEM' to arguments of type 'ITEM(<ANY>, <DECIMAL(3, 1)>)'. Supported form(s): <ARRAY>[<INTEGER>]
<<<<<<<<<<<<<<<<<<<<<

Nevertheless, I'll make the change to be safe.
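A minimal sketch of why the precise type matters when turning the literal inside `f[...]` into a path segment. The enums below are hypothetical stand-ins for Optiq's `SqlTypeName`, and `classify()` is an illustrative helper, not code from the patch. It assumes, as the review comment states, that the broad type name of both `2` and `12.3` is DECIMAL, while the precise type of `2` is INTEGER.

```java
import java.util.EnumSet;
import java.util.Set;

class LiteralSegmentSketch {

    // Hypothetical subset of SQL type names.
    enum SqlType { TINYINT, SMALLINT, INTEGER, BIGINT, DECIMAL, CHAR, VARCHAR }

    enum SegmentKind { ARRAY_INDEX, NAME, NOT_PUSHABLE }

    private static final Set<SqlType> INTEGRAL =
        EnumSet.of(SqlType.TINYINT, SqlType.SMALLINT, SqlType.INTEGER, SqlType.BIGINT);

    // Decide the segment kind from the literal's precise type, i.e. the
    // analogue of literal.getType().getSqlTypeName().
    static SegmentKind classify(SqlType preciseType) {
        if (INTEGRAL.contains(preciseType)) {
            return SegmentKind.ARRAY_INDEX;   // f[2]
        }
        if (preciseType == SqlType.CHAR || preciseType == SqlType.VARCHAR) {
            return SegmentKind.NAME;          // f['c1']
        }
        return SegmentKind.NOT_PUSHABLE;      // e.g. f[12.3], precise type DECIMAL
    }

    public static void main(String[] args) {
        System.out.println(classify(SqlType.INTEGER)); // ARRAY_INDEX
        System.out.println(classify(SqlType.DECIMAL)); // NOT_PUSHABLE
        System.out.println(classify(SqlType.CHAR));    // NAME
        // Classifying on the broad name would see DECIMAL for both 2 and 12.3
        // and could not tell them apart.
    }
}
```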


> On May 7, 2014, 6:49 p.m., Jinfeng Ni wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PrelUtil.java, line 167
> > <https://reviews.apache.org/r/21165/diff/1/?file=576120#file576120line167>
> >
> >     If the query contains a string literal, this code will return a NamedSegment and put it into the pushed column list. For instance, 
> >     
> >     select 'abc', 'cde', regular_column from cp.`data.parquet`
> >     
> >     If the table happens to have columns named 'abc' and 'cde', the pushed columns would incur unnecessary scan cost.
> >     
> >     
> >

Thanks for uncovering this. I am fixing it by removing the overridden visitLiteral() method, as it is not required.


> On May 7, 2014, 6:49 p.m., Jinfeng Ni wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PrelUtil.java, line 144
> > <https://reviews.apache.org/r/21165/diff/1/?file=576120#file576120line144>
> >
> >     Here, we override visitCall() in this visitor class. Do you think we should also override visitOver()? RexOver extends RexCall and also contains operands, though we do not support window functions yet.
> >      
> >

visitOver() in the superclass takes care of visiting the operands. I tested this (parsing only) with the following query:

SELECT
  f, SUM(f2['c2']) OVER (ROWS 3 PRECEDING), AVG(f3[2]) OVER (ROWS 10 PRECEDING)
FROM
  hbase.MyTable;

Projected column => [SchemaPath [`f`], SchemaPath [`f2`.`c2`], SchemaPath [`f3`[2]]]
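The column-collecting behaviour discussed in the last two threads can be sketched with a hypothetical, simplified expression tree. `InputRef`, `Literal`, `Call` and `Over` below are illustrative stand-ins for Optiq's `RexInputRef`, `RexLiteral`, `RexCall` and `RexOver`, and `collect()` is not the patch's visitor. It shows two things under those assumptions: a standalone literal such as `'abc'` contributes no column once there is no literal-specific handling, and because `Over` extends `Call`, the operands of window calls are visited without a dedicated override. The path notation mimics the log output above.

```java
import java.util.ArrayList;
import java.util.List;

class ProjectedColumnSketch {

    // Hypothetical simplified expression tree (not Optiq's Rex classes).
    interface Expr {}
    record InputRef(String name) implements Expr {}
    record Literal(Object value) implements Expr {}

    static class Call implements Expr {
        final String op;
        final List<Expr> operands;
        Call(String op, Expr... operands) { this.op = op; this.operands = List.of(operands); }
    }

    // Like RexOver, a window call is just a specialised call.
    static class Over extends Call {
        Over(String op, Expr... operands) { super(op, operands); }
    }

    // Returns the column path for a reference or an ITEM chain, else null.
    static String toPath(Expr e) {
        if (e instanceof InputRef ref) {
            return "`" + ref.name() + "`";
        }
        if (e instanceof Call call && call.op.equals("ITEM")) {
            String base = toPath(call.operands.get(0));
            if (base != null && call.operands.get(1) instanceof Literal lit) {
                if (lit.value() instanceof Integer i) return base + "[" + i + "]";
                if (lit.value() instanceof String s) return base + ".`" + s + "`";
            }
        }
        return null;
    }

    // Collects projected column paths; Over is handled by the Call branch.
    static void collect(Expr e, List<String> out) {
        String path = toPath(e);
        if (path != null) {
            if (!out.contains(path)) out.add(path);
            return;
        }
        if (e instanceof Call call) {
            for (Expr operand : call.operands) collect(operand, out);
        }
        // Standalone literals fall through and add nothing.
    }

    public static void main(String[] args) {
        // SELECT f, SUM(f2['c2']) OVER (...), AVG(f3[2]) OVER (...), 'abc' FROM hbase.MyTable
        List<Expr> projects = List.of(
            new InputRef("f"),
            new Over("SUM", new Call("ITEM", new InputRef("f2"), new Literal("c2"))),
            new Over("AVG", new Call("ITEM", new InputRef("f3"), new Literal(2))),
            new Literal("abc"));
        List<String> columns = new ArrayList<>();
        for (Expr e : projects) collect(e, columns);
        System.out.println(columns); // [`f`, `f2`.`c2`, `f3`[2]]
    }
}
```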


- Aditya


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21165/#review42453
-----------------------------------------------------------


On May 7, 2014, 9:34 a.m., Aditya Kishore wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21165/
> -----------------------------------------------------------
> 
> (Updated May 7, 2014, 9:34 a.m.)
> 
> 
> Review request for drill, Jinfeng Ni and Steven Phillips.
> 
> 
> Bugs: DRILL-626
>     https://issues.apache.org/jira/browse/DRILL-626
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> If a query against an HBase table requires only a subset of columns, we should qualify the HBase scan with these columns.
> 
> For example,
> 
>     SELECT row_key, f['c1'], f['c2'], g FROM hbase.MyTable
> 
> should qualify the HBase scan as families => [g["ALL"], f["c1", "c2"]]
> 
> 
> Diffs
> -----
> 
>   common/src/main/java/org/apache/drill/common/exceptions/DrillRuntimeException.java 9266cdd
>   contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseGroupScan.java bcdebc3
>   contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseRecordReader.java aa5743f
>   contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseSubScan.java ceaf23f
>   contrib/storage-hbase/src/test/java/org/apache/drill/hbase/BaseHBaseTest.java PRE-CREATION
>   contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseFilterPushDown.java 1911078
>   contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseProjectPushDown.java PRE-CREATION
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScan.java cd78bc1
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/GroupScan.java 492dbc1
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushProjIntoScan.java 0eae1da
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PrelUtil.java d69f8cf
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyGroupScan.java f94cff8
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/hive/HiveScan.java 2972928
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaGroupScan.java 5202038
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java 4d4ec9b
> 
> Diff: https://reviews.apache.org/r/21165/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Aditya Kishore
> 
>
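The family/qualifier grouping described in the review request (`SELECT row_key, f['c1'], f['c2'], g FROM hbase.MyTable` qualifying the scan as `families => [g["ALL"], f["c1", "c2"]]`) can be sketched as below. This is a hypothetical illustration, not the patch's `HBaseGroupScan` code: it assumes `row_key` names the row key (as in the example query), represents each projected column as `{family}` or `{family, qualifier}`, and uses the marker `"ALL"` for a whole family. Map ordering here follows first appearance, so it may differ from the order shown in the description.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class HBaseColumnSpecSketch {

    static final String ROW_KEY = "row_key"; // assumed row-key column name
    static final String ALL = "ALL";         // marker for "whole family"

    // Each projected column is {family} or {family, qualifier}.
    static Map<String, Set<String>> familiesFor(String[][] projected) {
        Map<String, Set<String>> families = new LinkedHashMap<>();
        for (String[] col : projected) {
            String family = col[0];
            if (family.equals(ROW_KEY)) {
                continue; // the row key is not a column family
            }
            Set<String> current = families.get(family);
            if (col.length == 1) {
                // Whole family requested: replaces any individual qualifiers.
                families.put(family, new TreeSet<>(Set.of(ALL)));
            } else if (current == null) {
                families.put(family, new TreeSet<>(Set.of(col[1])));
            } else if (!current.contains(ALL)) {
                current.add(col[1]);
            }
        }
        return families;
    }

    public static void main(String[] args) {
        // SELECT row_key, f['c1'], f['c2'], g FROM hbase.MyTable
        String[][] projected = { {"row_key"}, {"f", "c1"}, {"f", "c2"}, {"g"} };
        System.out.println(familiesFor(projected)); // {f=[c1, c2], g=[ALL]}
    }
}
```

In a real reader, such a map would presumably be translated into HBase `Scan.addFamily()` calls for whole families and `Scan.addColumn()` calls for individual qualifiers.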

