lens-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tao Yan <t...@linkedin.com>
Subject Narrow Table Support
Date Thu, 25 Aug 2016 22:34:14 GMT
Hi Lens Developers,

I am testing 'narrow table' with Lens. If table1 and table2 has the same
granularity and table2's columns is a subset of table1, then table2 is a
narrow table of table1.

I defined a dimension dimension1:
*dimension1.xml:*
*<x_dimension name="dimension1" xmlns="uri:lens:cube:0.1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance
<http://www.w3.org/2001/XMLSchema-instance>"*
*  xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd ">*
*  <attributes>*
*    <dim_attribute name="primary_key" _type="BIGINT"/>*
*    <dim_attribute name="attr1" _type="BIGINT"/>*
*    <dim_attribute name="attr2" _type="BIGINT"/>*
*    <dim_attribute name="attr3" _type="BIGINT"/>*
*    <dim_attribute name="attr4" _type="BIGINT"/>*
*    <dim_attribute name="attr5" _type="BIGINT"/>*
*    <dim_attribute name="attr6" _type="BIGINT"/>*
*    <dim_attribute name="attr7" _type="BIGINT"/>*
*    <dim_attribute name="attr8" _type="BIGINT"/>*
*  </attributes>*
*  <properties>*
*    <property name="dimension.dimension1.timed.dimension" value="dt"/>*
*  </properties>*
*</x_dimension>*

And defined two dimtables under this dimension:
*dimension1_division2.xml*
<x_dimension_table dimension_name="dimension1"
table_name="dimension1_division2"
weight="10.0" xmlns="uri:lens:cube:0.1"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd ">
  <columns>
    <column name="primary_key" _type="BIGINT"/>
*    <column name="attr5" _type="BIGINT"/>*
*    <column name="attr6" _type="BIGINT"/>*
*    <column name="attr7" _type="BIGINT"/>*
*    <column name="attr8" _type="BIGINT"/>*
  </columns>
  <properties>
    <property name="dimension1.prop" value="d2"/>
  </properties>
  <storage_tables>
...
  </storage_tables>
</x_dimension_table>

*dimension1_division2_subset.xml*
<x_dimension_table dimension_name="dimension1"
table_name="dimension1_division2_subset"
weight="10.0" xmlns="uri:lens:cube:0.1"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd ">
  <columns>
    <column name="primary_key" _type="BIGINT"/>
*    <column name="attr5" _type="BIGINT"/>*
*    <column name="attr6" _type="BIGINT"/>*
  </columns>
  <properties>
    <property name="dimension1.prop" value="d2_subset"/>
  </properties>
  <storage_tables>
...
  </storage_tables>
</x_dimension_table>

Then, I run the query again attr5:
*lens-shell>select primary_key, attr5 from dimension1*

I expect it will use the table *dimension1_division2_subset *since it is a
narrow table and hence will be more efficient. However, it picked up the
table *dimension1_division2:*

*25 Aug 2016 22:18:14 [Spring Shell] INFO  cliLogger -  Driver query:
'INSERT OVERWRITE DIRECTORY
"hdfs://ltx1-holdemnn01.grid.linkedin.com:9000/user/dev_svc/lensreports/hdfsout/f20e5f20-3f09-4d32-a1b0-ac79d97e0f1a
<http://ltx1-holdemnn01.grid.linkedin.com:9000/user/dev_svc/lensreports/hdfsout/f20e5f20-3f09-4d32-a1b0-ac79d97e0f1a>"
 SELECT ( dimension1 . primary_key ), ( dimension1 . attr5 ) FROM
holdem_dimension1_division2 dimension1 WHERE ((((dimension1.dt =
'latest')))) ' and Driver handle: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=6ed9819c-d05f-427f-9ec3-740644072ce9]*

Based on the server log, it seems the first candidate dimtable is picked:

*[estimate-11] INFO  org.apache.lens.cube.parse.CubeQueryContext -
Available candidate dims are:[dimension1_division2,
dimension1_division2_subset], picking up dimension1_division2 for querying.*

Is it the expected behavior?  If that is the case, then, do we have plan to
support this type of optimization?

Thanks,
-- 

*Tao Yan*
Software Engineer
Data Analytics Infrastructure Tools and Services



206.250.5345
tyan@linkedin.com
https://www.linkedin.com/in/taousc

Mime
View raw message