lens-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rajat Khandelwal <rajatgupt...@gmail.com>
Subject Re: Narrow Table Support
Date Fri, 26 Aug 2016 10:46:49 GMT
Can you try setting the weight of the narrow table less than the weight of
the other table? I see they are both set to same value: 10.

On Fri, Aug 26, 2016 at 4:04 AM Tao Yan <tyan@linkedin.com> wrote:

> Hi Lens Developers,
>
> I am testing 'narrow table' with Lens. If table1 and table2 has the same
> granularity and table2's columns is a subset of table1, then table2 is a
> narrow table of table1.
>
> I defined a dimension dimension1:
> *dimension1.xml:*
> *<x_dimension name="dimension1" xmlns="uri:lens:cube:0.1"
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance
> <http://www.w3.org/2001/XMLSchema-instance>"*
> *  xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd ">*
> *  <attributes>*
> *    <dim_attribute name="primary_key" _type="BIGINT"/>*
> *    <dim_attribute name="attr1" _type="BIGINT"/>*
> *    <dim_attribute name="attr2" _type="BIGINT"/>*
> *    <dim_attribute name="attr3" _type="BIGINT"/>*
> *    <dim_attribute name="attr4" _type="BIGINT"/>*
> *    <dim_attribute name="attr5" _type="BIGINT"/>*
> *    <dim_attribute name="attr6" _type="BIGINT"/>*
> *    <dim_attribute name="attr7" _type="BIGINT"/>*
> *    <dim_attribute name="attr8" _type="BIGINT"/>*
> *  </attributes>*
> *  <properties>*
> *    <property name="dimension.dimension1.timed.dimension" value="dt"/>*
> *  </properties>*
> *</x_dimension>*
>
> And defined two dimtables under this dimension:
> *dimension1_division2.xml*
> <x_dimension_table dimension_name="dimension1"
> table_name="dimension1_division2" weight="10.0" xmlns="uri:lens:cube:0.1"
>   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd ">
>   <columns>
>     <column name="primary_key" _type="BIGINT"/>
> *    <column name="attr5" _type="BIGINT"/>*
> *    <column name="attr6" _type="BIGINT"/>*
> *    <column name="attr7" _type="BIGINT"/>*
> *    <column name="attr8" _type="BIGINT"/>*
>   </columns>
>   <properties>
>     <property name="dimension1.prop" value="d2"/>
>   </properties>
>   <storage_tables>
> ...
>   </storage_tables>
> </x_dimension_table>
>
> *dimension1_division2_subset.xml*
> <x_dimension_table dimension_name="dimension1"
> table_name="dimension1_division2_subset" weight="10.0"
> xmlns="uri:lens:cube:0.1"
>   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd ">
>   <columns>
>     <column name="primary_key" _type="BIGINT"/>
> *    <column name="attr5" _type="BIGINT"/>*
> *    <column name="attr6" _type="BIGINT"/>*
>   </columns>
>   <properties>
>     <property name="dimension1.prop" value="d2_subset"/>
>   </properties>
>   <storage_tables>
> ...
>   </storage_tables>
> </x_dimension_table>
>
> Then, I run the query again attr5:
> *lens-shell>select primary_key, attr5 from dimension1*
>
> I expect it will use the table *dimension1_division2_subset *since it is
> a narrow table and hence will be more efficient. However, it picked up the
> table *dimension1_division2:*
>
> *25 Aug 2016 22:18:14 [Spring Shell] INFO  cliLogger -  Driver query:
> 'INSERT OVERWRITE DIRECTORY
> "hdfs://ltx1-holdemnn01.grid.linkedin.com:9000/user/dev_svc/lensreports/hdfsout/f20e5f20-3f09-4d32-a1b0-ac79d97e0f1a
> <http://ltx1-holdemnn01.grid.linkedin.com:9000/user/dev_svc/lensreports/hdfsout/f20e5f20-3f09-4d32-a1b0-ac79d97e0f1a>"
>  SELECT ( dimension1 . primary_key ), ( dimension1 . attr5 ) FROM
> holdem_dimension1_division2 dimension1 WHERE ((((dimension1.dt =
> 'latest')))) ' and Driver handle: OperationHandle
> [opType=EXECUTE_STATEMENT,
> getHandleIdentifier()=6ed9819c-d05f-427f-9ec3-740644072ce9]*
>
> Based on the server log, it seems the first candidate dimtable is picked:
>
> *[estimate-11] INFO  org.apache.lens.cube.parse.CubeQueryContext -
> Available candidate dims are:[dimension1_division2,
> dimension1_division2_subset], picking up dimension1_division2 for querying.*
>
> Is it the expected behavior?  If that is the case, then, do we have plan
> to support this type of optimization?
>
> Thanks,
> --
>
> *Tao Yan*
> Software Engineer
> Data Analytics Infrastructure Tools and Services
>
>
>
> 206.250.5345
> tyan@linkedin.com
> https://www.linkedin.com/in/taousc
>

Mime
View raw message