lens-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tao Yan <t...@linkedin.com>
Subject Re: Narrow Table Support
Date Fri, 26 Aug 2016 18:30:22 GMT
It works after I changed the weight. Thanks!

On Fri, Aug 26, 2016 at 3:46 AM, Rajat Khandelwal <rajatgupta59@gmail.com>
wrote:

> Can you try setting the weight of the narrow table less than the weight of
> the other table? I see they are both set to same value: 10.
>
> On Fri, Aug 26, 2016 at 4:04 AM Tao Yan <tyan@linkedin.com> wrote:
>
>> Hi Lens Developers,
>>
>> I am testing 'narrow table' with Lens. If table1 and table2 has the same
>> granularity and table2's columns is a subset of table1, then table2 is a
>> narrow table of table1.
>>
>> I defined a dimension dimension1:
>> *dimension1.xml:*
>> *<x_dimension name="dimension1" xmlns="uri:lens:cube:0.1"
>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance
>> <http://www.w3.org/2001/XMLSchema-instance>"*
>> *  xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd ">*
>> *  <attributes>*
>> *    <dim_attribute name="primary_key" _type="BIGINT"/>*
>> *    <dim_attribute name="attr1" _type="BIGINT"/>*
>> *    <dim_attribute name="attr2" _type="BIGINT"/>*
>> *    <dim_attribute name="attr3" _type="BIGINT"/>*
>> *    <dim_attribute name="attr4" _type="BIGINT"/>*
>> *    <dim_attribute name="attr5" _type="BIGINT"/>*
>> *    <dim_attribute name="attr6" _type="BIGINT"/>*
>> *    <dim_attribute name="attr7" _type="BIGINT"/>*
>> *    <dim_attribute name="attr8" _type="BIGINT"/>*
>> *  </attributes>*
>> *  <properties>*
>> *    <property name="dimension.dimension1.timed.dimension" value="dt"/>*
>> *  </properties>*
>> *</x_dimension>*
>>
>> And defined two dimtables under this dimension:
>> *dimension1_division2.xml*
>> <x_dimension_table dimension_name="dimension1" table_name="dimension1_division2"
>> weight="10.0" xmlns="uri:lens:cube:0.1"
>>   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>> xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd ">
>>   <columns>
>>     <column name="primary_key" _type="BIGINT"/>
>> *    <column name="attr5" _type="BIGINT"/>*
>> *    <column name="attr6" _type="BIGINT"/>*
>> *    <column name="attr7" _type="BIGINT"/>*
>> *    <column name="attr8" _type="BIGINT"/>*
>>   </columns>
>>   <properties>
>>     <property name="dimension1.prop" value="d2"/>
>>   </properties>
>>   <storage_tables>
>> ...
>>   </storage_tables>
>> </x_dimension_table>
>>
>> *dimension1_division2_subset.xml*
>> <x_dimension_table dimension_name="dimension1" table_name="dimension1_division2_subset"
>> weight="10.0" xmlns="uri:lens:cube:0.1"
>>   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>> xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd ">
>>   <columns>
>>     <column name="primary_key" _type="BIGINT"/>
>> *    <column name="attr5" _type="BIGINT"/>*
>> *    <column name="attr6" _type="BIGINT"/>*
>>   </columns>
>>   <properties>
>>     <property name="dimension1.prop" value="d2_subset"/>
>>   </properties>
>>   <storage_tables>
>> ...
>>   </storage_tables>
>> </x_dimension_table>
>>
>> Then, I run the query again attr5:
>> *lens-shell>select primary_key, attr5 from dimension1*
>>
>> I expect it will use the table *dimension1_division2_subset *since it is
>> a narrow table and hence will be more efficient. However, it picked up the
>> table *dimension1_division2:*
>>
>> *25 Aug 2016 22:18:14 [Spring Shell] INFO  cliLogger -  Driver query:
>> 'INSERT OVERWRITE DIRECTORY
>> "hdfs://ltx1-holdemnn01.grid.linkedin.com:9000/user/dev_svc/lensreports/hdfsout/f20e5f20-3f09-4d32-a1b0-ac79d97e0f1a
>> <http://ltx1-holdemnn01.grid.linkedin.com:9000/user/dev_svc/lensreports/hdfsout/f20e5f20-3f09-4d32-a1b0-ac79d97e0f1a>"
>>  SELECT ( dimension1 . primary_key ), ( dimension1 . attr5 ) FROM
>> holdem_dimension1_division2 dimension1 WHERE ((((dimension1.dt =
>> 'latest')))) ' and Driver handle: OperationHandle
>> [opType=EXECUTE_STATEMENT,
>> getHandleIdentifier()=6ed9819c-d05f-427f-9ec3-740644072ce9]*
>>
>> Based on the server log, it seems the first candidate dimtable is picked:
>>
>> *[estimate-11] INFO  org.apache.lens.cube.parse.CubeQueryContext -
>> Available candidate dims are:[dimension1_division2,
>> dimension1_division2_subset], picking up dimension1_division2 for querying.*
>>
>> Is it the expected behavior?  If that is the case, then, do we have plan
>> to support this type of optimization?
>>
>> Thanks,
>> --
>>
>> *Tao Yan*
>> Software Engineer
>> Data Analytics Infrastructure Tools and Services
>>
>>
>>
>> 206.250.5345
>> tyan@linkedin.com
>> https://www.linkedin.com/in/taousc
>>
>


-- 

*Tao Yan*
Software Engineer
Data Analytics Infrastructure Tools and Services



206.250.5345
tyan@linkedin.com
https://www.linkedin.com/in/taousc

Mime
View raw message