lens-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tao Yan <t...@linkedin.com>
Subject Re: Lens Query Conversion
Date Tue, 09 Aug 2016 18:39:24 GMT
Thanks for the explanation. I got the idea of tentative partitions.

Is the property
*<property name="cube.timedim.partition.order_time" value="ot" />*
 referring to the dim_attribute
*<dim_attribute name="order_time" _type="TIMESTAMP" /> *?

Is the value 'ot' an alias of the partition column name?


On Tue, Aug 9, 2016 at 12:47 AM, Rajat Khandelwal <prongs@apache.org> wrote:

> The idea of partitions is that there is a stream of data that keeps
> flowing in, each line of which has delivery time and order time both. The
> data is partitioned and stored in partitions later and facts are created on
> top of that partitioned data. So if data is partitioned in order time and
> the time range queried is also order time, then we can pick the partitions
> directly. But if the data is partitioned on delivery time, as is the case
> for this fact, and the query is on order time, then you pick some tentative
> partitions of delivery time, but you still have to filter over order time.
> Here the purpose of picking partitions is to reduce the amount of data
> read. When you pick delivery time partitions, it doesn't guarantee what
> order times data has, it only provides a tentative range. Hence, the
> partitions are picked based on the relation of time dimensions, but the
> extra filter still has to be supplied.
>
> Hope it's clear.
>
> On Tue, Aug 9, 2016 at 10:30 AM Rajat Khandelwal <prongs@apache.org>
> wrote:
>
>> That's the original time range asked in the cube query.
>>
>> On Tue, Aug 9, 2016, 01:46 Tao Yan <tyan@linkedin.com> wrote:
>>
>>> Hi Rajat,
>>>
>>> Thanks for the explain. I have a clear understanding about dim relation
>>> now, however, how does the following condition generated:
>>>
>>> and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time)
< '2015-04-13 04:00:00')))
>>>
>>> order_time is a dim attributes defined in sales cube, and it does not have any
relationships with properties or other dim attributes (The property cube.timedim.partition.order_time
does, but is it the same thing?), I wonder how the query could pick up this column.
>>>
>>>
>>> Thanks,
>>>
>>>
>>> On Mon, Aug 8, 2016 at 4:12 AM, Rajat Khandelwal <prongs@apache.org>
>>> wrote:
>>>
>>>> I've created a jira for this: https://issues.apache.
>>>> org/jira/browse/LENS-1269
>>>>
>>>> You should be able to make change in your data locally and see the
>>>> non-empty results:
>>>>
>>>> cat /Users/rajat.khandelwal/Git/lens/lens-dist/target/apache-
>>>> lens-2.7.0-SNAPSHOT-bin/apache-lens-2.7.0-SNAPSHOT-
>>>> bin/client/examples/data/sales_aggr_fact2_local3/sales-
>>>> aggr-fact2-local3.data
>>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,1,1,1,1,1,1,5,0
>>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,2,1,2,2,2,1,8,2
>>>>
>>>>
>>>> On Mon, Aug 8, 2016 at 2:46 PM Rajat Khandelwal <prongs@apache.org>
>>>> wrote:
>>>>
>>>>> More details on how it converts to the fallback range:
>>>>>
>>>>> Range asked:
>>>>> (order_time, '2015-04-13-03', '2015-04-13-04')
>>>>>
>>>>> Timedim Relation :
>>>>>
>>>>> <property name="cube.timedim.relation.order_time" value="delivery_time+[-20
days,-1 hour]" />
>>>>>
>>>>> <property name="cube.timedim.relation.order_time" value="delivery_time+[-2
hours,-1hour]" />
>>>>>
>>>>> The cube defines the first relation, the fact defines the second relation.
A fact can optionally override the timedim relation defined by cube. So the second one is
picked :
>>>>>
>>>>> Derivation:
>>>>>
>>>>> 2015-04-13-03 <=ot < 2015-04-13-04
>>>>> dt - 2 hours <= ot <= dt - 1 hour => ot + 1 hour <= dt <=
ot + 2 hour
>>>>> => 2015-04-13-04 <= dt < 2015-04-13-06.
>>>>>
>>>>> Hope it's clearer now.
>>>>>
>>>>>
>>>>> On Mon, Aug 8, 2016 at 1:29 PM Rajat Khandelwal <prongs@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> So it seems right now the query is designed to return 0 rows. The
>>>>>> query is there just to demonstrate the fallback feature.
>>>>>>
>>>>>> Cube query is
>>>>>>
>>>>>> cube select customer_city_name, store_cost from sales where time_range_in(order_time,
>>>>>> '2015-04-13-03', '2015-04-13-04')
>>>>>> Driver query comes out to be
>>>>>>
>>>>>> SELECT (customer_city.name), sum((sales.store_cost)) FROM a.local_sales_aggr_fact2
sales join a.local_city_table customer_city on sales.customer_city_id = customer_city.id and
(customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = '2015-04-13-05'))
and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time) < '2015-04-13
04:00:00'))) GROUP BY (customer_city.name)
>>>>>>
>>>>>> Then I modified the driver query and ran:
>>>>>>
>>>>>> SELECT (customer_city.name), sales.order_time, sales.store_cost FROM
a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id
= customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04')
or ((sales.dt) = '2015-04-13-05'))))
>>>>>>
>>>>>> Found the following results:
>>>>>>
>>>>>> Bangalore^A2015-04-13 00:00:00^A0.0
>>>>>> Hyderabad^A2015-04-13 00:00:00^A2.0
>>>>>>
>>>>>> It's apparent that both the rows will be filtered out by the
>>>>>> translated query.
>>>>>>
>>>>>> I'll look deeper and replace the example query with something that
>>>>>> demonstrates the fallback feature and also returns non-empty results.
>>>>>>
>>>>>> Let us know if you have any confusion understanding the feature
>>>>>> itself. I think the video should have covered it.
>>>>>>
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>
>>>
>>>
>>> --
>>>
>>> *Tao Yan*
>>> Software Engineer
>>> Data Analytics Infrastructure Tools and Services
>>>
>>>
>>>
>>> 206.250.5345
>>> tyan@linkedin.com
>>> https://www.linkedin.com/in/taousc
>>>
>>


-- 

*Tao Yan*
Software Engineer
Data Analytics Infrastructure Tools and Services



206.250.5345
tyan@linkedin.com
https://www.linkedin.com/in/taousc

Mime
View raw message