lens-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rajat Khandelwal <pro...@apache.org>
Subject Re: Lens Query Conversion
Date Tue, 09 Aug 2016 07:47:26 GMT
The idea of partitions is that there is a stream of data that keeps flowing
in, each line of which has delivery time and order time both. The data is
partitioned and stored in partitions later and facts are created on top of
that partitioned data. So if data is partitioned in order time and the time
range queried is also order time, then we can pick the partitions directly.
But if the data is partitioned on delivery time, as is the case for this
fact, and the query is on order time, then you pick some tentative
partitions of delivery time, but you still have to filter over order time.
Here the purpose of picking partitions is to reduce the amount of data
read. When you pick delivery time partitions, it doesn't guarantee what
order times data has, it only provides a tentative range. Hence, the
partitions are picked based on the relation of time dimensions, but the
extra filter still has to be supplied.

Hope it's clear.

On Tue, Aug 9, 2016 at 10:30 AM Rajat Khandelwal <prongs@apache.org> wrote:

> That's the original time range asked in the cube query.
>
> On Tue, Aug 9, 2016, 01:46 Tao Yan <tyan@linkedin.com> wrote:
>
>> Hi Rajat,
>>
>> Thanks for the explain. I have a clear understanding about dim relation
>> now, however, how does the following condition generated:
>>
>> and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time) <
'2015-04-13 04:00:00')))
>>
>> order_time is a dim attributes defined in sales cube, and it does not have any relationships
with properties or other dim attributes (The property cube.timedim.partition.order_time does,
but is it the same thing?), I wonder how the query could pick up this column.
>>
>>
>> Thanks,
>>
>>
>> On Mon, Aug 8, 2016 at 4:12 AM, Rajat Khandelwal <prongs@apache.org>
>> wrote:
>>
>>> I've created a jira for this:
>>> https://issues.apache.org/jira/browse/LENS-1269
>>>
>>> You should be able to make change in your data locally and see the
>>> non-empty results:
>>>
>>> cat
>>> /Users/rajat.khandelwal/Git/lens/lens-dist/target/apache-lens-2.7.0-SNAPSHOT-bin/apache-lens-2.7.0-SNAPSHOT-bin/client/examples/data/sales_aggr_fact2_local3/sales-aggr-fact2-local3.data
>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,1,1,1,1,1,1,5,0
>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,2,1,2,2,2,1,8,2
>>>
>>>
>>> On Mon, Aug 8, 2016 at 2:46 PM Rajat Khandelwal <prongs@apache.org>
>>> wrote:
>>>
>>>> More details on how it converts to the fallback range:
>>>>
>>>> Range asked:
>>>> (order_time, '2015-04-13-03', '2015-04-13-04')
>>>>
>>>> Timedim Relation :
>>>>
>>>> <property name="cube.timedim.relation.order_time" value="delivery_time+[-20
days,-1 hour]" />
>>>>
>>>> <property name="cube.timedim.relation.order_time" value="delivery_time+[-2
hours,-1hour]" />
>>>>
>>>> The cube defines the first relation, the fact defines the second relation.
A fact can optionally override the timedim relation defined by cube. So the second one is
picked :
>>>>
>>>> Derivation:
>>>>
>>>> 2015-04-13-03 <=ot < 2015-04-13-04
>>>> dt - 2 hours <= ot <= dt - 1 hour => ot + 1 hour <= dt <=
ot + 2 hour
>>>> => 2015-04-13-04 <= dt < 2015-04-13-06.
>>>>
>>>> Hope it's clearer now.
>>>>
>>>>
>>>> On Mon, Aug 8, 2016 at 1:29 PM Rajat Khandelwal <prongs@apache.org>
>>>> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> So it seems right now the query is designed to return 0 rows. The
>>>>> query is there just to demonstrate the fallback feature.
>>>>>
>>>>> Cube query is
>>>>>
>>>>> cube select customer_city_name, store_cost from sales where time_range_in(order_time,
>>>>> '2015-04-13-03', '2015-04-13-04')
>>>>> Driver query comes out to be
>>>>>
>>>>> SELECT (customer_city.name), sum((sales.store_cost)) FROM a.local_sales_aggr_fact2
sales join a.local_city_table customer_city on sales.customer_city_id = customer_city.id and
(customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = '2015-04-13-05'))
and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time) < '2015-04-13
04:00:00'))) GROUP BY (customer_city.name)
>>>>>
>>>>> Then I modified the driver query and ran:
>>>>>
>>>>> SELECT (customer_city.name), sales.order_time, sales.store_cost FROM
a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id
= customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04')
or ((sales.dt) = '2015-04-13-05'))))
>>>>>
>>>>> Found the following results:
>>>>>
>>>>> Bangalore^A2015-04-13 00:00:00^A0.0
>>>>> Hyderabad^A2015-04-13 00:00:00^A2.0
>>>>>
>>>>> It's apparent that both the rows will be filtered out by the
>>>>> translated query.
>>>>>
>>>>> I'll look deeper and replace the example query with something that
>>>>> demonstrates the fallback feature and also returns non-empty results.
>>>>>
>>>>> Let us know if you have any confusion understanding the feature
>>>>> itself. I think the video should have covered it.
>>>>>
>>>>>
>>>>> Regards
>>>>>
>>>>
>>
>>
>> --
>>
>> *Tao Yan*
>> Software Engineer
>> Data Analytics Infrastructure Tools and Services
>>
>>
>>
>> 206.250.5345
>> tyan@linkedin.com
>> https://www.linkedin.com/in/taousc
>>
>

Mime
View raw message