lens-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rajat Khandelwal <pro...@apache.org>
Subject Re: Lens Query Conversion
Date Wed, 10 Aug 2016 08:10:40 GMT
"value" is required in a "property" tag. A property tag is name and value.
This special property defines that whenever queried on order_time time
dimension, try to answer with facts that have "ot" as partition column. If
not, it falls back to delivery_time time dimension. There again, the
mapping is checked, which is "dt", so facts that have "dt" as partition
column are preferred. This can happen in a chain until a valid combination
is found which can answer the query.


On Wed, Aug 10, 2016 at 12:13 AM Tao Yan <tyan@linkedin.com> wrote:

> So, the field 'value' can be used to define both alias and real values?
>
> On Tue, Aug 9, 2016 at 11:40 AM, Rajat Khandelwal <prongs@apache.org>
> wrote:
>
>> Yes, that is correct.
>>
>> On Wed, Aug 10, 2016, 00:09 Tao Yan <tyan@linkedin.com> wrote:
>>
>>> Thanks for the explanation. I got the idea of tentative partitions.
>>>
>>> Is the property
>>> *<property name="cube.timedim.partition.order_time" value="ot" />*
>>>  referring to the dim_attribute
>>> *<dim_attribute name="order_time" _type="TIMESTAMP" /> *?
>>>
>>> Is the value 'ot' an alias of the partition column name?
>>>
>>>
>>> On Tue, Aug 9, 2016 at 12:47 AM, Rajat Khandelwal <prongs@apache.org>
>>> wrote:
>>>
>>>> The idea of partitions is that there is a stream of data that keeps
>>>> flowing in, each line of which has delivery time and order time both. The
>>>> data is partitioned and stored in partitions later and facts are created
on
>>>> top of that partitioned data. So if data is partitioned in order time and
>>>> the time range queried is also order time, then we can pick the partitions
>>>> directly. But if the data is partitioned on delivery time, as is the case
>>>> for this fact, and the query is on order time, then you pick some tentative
>>>> partitions of delivery time, but you still have to filter over order time.
>>>> Here the purpose of picking partitions is to reduce the amount of data
>>>> read. When you pick delivery time partitions, it doesn't guarantee what
>>>> order times data has, it only provides a tentative range. Hence, the
>>>> partitions are picked based on the relation of time dimensions, but the
>>>> extra filter still has to be supplied.
>>>>
>>>> Hope it's clear.
>>>>
>>>> On Tue, Aug 9, 2016 at 10:30 AM Rajat Khandelwal <prongs@apache.org>
>>>> wrote:
>>>>
>>>>> That's the original time range asked in the cube query.
>>>>>
>>>>> On Tue, Aug 9, 2016, 01:46 Tao Yan <tyan@linkedin.com> wrote:
>>>>>
>>>>>> Hi Rajat,
>>>>>>
>>>>>> Thanks for the explain. I have a clear understanding about dim
>>>>>> relation now, however, how does the following condition generated:
>>>>>>
>>>>>> and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time)
< '2015-04-13 04:00:00')))
>>>>>>
>>>>>> order_time is a dim attributes defined in sales cube, and it does
not have any relationships with properties or other dim attributes (The property cube.timedim.partition.order_time
does, but is it the same thing?), I wonder how the query could pick up this column.
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 8, 2016 at 4:12 AM, Rajat Khandelwal <prongs@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> I've created a jira for this:
>>>>>>> https://issues.apache.org/jira/browse/LENS-1269
>>>>>>>
>>>>>>> You should be able to make change in your data locally and see
the
>>>>>>> non-empty results:
>>>>>>>
>>>>>>> cat
>>>>>>> /Users/rajat.khandelwal/Git/lens/lens-dist/target/apache-lens-2.7.0-SNAPSHOT-bin/apache-lens-2.7.0-SNAPSHOT-bin/client/examples/data/sales_aggr_fact2_local3/sales-aggr-fact2-local3.data
>>>>>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,1,1,1,1,1,1,5,0
>>>>>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,2,1,2,2,2,1,8,2
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 8, 2016 at 2:46 PM Rajat Khandelwal <prongs@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> More details on how it converts to the fallback range:
>>>>>>>>
>>>>>>>> Range asked:
>>>>>>>> (order_time, '2015-04-13-03', '2015-04-13-04')
>>>>>>>>
>>>>>>>> Timedim Relation :
>>>>>>>>
>>>>>>>> <property name="cube.timedim.relation.order_time" value="delivery_time+[-20
days,-1 hour]" />
>>>>>>>>
>>>>>>>> <property name="cube.timedim.relation.order_time" value="delivery_time+[-2
hours,-1hour]" />
>>>>>>>>
>>>>>>>> The cube defines the first relation, the fact defines the
second relation. A fact can optionally override the timedim relation defined by cube. So the
second one is picked :
>>>>>>>>
>>>>>>>> Derivation:
>>>>>>>>
>>>>>>>> 2015-04-13-03 <=ot < 2015-04-13-04
>>>>>>>> dt - 2 hours <= ot <= dt - 1 hour => ot + 1 hour
<= dt <= ot + 2
>>>>>>>> hour
>>>>>>>> => 2015-04-13-04 <= dt < 2015-04-13-06.
>>>>>>>>
>>>>>>>> Hope it's clearer now.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Aug 8, 2016 at 1:29 PM Rajat Khandelwal <prongs@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi
>>>>>>>>>
>>>>>>>>> So it seems right now the query is designed to return
0 rows. The
>>>>>>>>> query is there just to demonstrate the fallback feature.
>>>>>>>>>
>>>>>>>>> Cube query is
>>>>>>>>>
>>>>>>>>> cube select customer_city_name, store_cost from sales
where time_range_in(order_time,
>>>>>>>>> '2015-04-13-03', '2015-04-13-04')
>>>>>>>>> Driver query comes out to be
>>>>>>>>>
>>>>>>>>> SELECT (customer_city.name), sum((sales.store_cost))
FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id
= customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04')
or ((sales.dt) = '2015-04-13-05')) and ((sales.order_time) >= '2015-04-13 03:00:00') and
((sales.order_time) < '2015-04-13 04:00:00'))) GROUP BY (customer_city.name)
>>>>>>>>>
>>>>>>>>> Then I modified the driver query and ran:
>>>>>>>>>
>>>>>>>>> SELECT (customer_city.name), sales.order_time, sales.store_cost
FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id
= customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04')
or ((sales.dt) = '2015-04-13-05'))))
>>>>>>>>>
>>>>>>>>> Found the following results:
>>>>>>>>>
>>>>>>>>> Bangalore^A2015-04-13 00:00:00^A0.0
>>>>>>>>> Hyderabad^A2015-04-13 00:00:00^A2.0
>>>>>>>>>
>>>>>>>>> It's apparent that both the rows will be filtered out
by the
>>>>>>>>> translated query.
>>>>>>>>>
>>>>>>>>> I'll look deeper and replace the example query with something
that
>>>>>>>>> demonstrates the fallback feature and also returns non-empty
results.
>>>>>>>>>
>>>>>>>>> Let us know if you have any confusion understanding the
feature
>>>>>>>>> itself. I think the video should have covered it.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> *Tao Yan*
>>>>>> Software Engineer
>>>>>> Data Analytics Infrastructure Tools and Services
>>>>>>
>>>>>>
>>>>>>
>>>>>> 206.250.5345
>>>>>> tyan@linkedin.com
>>>>>> https://www.linkedin.com/in/taousc
>>>>>>
>>>>>
>>>
>>>
>>> --
>>>
>>> *Tao Yan*
>>> Software Engineer
>>> Data Analytics Infrastructure Tools and Services
>>>
>>>
>>>
>>> 206.250.5345
>>> tyan@linkedin.com
>>> https://www.linkedin.com/in/taousc
>>>
>>
>
>
> --
>
> *Tao Yan*
> Software Engineer
> Data Analytics Infrastructure Tools and Services
>
>
>
> 206.250.5345
> tyan@linkedin.com
> https://www.linkedin.com/in/taousc
>

Mime
View raw message