lens-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "amareshwarisr ." <amareshw...@gmail.com>
Subject Re: Lens Query Conversion
Date Thu, 11 Aug 2016 04:05:08 GMT
Dayou Jiang,

Seems a bug. Please file a jira at
https://issues.apache.org/jira/browse/LENS.

Thanks

On Thu, Aug 11, 2016 at 12:34 AM, Dayou Jiang <feltjd@gmail.com> wrote:

> Hi,
>
> I would like to add something in this partition discussion:
>
> I found out that Lens doesn't allow me to use the "update fact" command to
> change the partition update periods. For example, i have the "<
> update_period>DAILY</update_period>" tag in one of my fact table
> definition and i now decide to have the
> MONTHLY partition too, but i couldn't update my fact table to accept that.
> Even after i dropped the entire fact table
> and tried to create it again with both "DAILY" and "MONTHLY" as the update
> periods i still couldn't add the MONTHLY partition.
> I think some of the meta data still got hanging around and not being
> deleted after i dropped the table.
> Is this a bug at Lens?
>
>
> Thanks,
> Dayou Jiang
>
> On Wed, Aug 10, 2016 at 11:36 AM, Tao Yan <tyan@linkedin.com> wrote:
>
>> Thanks for answering the question! It is helpful.
>>
>> On Wed, Aug 10, 2016 at 1:15 AM, amareshwarisr . <amareshwari@gmail.com>
>> wrote:
>>
>>> We have a mapping time dimension columns to its partition columns,
>>> because system's like HIVE would require partition column to be separate
>>> column than table columns.
>>>
>>> So, if tables already contain order_time as a column, and the data is
>>> partitioned by order_time - then partition column name has to be a separate
>>> column - which is defined through the mapping.
>>>
>>> Thanks
>>>
>>> On Wed, Aug 10, 2016 at 1:40 PM, Rajat Khandelwal <prongs@apache.org>
>>> wrote:
>>>
>>>> "value" is required in a "property" tag. A property tag is name and
>>>> value. This special property defines that whenever queried on order_time
>>>> time dimension, try to answer with facts that have "ot" as
>>>> partition column. If not, it falls back to delivery_time time dimension.
>>>> There again, the mapping is checked, which is "dt", so facts that have "dt"
>>>> as partition column are preferred. This can happen in a chain until a valid
>>>> combination is found which can answer the query.
>>>>
>>>>
>>>> On Wed, Aug 10, 2016 at 12:13 AM Tao Yan <tyan@linkedin.com> wrote:
>>>>
>>>>> So, the field 'value' can be used to define both alias and real values?
>>>>>
>>>>> On Tue, Aug 9, 2016 at 11:40 AM, Rajat Khandelwal <prongs@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Yes, that is correct.
>>>>>>
>>>>>> On Wed, Aug 10, 2016, 00:09 Tao Yan <tyan@linkedin.com> wrote:
>>>>>>
>>>>>>> Thanks for the explanation. I got the idea of tentative partitions.
>>>>>>>
>>>>>>> Is the property
>>>>>>> *<property name="cube.timedim.partition.order_time" value="ot"
/>*
>>>>>>>  referring to the dim_attribute
>>>>>>> *<dim_attribute name="order_time" _type="TIMESTAMP" />
*?
>>>>>>>
>>>>>>> Is the value 'ot' an alias of the partition column name?
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Aug 9, 2016 at 12:47 AM, Rajat Khandelwal <prongs@apache.org
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> The idea of partitions is that there is a stream of data
that keeps
>>>>>>>> flowing in, each line of which has delivery time and order
time both. The
>>>>>>>> data is partitioned and stored in partitions later and facts
are created on
>>>>>>>> top of that partitioned data. So if data is partitioned in
order time and
>>>>>>>> the time range queried is also order time, then we can pick
the partitions
>>>>>>>> directly. But if the data is partitioned on delivery time,
as is the case
>>>>>>>> for this fact, and the query is on order time, then you pick
some tentative
>>>>>>>> partitions of delivery time, but you still have to filter
over order time.
>>>>>>>> Here the purpose of picking partitions is to reduce the amount
of data
>>>>>>>> read. When you pick delivery time partitions, it doesn't
guarantee what
>>>>>>>> order times data has, it only provides a tentative range.
Hence, the
>>>>>>>> partitions are picked based on the relation of time dimensions,
but the
>>>>>>>> extra filter still has to be supplied.
>>>>>>>>
>>>>>>>> Hope it's clear.
>>>>>>>>
>>>>>>>> On Tue, Aug 9, 2016 at 10:30 AM Rajat Khandelwal <prongs@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> That's the original time range asked in the cube query.
>>>>>>>>>
>>>>>>>>> On Tue, Aug 9, 2016, 01:46 Tao Yan <tyan@linkedin.com>
wrote:
>>>>>>>>>
>>>>>>>>>> Hi Rajat,
>>>>>>>>>>
>>>>>>>>>> Thanks for the explain. I have a clear understanding
about dim
>>>>>>>>>> relation now, however, how does the following condition
generated:
>>>>>>>>>>
>>>>>>>>>> and ((sales.order_time) >= '2015-04-13 03:00:00')
and ((sales.order_time) < '2015-04-13 04:00:00')))
>>>>>>>>>>
>>>>>>>>>> order_time is a dim attributes defined in sales cube,
and it does not have any relationships with properties or other dim attributes (The property
cube.timedim.partition.order_time does, but is it the same thing?), I wonder how the query
could pick up this column.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 8, 2016 at 4:12 AM, Rajat Khandelwal
<
>>>>>>>>>> prongs@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> I've created a jira for this: https://issues.apache.or
>>>>>>>>>>> g/jira/browse/LENS-1269
>>>>>>>>>>>
>>>>>>>>>>> You should be able to make change in your data
locally and see
>>>>>>>>>>> the non-empty results:
>>>>>>>>>>>
>>>>>>>>>>> cat /Users/rajat.khandelwal/Git/lens/lens-dist/target/apache-len
>>>>>>>>>>> s-2.7.0-SNAPSHOT-bin/apache-lens-2.7.0-SNAPSHOT-bin/client/e
>>>>>>>>>>> xamples/data/sales_aggr_fact2_local3/sales-aggr-fact2-local3
>>>>>>>>>>> .data
>>>>>>>>>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,1,1,1,1,1,1,5,0
>>>>>>>>>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,2,1,2,2,2,1,8,2
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Aug 8, 2016 at 2:46 PM Rajat Khandelwal
<
>>>>>>>>>>> prongs@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> More details on how it converts to the fallback
range:
>>>>>>>>>>>>
>>>>>>>>>>>> Range asked:
>>>>>>>>>>>> (order_time, '2015-04-13-03', '2015-04-13-04')
>>>>>>>>>>>>
>>>>>>>>>>>> Timedim Relation :
>>>>>>>>>>>>
>>>>>>>>>>>> <property name="cube.timedim.relation.order_time"
value="delivery_time+[-20 days,-1 hour]" />
>>>>>>>>>>>>
>>>>>>>>>>>> <property name="cube.timedim.relation.order_time"
value="delivery_time+[-2 hours,-1hour]" />
>>>>>>>>>>>>
>>>>>>>>>>>> The cube defines the first relation, the
fact defines the second relation. A fact can optionally override the timedim relation defined
by cube. So the second one is picked :
>>>>>>>>>>>>
>>>>>>>>>>>> Derivation:
>>>>>>>>>>>>
>>>>>>>>>>>> 2015-04-13-03 <=ot < 2015-04-13-04
>>>>>>>>>>>> dt - 2 hours <= ot <= dt - 1 hour =>
ot + 1 hour <= dt <= ot +
>>>>>>>>>>>> 2 hour
>>>>>>>>>>>> => 2015-04-13-04 <= dt < 2015-04-13-06.
>>>>>>>>>>>>
>>>>>>>>>>>> Hope it's clearer now.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Aug 8, 2016 at 1:29 PM Rajat Khandelwal
<
>>>>>>>>>>>> prongs@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi
>>>>>>>>>>>>>
>>>>>>>>>>>>> So it seems right now the query is designed
to return 0 rows.
>>>>>>>>>>>>> The query is there just to demonstrate
the fallback feature.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cube query is
>>>>>>>>>>>>>
>>>>>>>>>>>>> cube select customer_city_name, store_cost
from sales where time_range_in(order_time,
>>>>>>>>>>>>> '2015-04-13-03', '2015-04-13-04')
>>>>>>>>>>>>> Driver query comes out to be
>>>>>>>>>>>>>
>>>>>>>>>>>>> SELECT (customer_city.name), sum((sales.store_cost))
FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id
= customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04')
or ((sales.dt) = '2015-04-13-05')) and ((sales.order_time) >= '2015-04-13 03:00:00') and
((sales.order_time) < '2015-04-13 04:00:00'))) GROUP BY (customer_city.name)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then I modified the driver query and
ran:
>>>>>>>>>>>>>
>>>>>>>>>>>>> SELECT (customer_city.name), sales.order_time,
sales.store_cost FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city
on sales.customer_city_id = customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt)
= '2015-04-13-04') or ((sales.dt) = '2015-04-13-05'))))
>>>>>>>>>>>>>
>>>>>>>>>>>>> Found the following results:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Bangalore^A2015-04-13 00:00:00^A0.0
>>>>>>>>>>>>> Hyderabad^A2015-04-13 00:00:00^A2.0
>>>>>>>>>>>>>
>>>>>>>>>>>>> It's apparent that both the rows will
be filtered out by the
>>>>>>>>>>>>> translated query.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'll look deeper and replace the example
query with something
>>>>>>>>>>>>> that demonstrates the fallback feature
and also returns non-empty results.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Let us know if you have any confusion
understanding the
>>>>>>>>>>>>> feature itself. I think the video should
have covered it.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>> *Tao Yan*
>>>>>>>>>> Software Engineer
>>>>>>>>>> Data Analytics Infrastructure Tools and Services
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 206.250.5345
>>>>>>>>>> tyan@linkedin.com
>>>>>>>>>> https://www.linkedin.com/in/taousc
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> *Tao Yan*
>>>>>>> Software Engineer
>>>>>>> Data Analytics Infrastructure Tools and Services
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 206.250.5345
>>>>>>> tyan@linkedin.com
>>>>>>> https://www.linkedin.com/in/taousc
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> *Tao Yan*
>>>>> Software Engineer
>>>>> Data Analytics Infrastructure Tools and Services
>>>>>
>>>>>
>>>>>
>>>>> 206.250.5345
>>>>> tyan@linkedin.com
>>>>> https://www.linkedin.com/in/taousc
>>>>>
>>>>
>>>
>>
>>
>> --
>>
>> *Tao Yan*
>> Software Engineer
>> Data Analytics Infrastructure Tools and Services
>>
>>
>>
>> 206.250.5345
>> tyan@linkedin.com
>> https://www.linkedin.com/in/taousc
>>
>
>

Mime
View raw message