The idea of partitions is that there is a stream of data that keeps flowing in, each line of which has delivery time and order time both. The data is partitioned and stored in partitions later and facts are created on top of that partitioned data. So if data is partitioned in order time and the time range queried is also order time, then we can pick the partitions directly. But if the data is partitioned on delivery time, as is the case for this fact, and the query is on order time, then you pick some tentative partitions of delivery time, but you still have to filter over order time. Here the purpose of picking partitions is to reduce the amount of data read. When you pick delivery time partitions, it doesn't guarantee what order times data has, it only provides a tentative range. Hence, the partitions are picked based on the relation of time dimensions, but the extra filter still has to be supplied. 

Hope it's clear.  

On Tue, Aug 9, 2016 at 10:30 AM Rajat Khandelwal <> wrote:

That's the original time range asked in the cube query.

On Tue, Aug 9, 2016, 01:46 Tao Yan <> wrote:
Hi Rajat,

Thanks for the explain. I have a clear understanding about dim relation now, however, how does the following condition generated:

and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time) < '2015-04-13 04:00:00'))) 
order_time is a dim attributes defined in sales cube, and it does not have any relationships with properties or other dim attributes (The property cube.timedim.partition.order_time does, but is it the same thing?), I wonder how the query could pick up this column.


On Mon, Aug 8, 2016 at 4:12 AM, Rajat Khandelwal <> wrote:
I've created a jira for this:

You should be able to make change in your data locally and see the non-empty results: 

cat /Users/rajat.khandelwal/Git/lens/lens-dist/target/apache-lens-2.7.0-SNAPSHOT-bin/apache-lens-2.7.0-SNAPSHOT-bin/client/examples/data/sales_aggr_fact2_local3/
2015-04-13 03:00:00,2015-04-13 04:00:00,1,1,1,1,1,1,5,0
2015-04-13 03:00:00,2015-04-13 04:00:00,2,1,2,2,2,1,8,2

On Mon, Aug 8, 2016 at 2:46 PM Rajat Khandelwal <> wrote:
More details on how it converts to the fallback range: 

Range asked: 
(order_time, '2015-04-13-03', '2015-04-13-04')

Timedim Relation : 
<property name="cube.timedim.relation.order_time" value="delivery_time+[-20 days,-1 hour]" />
<property name="cube.timedim.relation.order_time" value="delivery_time+[-2 hours,-1hour]" />
The cube defines the first relation, the fact defines the second relation. A fact can optionally override the timedim relation defined by cube. So the second one is picked : 

2015-04-13-03 <=ot < 2015-04-13-04
dt - 2 hours <= ot <= dt - 1 hour => ot + 1 hour <= dt <= ot + 2 hour
=> 2015-04-13-04 <= dt < 2015-04-13-06. 

Hope it's clearer now. 

On Mon, Aug 8, 2016 at 1:29 PM Rajat Khandelwal <> wrote:

So it seems right now the query is designed to return 0 rows. The query is there just to demonstrate the fallback feature. 

Cube query is 

cube select customer_city_name, store_cost from sales where time_range_in(order_time, '2015-04-13-03', '2015-04-13-04')
Driver query comes out to be
SELECT (, sum((sales.store_cost)) FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id = and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = '2015-04-13-05')) and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time) < '2015-04-13 04:00:00'))) GROUP BY (
Then I modified the driver query and ran: 
SELECT (, sales.order_time, sales.store_cost FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id = and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = '2015-04-13-05'))))
Found the following results:

Bangalore^A2015-04-13 00:00:00^A0.0
Hyderabad^A2015-04-13 00:00:00^A2.0

It's apparent that both the rows will be filtered out by the translated query. 

I'll look deeper and replace the example query with something that demonstrates the fallback feature and also returns non-empty results. 

Let us know if you have any confusion understanding the feature itself. I think the video should have covered it. 



Tao Yan
Software Engineer
Data Analytics Infrastructure Tools and Services