spark-user mailing list archives

From Andrés Ivaldi <iaiva...@gmail.com>
Subject Re: Aggregation Calculation
Date Thu, 03 Nov 2016 20:17:58 GMT
I'm not sure about inline views; they would still perform aggregations that
I don't need. I think I didn't explain it well: I've already filtered the
values that I need. The problem is that the default rollUp calculation gives
me some results that I don't want, such as the aggregation by only the second
column.
Suppose three columns (DataSet columns): Year, Month, and Import. I want the
aggregation sum(Import) for every Year/Month combination and also for each
Year, but sum(Import) by Month alone is not needed.

In a table it would look like this:

YEAR | MONTH | Sum(Import)
2006 | 1     | xxxx
2005 | 1     | xxxx
2005 | 2     | xxxx
2006 | null  | xxxx
2005 | null  | xxxx
null | null  | xxxx
null | 1     | xxxx
null | 2     | xxxx

The last three rows are not needed. In this example I could filter them out
after the rollUp, but I build the query on demand, so the result will grow
with the number of rows and columns, and there will be a lot of combinations
that I don't need.
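
To make the request concrete, here is a minimal sketch in plain Python (not
Spark code) of aggregating only over chosen column combinations, in this case
(Year, Month) and (Year), in a single pass over the rows. The function name
aggregate_grouping_sets and the sample values are hypothetical, made up for
illustration. Spark SQL's GROUPING SETS clause expresses the same idea;
whether it fits queries built on demand is an assumption, not something
tested here.

```python
from collections import defaultdict


def aggregate_grouping_sets(rows, grouping_sets, measure):
    """Sum `measure` for each requested column combination in one pass.

    Hypothetical helper: columns missing from a grouping set are reported
    as None, mirroring the nulls in a rollup/cube result.
    """
    # Collect every dimension that appears, in first-seen order.
    all_dims = []
    for grouping_set in grouping_sets:
        for col in grouping_set:
            if col not in all_dims:
                all_dims.append(col)

    totals = defaultdict(int)
    for row in rows:
        for grouping_set in grouping_sets:
            key = tuple(row[c] if c in grouping_set else None for c in all_dims)
            totals[key] += row[measure]
    return dict(totals)


# Made-up sample data standing in for the Year / Month / Import DataSet.
rows = [
    {"Year": 2005, "Month": 1, "Import": 10},
    {"Year": 2005, "Month": 2, "Import": 20},
    {"Year": 2006, "Month": 1, "Import": 5},
    {"Year": 2006, "Month": 1, "Import": 7},
]

# Only Year/Month and Year totals; no Month-only and no grand-total rows.
result = aggregate_grouping_sets(rows, [("Year", "Month"), ("Year",)], "Import")
for key in sorted(result, key=str):
    print(key, result[key])
```

Only the listed combinations are ever computed, so nothing has to be filtered
out afterwards. The sketch assumes the dimension columns contain no real null
values; otherwise a real null would be indistinguishable from a subtotal row.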

thanks





On Thu, Nov 3, 2016 at 4:04 PM, Stephen Boesch <javadba@gmail.com> wrote:

> You would likely want to create inline views that perform the filtering *before*
> performing the cubes/rollup; in this way the cubes/rollups only operate
> on the pruned rows/columns.
>
> 2016-11-03 11:29 GMT-07:00 Andrés Ivaldi <iaivaldi@gmail.com>:
>
>> Hello, I need to perform some aggregations and a kind of Cube/RollUp
>> calculation.
>>
>> From some tests, it looks like Cube and RollUp perform aggregation over all
>> possible column combinations, but I just need some specific column
>> combinations.
>>
>> What I'm trying to do is like a data table where the first N columns are
>> my rows, the next M values are my columns, and the last columns are
>> the aggregated values, like Dimensions / Measures.
>>
>> I need all the values of the N and M columns and the ones that correspond
>> to the aggregation function. I'll never need rows where a preceding
>> column has no value, i.e.
>>
>> with N=2, so two columns as rows, I'll need
>> R1 | R2  ....
>> ##  |  ## ....
>> ##  |   null ....
>>
>> but not
>> null | ## ....
>>
>> as rollup does; the same applies to the M columns.
>>
>>
>> So the question is: what would be the best way to perform this
>> calculation?
>> Using rollUp/Cube gives me a lot of values that I don't need.
>> Using groupBy gives me less information (I could do several groupBys, but
>> that is not performant, I think).
>> Is there any other way to do something like that?
>>
>> Thanks.
>>
>>
>>
>>
>>
>> --
>> Ing. Ivaldi Andres
>>
>
>


-- 
Ing. Ivaldi Andres
