spark-user mailing list archives

From Andrés Ivaldi <iaiva...@gmail.com>
Subject Re: Aggregation Calculation
Date Fri, 04 Nov 2016 13:23:32 GMT
Ok, so I've read that rollup is just syntactic sugar for GROUPING SETS (...).
In that case I just need to use GROUPING SETS, but the examples in the
documentation use GROUPING SETS with SQL syntax, and I am doing it
programmatically, so I need the Dataset API, something like ds.rollup(..) but
for grouping sets.

Does anyone know how to do it?

thanks.
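
To make the request concrete, here is a minimal plain-Python sketch (not Spark code) of what aggregating over explicit grouping sets means for the Year/Month/Import example discussed in this thread. The sample rows and the helper name `aggregate_grouping_sets` are hypothetical and only serve as illustration.

```python
from collections import defaultdict

# Hypothetical sample rows: (Year, Month, Import)
rows = [
    (2005, 1, 10.0),
    (2005, 2, 20.0),
    (2006, 1, 30.0),
]
COLUMNS = ("Year", "Month")


def aggregate_grouping_sets(rows, grouping_sets):
    """Sum Import once per grouping set; columns outside the set become None."""
    totals = defaultdict(float)
    for year, month, amount in rows:
        record = {"Year": year, "Month": month}
        for gset in grouping_sets:
            key = tuple(record[c] if c in gset else None for c in COLUMNS)
            totals[key] += amount
    return dict(totals)


# Only the combinations asked for: (Year, Month) and (Year).
# No (Month) set, so no "null | Month" rows are ever computed.
result = aggregate_grouping_sets(rows, [("Year", "Month"), ("Year",)])

for key, total in sorted(
    result.items(),
    key=lambda kv: (kv[0][0], kv[0][1] is None, kv[0][1] or 0),
):
    print(key, total)
```

In Spark itself, a comparable result could probably be obtained by registering the Dataset as a temporary view and running SQL through spark.sql with a clause such as GROUP BY Year, Month GROUPING SETS ((Year, Month), (Year)). Treat the exact statement as an assumption to verify against the Spark version in use.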



On Thu, Nov 3, 2016 at 5:17 PM, Andrés Ivaldi <iaivaldi@gmail.com> wrote:

> I'm not sure about inline views; they would still perform aggregations that
> I don't need. I think I didn't explain it well: I've already filtered the
> values that I need. The problem is that the default calculation of rollUp
> gives me some results that I don't want, such as the aggregation by only the
> second column.
> Suppose three columns (Dataset columns) Year, Month, Import. I want the
> aggregation sum(Import) for every Year/Month combination and also for each
> Year, but I don't care about sum(Import) by Month alone.
>
> In a table it would look like:
>
> YEAR | MONTH | Sum(Import)
> 2006 | 1     | xxxx
> 2005 | 1     | xxxx
> 2005 | 2     | xxxx
> 2006 | null  | xxxx
> 2005 | null  | xxxx
> null | null  | xxxx
> null | 1     | xxxx
> null | 2     | xxxx
>
> The last three rows are not needed. In this example I could filter
> after the rollUp, but I build the query on demand, so the output will grow
> with the number of rows and columns, and there will be a lot of
> combinations that I don't need.
>
> thanks
>
>
>
>
>
> On Thu, Nov 3, 2016 at 4:04 PM, Stephen Boesch <javadba@gmail.com> wrote:
>
>> You would likely want to create inline views that perform the filtering
>> *before* performing the cubes/rollups; in this way the cubes/rollups only
>> operate on the pruned rows/columns.
>>
>> 2016-11-03 11:29 GMT-07:00 Andrés Ivaldi <iaivaldi@gmail.com>:
>>
>>> Hello, I need to perform some aggregations and a kind of Cube/RollUp
>>> calculation.
>>>
>>> From some tests, it looks like Cube and RollUp perform aggregation over all
>>> possible column combinations, but I just need some specific column
>>> combinations.
>>>
>>> What I'm trying to do is like a data table where the first N columns are
>>> my rows, the next M columns are my columns, and the last columns are
>>> the aggregated values, like Dimensions / Measures.
>>>
>>> I need all the values of the N and M columns and the ones that
>>> correspond to the aggregation function. I'll never need the rows where a
>>> previous column has no value, i.e.
>>>
>>> with N=2, so two columns as rows, I'll need
>>> R1 | R2  ....
>>> ##  |  ## ....
>>> ##  |   null ....
>>>
>>> but not
>>> null | ## ....
>>>
>>> as rollup does. The same applies to the M columns.
>>>
>>>
>>> So the question is: what would be the best way to perform this
>>> calculation?
>>> Using rollUp/Cube gives me a lot of values that I don't need.
>>> Using groupBy gives me less information (I could do several groupBys, but
>>> I think that is not performant).
>>> Is there any other way to do something like that?
>>>
>>> Thanks.
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Ing. Ivaldi Andres
>>>
>>
>>
>
>
> --
> Ing. Ivaldi Andres
>



-- 
Ing. Ivaldi Andres
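
The quoted messages ask which column combinations rollup and cube actually compute. As a rough plain-Python illustration (not Spark code), the sketch below lists the grouping sets that SQL-style ROLLUP and CUBE are commonly defined to expand to, and flags the sets that would yield "null | value" rows. The helper names `rollup_sets`, `cube_sets` and `has_gap` are hypothetical.

```python
from itertools import combinations


def rollup_sets(columns):
    """Prefixes of the column list, longest first, ending with the grand total ()."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]


def cube_sets(columns):
    """Every subset of the column list, keeping the original column order."""
    return [
        subset
        for size in range(len(columns), -1, -1)
        for subset in combinations(columns, size)
    ]


def has_gap(gset, columns):
    """True when a later column is grouped while an earlier one is not."""
    return gset != tuple(columns[: len(gset)])


columns = ["Year", "Month"]

print("rollup:", rollup_sets(columns))
print("cube:  ", cube_sets(columns))

# Grouping sets that would produce rows such as "null | 1".
unwanted = [s for s in cube_sets(columns) if has_gap(s, columns)]
print("sets with a gap:", unwanted)
```

Under this definition, only the cube expansion contains a set with a gap, (Month), whereas the rollup expansion consists of prefixes plus the grand total. Dropping the empty set from the rollup list leaves exactly (Year, Month) and (Year).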
