spark-user mailing list archives

From Andrés Ivaldi
Subject Aggregation Calculation
Date Thu, 03 Nov 2016 18:29:36 GMT
Hello, I need to perform some aggregations, a kind of Cube/RollUp.

From some tests, it looks like Cube and RollUp perform the aggregation over
all possible column combinations, but I only need some specific ones.

What I'm trying to build is like a dataTable where the first N columns are my
rows, the next M columns are my columns, and the last columns are the
aggregated values, like Dimensions / Measures.

I need all the values of the N and M columns and the ones that correspond
to the aggregation function. I'll never need a row where a previous
column has no value, i.e.

with N=2, so two columns as rows, I'll need
R1 | R2  ....
##  |  ## ....
##  |   null ....

but not
null | ## ....

as roll up does; the same approach applies to the M columns.

So the question is: what would be the best way to perform this calculation?
Using rollUp/Cube gives me a lot of values that I don't need.
Using groupBy gives me less information (I could do several groupBy calls, but
I think that would not perform well).
Is there any other way to do something like that?
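
To make the wanted combinations concrete, here is a minimal Python sketch. It assumes that "no null before a value" means keeping only the leading prefixes of the N row columns, combined with the leading prefixes of the M column columns. The names R1 and R2 come from the example above; C1 and C2, the helper functions, and the include_empty flag are hypothetical and only for illustration. The last helper builds the text of a GROUP BY ... GROUPING SETS clause, which Spark SQL accepts, but nothing here is run against Spark.

```python
from itertools import product


def prefixes(cols, include_empty=False):
    """Return the leading prefixes of cols, longest first.

    For ["R1", "R2"] this yields ("R1", "R2") and ("R1",), so a later
    column is never kept while an earlier one is dropped.
    include_empty is a hypothetical switch for also keeping the grand total.
    """
    shortest = 0 if include_empty else 1
    return [tuple(cols[:i]) for i in range(len(cols), shortest - 1, -1)]


def grouping_sets(row_cols, col_cols):
    """Combine every row prefix with every column prefix."""
    return [r + c for r, c in product(prefixes(row_cols), prefixes(col_cols))]


def grouping_sets_clause(row_cols, col_cols):
    """Build the text of a GROUP BY ... GROUPING SETS clause (not executed)."""
    all_cols = ", ".join(list(row_cols) + list(col_cols))
    body = ", ".join(
        "(" + ", ".join(s) + ")" for s in grouping_sets(row_cols, col_cols)
    )
    return f"GROUP BY {all_cols} GROUPING SETS ({body})"


if __name__ == "__main__":
    rows = ["R1", "R2"]   # N = 2 row columns, as in the example above
    cols = ["C1", "C2"]   # M = 2 column columns (assumed names)
    for s in grouping_sets(rows, cols):
        print(s)
    print(grouping_sets_clause(rows, cols))
```

With N=2 and M=2 this lists 4 grouping sets, where a cube over the same four columns would produce 16. If that assumption matches the need, the generated clause could be placed in a single Spark SQL query instead of running several groupBy calls.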


Ing. Ivaldi Andres
