spark-user mailing list archives

From talgr <tal.grynb...@gmail.com>
Subject dense_rank skips ranks on cube
Date Mon, 20 Jun 2016 14:00:28 GMT
I have a DataFrame with 7 dimensions,
and I built a cube on them:

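import org.apache.spark.sql.functions._  // sum, dense_rank
import sqlContext.implicits._             // 'symbol column syntax (spark.implicits._ on Spark 2.x)
val cube = df.cube('d1,'d2,'d3,'d4,'d5,'d6,'d7)
val cc = cube.agg(sum('p1).as("p1"),sum('p2).as("p2")).cache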
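
For context, cube aggregates once per subset of the dimensions (2^7 = 128 grouping sets here) and fills every rolled-up dimension with null. The plain-Scala model below is only a sketch of those semantics, not Spark's implementation; the object and function names are hypothetical, and None stands in for Spark's null. Note that a null already present in the source data looks exactly like a rolled-up dimension in this model.

```scala
object CubeSketch {
  // One cube output row: dimension name -> value, None where the
  // dimension is rolled up (Spark shows these as null).
  type Row = Map[String, Option[String]]

  // All 2^n subsets of the dimensions; each one is a grouping set.
  def groupingSets(dims: Seq[String]): Seq[Set[String]] =
    dims.toSet.subsets().toSeq

  // Aggregate `rows` once per grouping set, summing the measure.
  // A dimension missing from a source row's map models a null in the data,
  // which becomes indistinguishable from a rolled-up dimension.
  def cube(rows: Seq[(Map[String, String], Long)], dims: Seq[String]): Seq[(Row, Long)] =
    groupingSets(dims).flatMap { groupingSet =>
      rows
        .groupBy { case (values, _) =>
          dims.map(d => d -> (if (groupingSet(d)) values.get(d) else None)).toMap
        }
        .map { case (key, group) => key -> group.map(_._2).sum }
    }
}
```

With two dimensions a and b, this yields one grand-total row (a = None, b = None), the per-a rows, the per-b rows and the per-(a, b) rows.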

Then I defined a rank function over a window:

import org.apache.spark.sql.expressions.Window
val rankSpec = Window.partitionBy('d1,'d2,'d3,'d4,'d5,'d6).orderBy('p1.desc)
val grank = dense_rank().over(rankSpec)
val cubed = cc.withColumn("rank",grank)
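
For reference, dense_rank() never leaves gaps inside a single window partition: ties share a rank and the next distinct value gets the next integer. A plain-Scala model of that behaviour (no Spark involved; `DenseRankSketch` and `denseRank` are hypothetical names):

```scala
object DenseRankSketch {
  // dense_rank() over a window partitioned by `partitionKey` and ordered by
  // `measure` descending: ties share a rank, and the next distinct value
  // always gets the next consecutive rank, so there are no gaps.
  def denseRank[A, K](rows: Seq[A])(partitionKey: A => K, measure: A => Long): Seq[(A, Int)] =
    rows.groupBy(partitionKey).values.toSeq.flatMap { part =>
      // Distinct measure values, largest first, numbered 1, 2, 3, ...
      val rankOf: Map[Long, Int] =
        part.map(measure).distinct.sorted(Ordering[Long].reverse)
          .zipWithIndex.map { case (v, i) => v -> (i + 1) }.toMap
      part.map(row => row -> rankOf(measure(row)))
    }
}
```

For a partition with values 10, 7, 7, 3 this assigns 1, 2, 2, 3, so any gap seen in the output must come from rows that were ranked but are no longer displayed.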

When I run:

cubed.filter('d1.isNull && 'd2.isNull && 'd3.isNull && 'd4.isNull &&
  'd5.isNull && 'd6.isNull && 'd7.isNotNull).sort('rank).show

I see that the first ranks are 3, 5, 9, 10, 11, 12, 13, 15, ...

It seems that the ranks become denser (fewer gaps) further down the list.
Any ideas?
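
One plausible explanation, offered as an assumption rather than a confirmed diagnosis: the rank is computed over the whole partition where d1..d6 are null, and only then does the filter run. That partition also holds rows whose d7 is null: at least the grand-total row, plus the (d7) grouping-set row for source rows with a null d7, and possibly more if the source data has nulls in d1..d6. Those rows consume ranks and are then dropped by 'd7.isNotNull, which leaves gaps. The plain-Scala sketch below uses made-up rows and p1 values (grand total = 60 + 30 + 25 + 10) purely to illustrate the effect; all names are hypothetical.

```scala
object RankThenFilterSketch {
  // Hypothetical cube rows that all fall in the window partition where
  // d1..d6 are null; rows with d7 = None are removed by the later filter.
  final case class CubeRow(d7: Option[String], p1: Long)

  val partition: Seq[CubeRow] = Seq(
    CubeRow(None, 125L),      // grand total: all seven dimensions rolled up
    CubeRow(Some("x"), 60L),
    CubeRow(None, 30L),       // (d7) grouping set over source rows whose d7 is null
    CubeRow(Some("y"), 25L),
    CubeRow(Some("z"), 10L)
  )

  // dense_rank() ordered by p1 descending over the given rows.
  def denseRankDesc(rows: Seq[CubeRow]): Seq[(CubeRow, Int)] = {
    val rankOf = rows.map(_.p1).distinct.sorted(Ordering[Long].reverse)
      .zipWithIndex.map { case (v, i) => v -> (i + 1) }.toMap
    rows.map(r => r -> rankOf(r.p1))
  }

  // Rank first, then filter (the order used in the query above): gaps appear.
  def rankThenFilter: Seq[Int] =
    denseRankDesc(partition).collect { case (r, rank) if r.d7.isDefined => rank }.sorted

  // Filter first, then rank: the surviving ranks are consecutive.
  def filterThenRank: Seq[Int] =
    denseRankDesc(partition.filter(_.d7.isDefined)).map(_._2).sorted
}
```

Here rankThenFilter gives 2, 4, 5 while filterThenRank gives 1, 2, 3. In Spark terms, two untested options would be to filter cc before adding the rank column, or to add a column identifying the grouping set to partitionBy (Spark 2.0 added a grouping_id() aggregate function in org.apache.spark.sql.functions for this).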

Thanks
Tal