spark-user mailing list archives

From Akshat Aranya <aara...@gmail.com>
Subject Primitive arrays in Spark
Date Tue, 21 Oct 2014 20:08:36 GMT
This is as much of a Scala question as a Spark question.

I have an RDD:

val rdd1: RDD[(Long, Array[Long])]

This RDD has duplicate keys, which I can collapse like so:

val rdd2: RDD[(Long, Array[Long])] = rdd1.reduceByKey((a, b) => a ++ b)

If I start with an Array of primitive longs in rdd1, will rdd2 also have
Arrays of primitive longs?  I suspect, based on my memory usage, that this
is not the case.
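For what it's worth, outside of Spark the concatenation itself does stay primitive; a quick local check (plain Scala, no Spark, so it says nothing about what Spark's serializer or shuffle does with the values):

```scala
// Local check: does ++ on two primitive Array[Long]s yield another
// primitive long[]?
val a: Array[Long] = Array(1L, 2L)
val b: Array[Long] = Array(3L, 4L)
val c: Array[Long] = a ++ b

// A primitive long[] has component type java.lang.Long.TYPE (i.e. long),
// not the boxed java.lang.Long class.
println(c.getClass.getSimpleName)                            // long[]
println(c.getClass.getComponentType == java.lang.Long.TYPE)  // true
```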

Also, would it be more efficient to do this:

val rdd1: RDD[(Long, ArrayBuffer[Long])]

and then

val rdd2: RDD[(Long, Array[Long])] =
  rdd1.reduceByKey((a, b) => a ++ b).mapValues(_.toArray)

(Note the mapValues rather than map: the elements of the reduced RDD are
(Long, ArrayBuffer[Long]) pairs, and it is the value, not the pair, that
needs converting to an Array.)
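A local sketch (plain Scala, no Spark; `groups` and the result names are made up here) of the two merge strategies being compared — repeated ++ allocates a fresh array on every merge, while a single ArrayBuffer can grow in place and be converted to an Array once at the end:

```scala
import scala.collection.mutable.ArrayBuffer

// Duplicate-keyed input, standing in for rdd1's contents.
val groups: Seq[(Long, Array[Long])] =
  Seq((1L, Array(1L, 2L)), (1L, Array(3L)), (2L, Array(4L)))

// Array-based reduce: each ++ allocates a brand-new array.
val viaArray: Map[Long, Array[Long]] =
  groups.groupBy(_._1).map { case (k, vs) =>
    k -> vs.map(_._2).reduce(_ ++ _)
  }

// Buffer-based fold: one growable buffer per key, converted once.
val viaBuffer: Map[Long, Array[Long]] =
  groups.groupBy(_._1).map { case (k, vs) =>
    val buf = new ArrayBuffer[Long]()
    vs.foreach(buf ++= _._2)
    k -> buf.toArray
  }

// Both strategies produce the same collapsed values.
println(viaArray(1L).sameElements(viaBuffer(1L)))  // true
println(viaBuffer(1L).mkString(","))               // 1,2,3
```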
