spark-user mailing list archives

From Aureliano Buendia
Subject Why does saveAsObjectFile() serialize Array[T] instead of T?
Date Sun, 05 Jan 2014 19:52:30 GMT

Given an RDD[T] instance, saveAsObjectFile() passes an instance of Array[T]
to serialize(), and not an instance of T:

  def saveAsObjectFile(path: String) {
    this.mapPartitions(iter => iter.grouped(10).map(_.toArray))
      .map(x => (NullWritable.get(), new BytesWritable(Utils.serialize(x))))
      .saveAsSequenceFile(path)
  }

Is this array mapping for efficiency reasons, or are there other reasons
for this?
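
To make the question concrete, here is a minimal sketch in plain Scala (no
Spark involved) of what the grouped(10).map(_.toArray) step does to a
partition's iterator. The names GroupedDemo and batch are made up for
illustration; only Iterator.grouped and toArray come from the standard
library. It suggests that each record handed to serialize() holds up to 10
elements, so the number of serialize() calls and of records written drops by
roughly a factor of 10.

```scala
object GroupedDemo {
  // Batch an iterator into arrays of at most `size` elements, the same
  // shape of transformation as iter.grouped(10).map(_.toArray).
  def batch(iter: Iterator[Int], size: Int): List[Array[Int]] =
    iter.grouped(size).map(_.toArray).toList

  def main(args: Array[String]): Unit = {
    val batches = batch((1 to 25).iterator, 10)
    // Print the size of each batch; the last one holds the remainder.
    println(batches.map(_.length))
  }
}
```

With 25 elements the batches have sizes 10, 10 and 5.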

I'm trying to use saveAsObjectFile() to serialize protobuf messages.
Protobuf messages already come with a method that turns them into
Array[Byte]; that is, toByteArray() can be called on an instance of T.
Considering this, how can a protobuf message instance be serialized in a
custom version of saveAsObjectFile()?

Is it a good idea to drop the array mapping?

  def saveAsObjectFile(path: String) {
    this.map(x => (NullWritable.get(), new BytesWritable(x.toByteArray())))
      .saveAsSequenceFile(path)
  }
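
If the array mapping is dropped, one way to get this behaviour without
patching RDD itself might be a standalone helper, sketched below. It assumes
that T is a protobuf message type (bounded here by
com.google.protobuf.MessageLite, which declares toByteArray()) and that
Spark, Hadoop and protobuf-java are on the classpath. The names ProtobufSave
and saveAsProtobufFile are hypothetical and not part of Spark.

```scala
import org.apache.hadoop.io.{BytesWritable, NullWritable}
// On Spark releases of this era, this import brings into scope the implicit
// conversion that adds saveAsSequenceFile to RDDs of pairs.
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD
import com.google.protobuf.MessageLite

object ProtobufSave {
  // Hypothetical helper: writes one SequenceFile record per message,
  // using the message's own toByteArray() instead of Java serialization.
  def saveAsProtobufFile[T <: MessageLite](rdd: RDD[T], path: String): Unit = {
    rdd.map(x => (NullWritable.get(), new BytesWritable(x.toByteArray)))
      .saveAsSequenceFile(path)
  }
}
```

Each record then carries exactly one message's bytes, so whatever reads the
file back has to parse one message per record rather than deserialize an
Array[T].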
