spark-user mailing list archives

From Kevin Burton
Subject saveAsObjectFile is actually saveAsSequenceFile
Date Tue, 13 Jan 2015 08:39:21 GMT
This is interesting.

I’m using ObjectInputStream to try to read a file written with
saveAsObjectFile, but it’s not working.

The documentation says:

"Write the elements of the dataset in a simple format using Java
serialization, which can then be loaded using SparkContext.objectFile()."

… but that’s not right.

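  def saveAsObjectFile(path: String) {
    // Batch elements into arrays of up to 10, Java-serialize each array,
    // and write the batches as values of a Hadoop SequenceFile.
    this.mapPartitions(iter => iter.grouped(10).map(_.toArray))
      .map(x => (NullWritable.get(), new BytesWritable(Utils.serialize(x))))
      .saveAsSequenceFile(path)
  }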

… Am I correct to assume that each entry is a serialized object, but that
the entire thing is wrapped as a sequence file?
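
A minimal sketch of the layout the quoted code suggests, using only the JDK so it runs without Spark or Hadoop. The object and method names are hypothetical, `serialize` is only a stand-in for Spark's internal Utils.serialize, and the four header bytes assume the usual Hadoop SequenceFile magic (`SEQ` followed by a version byte, 6 here). It shows two things: each record value would be a Java-serialized array of up to 10 elements, and a stream that starts with the SequenceFile magic is rejected by ObjectInputStream before any object is read.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream, StreamCorruptedException}

object ObjectFileLayoutSketch {
  // Java-serialize one value to bytes (hypothetical stand-in for Utils.serialize).
  def serialize(value: AnyRef): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(value) finally out.close()
    buffer.toByteArray
  }

  // Read back a single Java-serialized value.
  def deserialize(bytes: Array[Byte]): AnyRef = {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject() finally in.close()
  }

  // Mirror the quoted code: batch elements in groups of 10 and
  // serialize each batch as one array. Each result is one record value.
  def toRecordPayloads(elements: Seq[String]): List[Array[Byte]] =
    elements.grouped(10).map(batch => serialize(batch.toArray)).toList

  // True when the bytes begin with a valid Java serialization stream header.
  def opensAsObjectStream(bytes: Array[Byte]): Boolean =
    try {
      new ObjectInputStream(new ByteArrayInputStream(bytes)).close()
      true
    } catch {
      case _: StreamCorruptedException => false
    }

  def main(args: Array[String]): Unit = {
    val payloads = toRecordPayloads((1 to 25).map(i => s"item-$i"))
    println(payloads.size) // 3 batches: 10 + 10 + 5 elements

    val firstBatch = deserialize(payloads.head).asInstanceOf[Array[String]]
    println(firstBatch.length) // 10

    // Assumed SequenceFile header: 'S', 'E', 'Q', then version byte 6.
    val sequenceFileMagic = Array[Byte](0x53, 0x45, 0x51, 6)
    println(opensAsObjectStream(sequenceFileMagic)) // false
    println(opensAsObjectStream(payloads.head))     // true
  }
}
```

If this reading is right, a plain ObjectInputStream over the whole file fails at the container header, while each record value, once pulled out by a SequenceFile reader, deserializes to an array rather than to a single element.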

