spark-user mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: saveAsObjectFile is actually saveAsSequenceFile
Date Tue, 13 Jan 2015 10:28:08 GMT
Yes, that's even what the objectFile javadoc says. It is expecting a
SequenceFile with NullWritable keys and BytesWritable values containing the
serialized values. This looks correct to me.
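
For reference, here is roughly what reading one of these files back by hand
would look like, without going through SparkContext.objectFile (a rough,
untested sketch; readObjectFileByHand and deserializeBatch are illustrative
names, and it assumes plain Java serialization as in the snippet quoted
below):

  import java.io.{ByteArrayInputStream, ObjectInputStream}
  import java.util.Arrays

  import org.apache.hadoop.io.{BytesWritable, NullWritable}
  import org.apache.spark.SparkContext
  import org.apache.spark.rdd.RDD

  // Hypothetical helper: plain Java deserialization of one record's payload.
  // Each BytesWritable holds a serialized array of up to 10 elements (note
  // the grouped(10) in the saveAsObjectFile source quoted below), not a
  // single element.
  def deserializeBatch(bytes: Array[Byte]): Array[AnyRef] = {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject().asInstanceOf[Array[AnyRef]]
    finally in.close()
  }

  def readObjectFileByHand(sc: SparkContext, path: String): RDD[AnyRef] = {
    sc.sequenceFile(path, classOf[NullWritable], classOf[BytesWritable])
      .flatMap { case (_, value) =>
        // getBytes may return a padded buffer, so trim it to getLength first.
        deserializeBatch(Arrays.copyOf(value.getBytes, value.getLength))
      }
  }

That flatMap over the deserialized arrays is essentially what objectFile does
for you, so sc.objectFile is the simpler route if you can use it.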

On Tue, Jan 13, 2015 at 8:39 AM, Kevin Burton <burton@spinn3r.com> wrote:

> This is interesting.
>
> I’m using ObjectInputStream to try to read a file written by
> saveAsObjectFile… but it’s not working.
>
> The documentation says:
>
> "Write the elements of the dataset in a simple format using Java
> serialization, which can then be loaded using SparkContext.objectFile().”
>
> … but that’s not right.
>
>   def saveAsObjectFile(path: String) {
>     this.mapPartitions(iter => iter.grouped(10).map(_.toArray))
>       .map(x => (NullWritable.get(), new BytesWritable(Utils.serialize(x))))
>       .saveAsSequenceFile(path)
>   }
>
> … am I correct to assume that each entry is a serialized object BUT that
> the entire thing is wrapped as a sequence file?
>
> --
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> <https://plus.google.com/102718274791889610666/posts>
> <http://spinn3r.com>
>
>
