spark-user mailing list archives

From Kevin Burton <bur...@spinn3r.com>
Subject Re: saveAsObjectFile is actually saveAsSequenceFile
Date Tue, 13 Jan 2015 22:43:10 GMT
Yes, but that isn’t what the main documentation says.

The file format isn’t very discoverable.

Also, the documentation doesn’t say anything about the grouped(10) call. What’s
that about?

Kevin

On Tue, Jan 13, 2015 at 2:28 AM, Sean Owen <sowen@cloudera.com> wrote:

> Yes, that's even what the objectFile javadoc says. It is expecting a
> SequenceFile with NullWritable keys and BytesWritable values containing the
> serialized values. This looks correct to me.
>
> On Tue, Jan 13, 2015 at 8:39 AM, Kevin Burton <burton@spinn3r.com> wrote:
>
>> This is interesting.
>>
>> I’m using ObjectInputStream to try to read a file written as
>> saveAsObjectFile… but it’s not working.
>>
>> The documentation says:
>>
>> "Write the elements of the dataset in a simple format using Java
>> serialization, which can then be loaded using SparkContext.objectFile()."
>>
>> … but that’s not right.
>>
>>   def saveAsObjectFile(path: String) {
>>     this.mapPartitions(iter => iter.grouped(10).map(_.toArray))
>>       .map(x => (NullWritable.get(), new BytesWritable(Utils.serialize(x))))
>>       .saveAsSequenceFile(path)
>>   }
>>
>> Am I correct to assume that each entry is a serialized object, but that
>> the entire thing is wrapped as a sequence file?
>>
>> --
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> <https://plus.google.com/102718274791889610666/posts>
>> <http://spinn3r.com>
>>
>>
>
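
For anyone else hitting this, here is a minimal sketch of the record layout
implied by the quoted saveAsObjectFile source. It deliberately uses no Spark or
Hadoop classes: ObjectFileSketch, encode and decode are hypothetical names, and
serialize is only a stand-in for Spark's internal Utils.serialize, assumed here
to be plain Java serialization. It shows why pointing ObjectInputStream at the
raw file fails: the file is a SequenceFile, and each value in it is a
Java-serialized array of up to 10 elements, not a single element.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object ObjectFileSketch {
  // Java-serialize one object to bytes (assumed stand-in for Utils.serialize).
  def serialize(obj: AnyRef): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(obj) finally out.close()
    buffer.toByteArray
  }

  // Inverse of serialize: read back a single object and cast it.
  def deserialize[T](bytes: Array[Byte]): T = {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // Write path: batch elements in groups of 10, one serialized array per batch.
  // In the real method each blob becomes a BytesWritable value in a SequenceFile.
  def encode(elements: Seq[String]): List[Array[Byte]] =
    elements.grouped(10).map { batch =>
      val array: Array[String] = batch.toArray
      serialize(array)
    }.toList

  // Read path: each record deserializes to an array, which is then flattened.
  def decode(records: Seq[Array[Byte]]): Seq[String] =
    records.flatMap(bytes => deserialize[Array[String]](bytes).toSeq)
}
```

So a reader outside Spark would first have to iterate the SequenceFile records
with a Hadoop reader, and only then run ObjectInputStream over the bytes of
each value, expecting an array rather than one object.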


