spark-user mailing list archives

From Shay Seng <>
Subject How would I start writing a RDD[ProtoBuf] and/or sc.newAPIHadoopFile??
Date Wed, 09 Oct 2013 01:16:39 GMT

I would like to store some data as a sequence of protobuf objects. I would of
course need to be able to read that into an RDD and write the RDD back out
in some binary format.

First of all, is this supported natively (or through some download)?

If not, are there examples of how I might write my own RDDs? I was hoping I
could accomplish this with some invocation of
sparkContext.newAPIHadoopFile, but the comments there are just too terse.
Are there more verbose examples out there, either on how to write new RDD
InputFormats or on how to make use of newAPIHadoopFile?
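For reference, here is roughly the shape of a solution, as a hedged sketch:
assuming the serialized protobuf messages are stored as BytesWritable values
in a Hadoop SequenceFile, newAPIHadoopFile can read them with the stock
SequenceFileInputFormat, and the generated protobuf class (MyProto here is a
hypothetical name) handles the byte-level (de)serialization. This requires a
running Spark context and is not the only possible layout:

```scala
import org.apache.hadoop.io.{BytesWritable, NullWritable}
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat

// Assumes an existing SparkContext `sc` and a protobuf-generated class
// `MyProto` (hypothetical name) with the usual parseFrom/toByteArray methods.

// Read: each SequenceFile record holds one serialized protobuf message
// in its BytesWritable value; the key is unused.
val protos = sc.newAPIHadoopFile(
    "hdfs:///path/to/protos",
    classOf[SequenceFileInputFormat[NullWritable, BytesWritable]],
    classOf[NullWritable],
    classOf[BytesWritable])
  .map { case (_, bytes) => MyProto.parseFrom(bytes.copyBytes()) }

// Write: serialize each message back to bytes and save as a SequenceFile.
protos
  .map(p => (NullWritable.get(), new BytesWritable(p.toByteArray)))
  .saveAsNewAPIHadoopFile(
    "hdfs:///path/to/output",
    classOf[NullWritable],
    classOf[BytesWritable],
    classOf[SequenceFileOutputFormat[NullWritable, BytesWritable]])
```

One caveat with this approach: because protobuf-generated classes are not
Serializable by default, keeping the records as raw bytes until the last
possible map step (as above) avoids Java serialization issues on the workers.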
