spark-user mailing list archives

From Michael Armbrust <mich...@databricks.com>
Subject Re: Best way to store RDD data?
Date Thu, 20 Nov 2014 20:04:06 GMT
With SchemaRDD you can save out case classes to Parquet (or JSON in Spark
1.2) automatically, and when you read it back in, the structure will be
preserved.  However, you won't get case classes when it's loaded back;
instead you'll get rows that you can query.
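For example, the round trip looks roughly like this in Spark 1.1 (a sketch
for the spark-shell; the Event class, data, and path are made up for
illustration):

```scala
import org.apache.spark.sql.SQLContext

case class Event(id: Int, name: String)

val sqlContext = new SQLContext(sc)  // sc is the existing SparkContext
// Implicit conversion from an RDD of case classes to a SchemaRDD.
import sqlContext.createSchemaRDD

val events = sc.parallelize(Seq(Event(1, "login"), Event(2, "logout")))

// The schema is inferred from the case class and stored with the data.
events.saveAsParquetFile("/tmp/events.parquet")

// Reading back gives a SchemaRDD of Rows, not Events.
val loaded = sqlContext.parquetFile("/tmp/events.parquet")
loaded.registerTempTable("events")
sqlContext.sql("SELECT name FROM events WHERE id = 1").collect()
```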

There is some experimental support for turning rows back into case classes
using macros, but it's very rough:  https://github.com/marmbrus/sql-typed

On Thu, Nov 20, 2014 at 7:35 AM, RJ Nowling <rnowling@gmail.com> wrote:

> Hi all,
>
> I'm working on an application that has several tables (RDDs of tuples) of
> data. Some of the types are complex-ish (e.g., date time objects). I'd like
> to use something like case classes for each entry.
>
> What is the best way to store the data to disk in a text format without
> writing custom parsers?  E.g., serializing case classes to/from JSON.
>
> What are other users doing?
>
> Thanks!
> RJ
>
>
> --
> em rnowling@gmail.com
> c 954.496.2314
>
