spark-user mailing list archives

From Toby Douglass <>
Subject Re: initial basic question from new user
Date Thu, 12 Jun 2014 14:41:51 GMT
On Thu, Jun 12, 2014 at 3:03 PM, Christopher Nguyen <> wrote:

> Toby, #saveAsTextFile() and #saveAsObjectFile() are probably what you want
> for your use case.

Yes.  Thank you.  I'm about to check whether they exist for Python.
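For reference, a minimal sketch of what saving an RDD in one job and
reloading it in a later one might look like from Python. It assumes a PySpark
release that provides saveAsPickleFile() and SparkContext.pickleFile() as the
Python counterpart to saveAsObjectFile(); check that against the version in
use. The helper names, the "persist-demo" app name and the temporary path are
hypothetical, made up for illustration.

```python
import os
import tempfile


def save_for_later_jobs(rdd, path):
    """Write an RDD to storage so that a separate, later job can read it.

    Assumes the RDD offers saveAsPickleFile(); path must not exist yet.
    """
    rdd.saveAsPickleFile(path)


def load_from_earlier_job(sc, path):
    """Rebuild an RDD from files written by save_for_later_jobs()."""
    return sc.pickleFile(path)


def demo():
    """Hypothetical end-to-end run; needs a working local Spark install."""
    # Imported here so the helpers above do not require Spark to be defined.
    from pyspark import SparkContext

    sc = SparkContext("local", "persist-demo")
    try:
        # Spark refuses to overwrite, so point at a fresh subdirectory.
        path = os.path.join(tempfile.mkdtemp(), "squares")
        squares = sc.parallelize(range(10)).map(lambda x: x * x)
        save_for_later_jobs(squares, path)

        # A later job (normally a different process) would start here.
        restored = load_from_earlier_job(sc, path)
        print(sorted(restored.collect()))
    finally:
        sc.stop()
```

This only covers the save-and-reload route; the data goes through storage
each time rather than staying in memory between jobs.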

> As for Parquet support, that's newly arrived in Spark 1.0.0 together with
> SparkSQL so continue to watch this space.


> Gerard's suggestion to look at JobServer, which you can generalize as
> "building a long-running application which allows multiple clients to
> load/share/persist/save/collaborate-on RDDs" satisfies a larger, more
> complex use case. That is indeed the job of a higher-level application,
> subject to a wide variety of higher-level design choices. A number of us
> have successfully built Spark-based apps around that model.

To my eyes, as someone new to Spark, that looks like using a sledgehammer
to crack a nut.  If RDDs persisted across jobs (a seemingly tiny change), I
wouldn't need JobServer (a whole new application).  JobServer has a ton of
functionality that, as far as I can tell, I have no use for yet, apart from
that one feature: persisting RDDs across jobs.
