spark-user mailing list archives

From Jacek Laskowski <ja...@japila.pl>
Subject Re: Saving data frames on Spark Master/Driver
Date Fri, 15 Jul 2016 00:01:54 GMT
Hi,

Please reconsider: saving on the driver moves the entire distributed
dataset onto a single machine and may lead to an OutOfMemoryError
(OOME). It is better practice to save your result to HDFS, S3, or any
other distributed filesystem that is accessible by the driver and
executors.

If you insist...

Use collect() after select() and work with the resulting Array[T] on
the driver (for a DataFrame that is an Array[Row]).
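
A minimal sketch of that collect() route, assuming the selected result fits
comfortably in driver memory. The names DriverSideCsv, escape, toCsvLine and
saveOnDriver are hypothetical helpers made up for this illustration, not part
of Spark or spark-csv; only collect(), columns and Row.toSeq are Spark calls.

```scala
import java.io.{File, PrintWriter}
import org.apache.spark.sql.DataFrame

// Hypothetical helper object, for illustration only.
object DriverSideCsv {
  // Quote a field only when it contains a comma, a quote, or a line break.
  def escape(value: Any): String = {
    val s = if (value == null) "" else value.toString
    if (s.exists(c => c == ',' || c == '"' || c == '\n' || c == '\r'))
      "\"" + s.replace("\"", "\"\"") + "\""
    else s
  }

  def toCsvLine(fields: Seq[Any]): String = fields.map(escape).mkString(",")

  // Pulls every row of `df` into driver memory, then writes one local file.
  def saveOnDriver(df: DataFrame, localPath: String): Unit = {
    val rows = df.collect() // Array[Row], held entirely on the driver
    val out = new PrintWriter(new File(localPath), "UTF-8")
    try {
      out.println(toCsvLine(df.columns.toSeq))               // header line
      rows.foreach(row => out.println(toCsvLine(row.toSeq))) // one line per row
    } finally out.close()
  }
}
```

It would be called as DriverSideCsv.saveOnDriver(joineddataframe.select(...),
<local path>). Note that the file lands on whichever machine runs the driver:
the submitting machine in client deploy mode, a cluster node in cluster mode.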

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Fri, Jul 15, 2016 at 12:15 AM, vr.n. nachiappan
<nachiappan_vrn@yahoo.com.invalid> wrote:
> Hello,
>
> I am using data frames to join two Cassandra tables.
>
> Currently, when I invoke save on data frames as shown below, it saves the
> join results on the executor nodes.
>
> joineddataframe.select(<col1>, <col2> ...)
>   .write.format("com.databricks.spark.csv")
>   .option("header", "true").save(<path>)
>
> I would like to persist the results of the join on the Spark Master/Driver
> node. Is it possible to save the results there, and if so, how?
>
> I appreciate your help.
>
> Nachi
>
