spark-user mailing list archives

From Olivier Girardot <o.girar...@lateral-thoughts.com>
Subject Re: Spark DF CacheTable method. Will it save data to disk?
Date Thu, 18 Aug 2016 06:30:05 GMT
That's another "pipeline" step to add, whereas persist is only relevant for the
lifetime of your job and keeps the data not in HDFS but on the local disks of
your executors.
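A rough sketch of that lifetime point in PySpark (the "events" table name is made up for illustration): a persisted DataFrame lives on the executors only while the application holds on to it, and leaves nothing behind in HDFS:

    from pyspark.sql import SparkSession
    from pyspark import StorageLevel

    spark = SparkSession.builder.appName("persist-lifetime").getOrCreate()

    df = spark.table("events")            # hypothetical source table

    df.persist(StorageLevel.DISK_ONLY)    # partitions spill to the executors' local disks, not HDFS
    df.count()                            # an action materializes the persisted copy
    # ... reuse df across later stages of the same application ...
    df.unpersist()                        # the copy is dropped; nothing to clean up afterwards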

On Wed, Aug 17, 2016 5:56 PM, neil90 neilp1990@icloud.com wrote:

From the Spark documentation
(http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence),
yes, you can use persist on a DataFrame instead of cache. All cache is is
shorthand for the default persist storage level, "MEMORY_ONLY". If you want to
persist the DataFrame to disk you should do
dataframe.persist(StorageLevel.DISK_ONLY).
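For reference, a minimal runnable version of that (PySpark; spark.range is just a stand-in for whatever DataFrame you are actually caching):

    from pyspark.sql import SparkSession
    from pyspark import StorageLevel

    spark = SparkSession.builder.getOrCreate()
    dataframe = spark.range(1000)                 # stand-in for your real DataFrame

    dataframe.cache()                             # shorthand: persist at the default storage level
    dataframe.unpersist()                         # drop it before re-persisting at another level
    dataframe.persist(StorageLevel.DISK_ONLY)     # keep materialized partitions on disk only
    dataframe.count()                             # persistence is lazy; the first action triggers it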

IMO, if reads against the DB are expensive and you're afraid of failure, why not
just save the data as Parquet in Hive on your cluster and read from there?
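A sketch of that route (PySpark; the JDBC connection string and the database/table names are invented placeholders):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # hypothetical expensive read against the database
    df = (spark.read.format("jdbc")
          .option("url", "jdbc:postgresql://dbhost:5432/mydb")   # placeholder connection
          .option("dbtable", "public.orders")
          .option("user", "spark")
          .option("password", "secret")
          .load())

    # one-time snapshot into Hive, stored as Parquet
    df.write.mode("overwrite").format("parquet").saveAsTable("staging.orders_snapshot")

    # later jobs read the snapshot instead of hitting the database again
    orders = spark.table("staging.orders_snapshot")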


--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-DF-CacheTable-method-Will-it-save-data-to-disk-tp27533p27551.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Olivier Girardot | Associé
o.girardot@lateral-thoughts.com
+33 6 24 09 17 94