spark-user mailing list archives

From Olivier Girardot
Subject Re: Spark DF CacheTable method. Will it save data to disk?
Date Thu, 18 Aug 2016 06:30:05 GMT
That's another "pipeline" step to add, whereas persist is only relevant during
the lifetime of your job, and the data is kept not in HDFS but on the local
disks of your executors.

On Wed, Aug 17, 2016 5:56 PM, neil90 wrote:
>From the spark


Yes, you can use persist on a dataframe instead of cache. cache is just
shorthand for persist with the default storage level, "MEMORY_ONLY". If you
want to persist the dataframe to disk you should do


IMO, if reads against the DB are expensive and you're afraid of failure, why
not just save the data as Parquet on your cluster in Hive and read from there?



Olivier Girardot | Associé
+33 6 24 09 17 94