spark-user mailing list archives

From neil90 <neilp1...@icloud.com>
Subject Re: Spark DF CacheTable method. Will it save data to disk?
Date Wed, 17 Aug 2016 15:56:48 GMT
From the Spark
documentation (http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence),
yes, you can call persist on a DataFrame instead of cache. cache is just
shorthand for persist at the default storage level, "MEMORY_ONLY". If you want
to persist the DataFrame to disk, use
dataframe.persist(StorageLevel.DISK_ONLY).

IMO, if reads against the DB are expensive and you're afraid of failure, why not
just save the data as Parquet on your cluster (e.g. in Hive) and read from there?



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-DF-CacheTable-method-Will-it-save-data-to-disk-tp27533p27551.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

