spark-issues mailing list archives

From "Tianshi Zhu (Jira)"
Subject [jira] [Commented] (SPARK-31448) Difference in Storage Levels used in cache() and persist() for pyspark dataframes
Date Wed, 06 May 2020 01:36:00 GMT


Tianshi Zhu commented on SPARK-31448:

I found the following comment in the Spark 2.4.3 source:

_".. note:: The following four storage level constants are deprecated in 2.0, since the records will always be serialized in Python."_


So I would assume that the Scala counterpart of PySpark's MEMORY_AND_DISK is

val MEMORY_AND_DISK_SER = new StorageLevel(true, true, false, false)

whereas Scala's

val MEMORY_AND_DISK = new StorageLevel(true, true, false, true)

sets the fourth flag (deserialized) to true, which means the data is stored deserialized.
Does that help?
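
To make the comparison concrete, here is a minimal sketch that lines up the flag tuples quoted in this thread. `Level` and `differing_fields` are hypothetical stand-ins written for illustration, not Spark classes; the field order assumes the StorageLevel argument order (useDisk, useMemory, useOffHeap, deserialized).

```python
from typing import List, NamedTuple


class Level(NamedTuple):
    """Hypothetical stand-in that mirrors the StorageLevel argument order."""
    use_disk: bool
    use_memory: bool
    use_off_heap: bool
    deserialized: bool


# Values as quoted in the issue and in the comment above.
PYSPARK_MEMORY_AND_DISK = Level(True, True, False, False)
SCALA_MEMORY_AND_DISK = Level(True, True, False, True)
SCALA_MEMORY_AND_DISK_SER = Level(True, True, False, False)


def differing_fields(a: Level, b: Level) -> List[str]:
    """Return the names of the flags on which two levels disagree."""
    return [name for name, x, y in zip(a._fields, a, b) if x != y]


# The two MEMORY_AND_DISK levels disagree only on the deserialized flag.
print(differing_fields(PYSPARK_MEMORY_AND_DISK, SCALA_MEMORY_AND_DISK))
# PySpark's MEMORY_AND_DISK carries the same flags as Scala's MEMORY_AND_DISK_SER.
print(PYSPARK_MEMORY_AND_DISK == SCALA_MEMORY_AND_DISK_SER)
```

Under these assumptions the only difference is the serialization flag, which is consistent with the note that records are always serialized in Python.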

> Difference in Storage Levels used in cache() and persist() for pyspark dataframes
> ---------------------------------------------------------------------------------
>                 Key: SPARK-31448
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.4.3
>            Reporter: Abhishek Dixit
>            Priority: Major
> There is a difference in default storage level *MEMORY_AND_DISK* in pyspark and scala.
> *Scala*: StorageLevel(true, true, false, true)
> *Pyspark:* StorageLevel(True, True, False, False)
> *Problem Description:* 
> Calling *df.cache()* for pyspark dataframe directly invokes Scala method cache() and Storage Level used is StorageLevel(true, true, false, true).
> But calling *df.persist()* for pyspark dataframe sets the newStorageLevel=StorageLevel(true, true, false, false) inside pyspark and then invokes Scala function persist(newStorageLevel).
> *Possible Fix:*
> Invoke pyspark function persist inside pyspark function cache instead of calling the scala function directly.
> I can raise a PR for this fix if someone can confirm that this is a bug and the possible fix is the correct approach.
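
The behaviour described in the issue, and the proposed fix, can be sketched with a toy model. This is not PySpark code: `FakeJvmDataFrame`, `CurrentDataFrame`, and `ProposedDataFrame` are hypothetical names, and the flag tuples are simply the values quoted in the report.

```python
# Flag tuples as quoted in the issue: (useDisk, useMemory, useOffHeap, deserialized).
SCALA_DEFAULT = (True, True, False, True)
PYSPARK_MEMORY_AND_DISK = (True, True, False, False)


class FakeJvmDataFrame:
    """Hypothetical stand-in for the Scala-side DataFrame; records the level used."""

    def __init__(self):
        self.level = None

    def cache(self):
        # The Scala cache() applies its own default level.
        self.level = SCALA_DEFAULT

    def persist(self, level):
        self.level = level


class CurrentDataFrame:
    """Models the reported behaviour: cache() calls straight into Scala."""

    def __init__(self):
        self._jdf = FakeJvmDataFrame()

    def cache(self):
        self._jdf.cache()
        return self

    def persist(self, level=PYSPARK_MEMORY_AND_DISK):
        self._jdf.persist(level)
        return self


class ProposedDataFrame(CurrentDataFrame):
    """Models the possible fix: cache() delegates to the Python-side persist()."""

    def cache(self):
        return self.persist()


print(CurrentDataFrame().cache()._jdf.level, CurrentDataFrame().persist()._jdf.level)
print(ProposedDataFrame().cache()._jdf.level, ProposedDataFrame().persist()._jdf.level)
```

In this model the current class ends up with two different levels depending on which method is called, while the proposed class uses the same level for both. Whether that is the right fix for Spark itself is the question the reporter raises.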
