spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Conflicting PySpark Storage Level Defaults?
Date Mon, 16 Sep 2019 07:02:06 GMT
I don’t know your full source code, but you may be missing an action that would cause the data frame to actually be persisted.
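
For example, something along these lines (a minimal sketch; a toy spark.range data frame stands in for yours):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("cache-check").getOrCreate()

    # Toy data frame standing in for the real one.
    df = spark.range(1000000)

    # cache()/persist() only mark the data frame for caching; nothing is
    # materialized (and nothing shows in the Storage tab) until an action runs.
    df.cache()

    # An action such as count() forces evaluation, so the cached blocks
    # should then appear under the Storage tab of the Spark UI.
    df.count()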

> Am 16.09.2019 um 02:07 schrieb grp <gpeterne@villanova.edu>:
> 
> Hi There Spark Users,
> 
> Curious what is going on here.  Not sure if this is a bug or if I am missing something.  Extra eyes are much appreciated.
> 
> The Spark UI (Python API 2.4.3) by default reports persisted data-frames as deserialized MEMORY_AND_DISK; however, I always thought they were serialized by default for Python, according to the official documentation.
> However, when explicitly setting the storage level to that same default, e.g. df.persist(StorageLevel.MEMORY_AND_DISK), the Spark UI shows the expected serialized data-frame under the Storage tab, but not when just calling df.cache().
> 
> Do we have to set StorageLevel.MEMORY_AND_DISK explicitly to get the serialization benefit in Python (which I thought was automatic)?  Or is the Spark UI incorrect?
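> 
> To make the comparison concrete, a rough sketch of the two cases (spark.range stands in for the real data-frame; the UI observations are the ones described above):
> 
>     from pyspark import StorageLevel
>     from pyspark.sql import SparkSession
> 
>     spark = SparkSession.builder.getOrCreate()
>     df = spark.range(100)                       # stand-in for the real data-frame
> 
>     df.cache()
>     df.count()                                  # action, so the Storage tab is populated
>     print(df.storageLevel)                      # UI reports this one as deserialized
> 
>     df.unpersist()
>     df.persist(StorageLevel.MEMORY_AND_DISK)
>     df.count()
>     print(df.storageLevel)                      # UI reports this one as serialized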
> 
> SO post with specific example/details => https://stackoverflow.com/questions/56926337/conflicting-pyspark-storage-level-defaults
> 
> Thank you for your time and research!

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

