spark-user mailing list archives

From grp <gpete...@villanova.edu>
Subject Conflicting PySpark Storage Level Defaults?
Date Mon, 16 Sep 2019 00:07:19 GMT
Hi There Spark Users,

Curious what is going on here.  Not sure if this is a possible bug or if I'm missing something.  Extra eyes
are much appreciated.

The Spark UI (Python API 2.4.3) by default reports persisted DataFrames as deserialized
MEMORY_AND_DISK, but I always thought they were serialized by default in Python, according
to the official documentation.
However, when explicitly setting the storage level to that supposed default, e.g. df.persist(StorageLevel.MEMORY_AND_DISK),
the Spark UI shows the expected serialized DataFrame under the Storage tab, but not when
simply calling df.cache().
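
For reference, a minimal sketch of the comparison I'm making (PySpark 2.4.x; spark.range is just a stand-in for the real DataFrame):

from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("storage-level-check").getOrCreate()
df = spark.range(1000)  # placeholder DataFrame

# Default cache(): the Storage tab reports the DataFrame as deserialized MEMORY_AND_DISK
df.cache()
df.count()              # force materialization so it appears in the UI
print(df.storageLevel)  # deserialized=True here

df.unpersist()

# Explicit persist with the supposed default: the Storage tab reports it as serialized,
# since PySpark's StorageLevel.MEMORY_AND_DISK is defined with deserialized=False
df.persist(StorageLevel.MEMORY_AND_DISK)
df.count()
print(df.storageLevel)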

Do we have to explicitly set StorageLevel.MEMORY_AND_DISK to get the serialized
benefit in Python (which I thought was automatic), or is the Spark UI incorrect?

SO post with specific example/details => https://stackoverflow.com/questions/56926337/conflicting-pyspark-storage-level-defaults

Thank you for your time and research!
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

