spark-user mailing list archives

From Jörn Franke <>
Subject Re: Conflicting PySpark Storage Level Defaults?
Date Mon, 16 Sep 2019 07:02:06 GMT
I don’t know your full source code, but you may be missing an action, so that the DataFrame is never actually materialized and persisted.

> On 16.09.2019 at 02:07, grp <> wrote:
> Hi There Spark Users,
> Curious what is going on here. Not sure if this is a bug or if I am missing something. Extra eyes are much appreciated.
> The Spark UI (Python API, 2.4.3) by default reports persisted DataFrames as deserialized MEMORY_AND_DISK; however, I always thought they were serialized by default for Python, according to the official documentation.
> But when I explicitly set the storage level to that same default, e.g. df.persist(StorageLevel.MEMORY_AND_DISK), the Spark UI shows the expected serialized DataFrame under the Storage tab, yet not when just calling df.cache().
> Do we have to explicitly set StorageLevel.MEMORY_AND_DISK to get the serialization benefit in Python (which I thought was automatic)? Or is the Spark UI incorrect?
> SO post with specific example/details =>
> Thank you for your time and research!
> ---------------------------------------------------------------------
> To unsubscribe e-mail:

