spark-user mailing list archives

From grp <>
Subject Re: [EXTERNAL] Re: Conflicting PySpark Storage Level Defaults?
Date Mon, 16 Sep 2019 22:21:39 GMT
Running a simple test: here is the Stack Overflow code snippet, using .count() as the action.
You can see the differences between the storage levels.


# id 3 => using default storage level for df (memory_and_disk) and unsure why storage level
# is not serialized since I am using pyspark
df = spark.range(10)

# id 15 => using default storage level for rdd (memory_only) and makes sense why it is
rdd = df.rdd

# id 19 => manually configuring to (memory_and_disk) which makes the storage level serialized
df2 = spark.range(100)
from pyspark import StorageLevel

<class 'pyspark.sql.dataframe.DataFrame'>
Disk Memory Deserialized 1x Replicated
<class 'pyspark.rdd.RDD'>
<class 'pyspark.sql.dataframe.DataFrame'>
Disk Memory Serialized 1x Replicated
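
To make the difference between the two DataFrame lines above explicit, here is a minimal standalone sketch in plain Python (no Spark session needed). LevelFlags, describe and differing_fields are hypothetical names used only for illustration; only the five field names are meant to mirror the fields of pyspark.StorageLevel, and the flag values are inferred from the printed descriptions rather than read from a live cluster.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class LevelFlags:
    """Hypothetical stand-in with the same five fields as pyspark.StorageLevel."""
    useDisk: bool
    useMemory: bool
    useOffHeap: bool
    deserialized: bool
    replication: int = 1


def describe(level):
    """Build a description in the same shape as the lines printed above."""
    parts = []
    if level.useDisk:
        parts.append("Disk")
    if level.useMemory:
        parts.append("Memory")
    if level.useOffHeap:
        parts.append("OffHeap")
    parts.append("Deserialized" if level.deserialized else "Serialized")
    parts.append(f"{level.replication}x Replicated")
    return " ".join(parts)


def differing_fields(a, b):
    """Return the names of the fields on which two levels disagree."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


# Assumption: flags inferred from the two DataFrame descriptions printed above.
cached_df = LevelFlags(useDisk=True, useMemory=True, useOffHeap=False, deserialized=True)
persisted_df = LevelFlags(useDisk=True, useMemory=True, useOffHeap=False, deserialized=False)

print(describe(cached_df))
print(describe(persisted_df))
print(differing_fields(cached_df, persisted_df))
```

Under these assumptions the two levels agree on disk, memory, off-heap and replication, and differ only in the deserialized flag, which is the discrepancy the Storage tab shows.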

> On Sep 16, 2019, at 2:02 AM, Jörn Franke <> wrote:
> I don’t know your full source code, but you may be missing an action so that it is indeed
>> Am 16.09.2019 um 02:07 schrieb grp <>:
>> Hi There Spark Users,
>> Curious what is going on here.  Not sure if this is a possible bug or if I am missing something.
>> Extra eyes are much appreciated.
>> The Spark UI (Python API 2.4.3) by default reports persisted DataFrames as deserialized
>> MEMORY_AND_DISK; however, I always thought they were serialized by default for Python, according
>> to the official documentation.
>> However, when explicitly setting the storage level to the default … ex => df.persist(StorageLevel.MEMORY_AND_DISK)
>> … the Spark UI shows the expected serialized DataFrame under the Storage tab, but not when
>> just calling … df.cache().
>> Do we have to explicitly set … StorageLevel.MEMORY_AND_DISK … to get the serialized
>> benefit in Python (which I thought was automatic)?  Or is the Spark UI incorrect?
>> SO post with specific example/details =>
>> Thank you for your time and research!
