spark-issues mailing list archives

From "Sean R. Owen (Jira)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-29035) unpersist() ignoring cache/persist()
Date Sat, 26 Oct 2019 22:54:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-29035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean R. Owen resolved SPARK-29035.
----------------------------------
    Resolution: Not A Problem

Your cache() call has no effect, because you undo it before anything is evaluated. Nothing
is ignored here: caching is lazy, so no data had been cached by the time you told Spark
not to cache df.
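
To illustrate the point above, here is a toy model in plain Python. It is not Spark's implementation or API: ToyFrame, CACHE_REGISTRY, COMPUTE_LOG and count_source_scans are hypothetical names invented for this sketch. The only assumption it encodes is that cache() merely registers intent, and that the registry is consulted when an action runs.

```python
# Toy model of lazy caching. NOT Spark's implementation or API:
# every name below is hypothetical and exists only for illustration.

CACHE_REGISTRY = {}  # plan id -> cached rows (None until first materialized)
COMPUTE_LOG = []     # one entry per real recomputation of the source


class ToyFrame:
    def __init__(self, plan_id, compute, parent=None):
        self.plan_id = plan_id
        self._compute = compute  # source: () -> rows; select: row -> row
        self._parent = parent

    def cache(self):
        # Registers intent only; nothing is computed here.
        CACHE_REGISTRY[self.plan_id] = None
        return self

    def unpersist(self):
        # Withdraws the registration, whether or not data was materialized.
        CACHE_REGISTRY.pop(self.plan_id, None)
        return self

    def select(self, fn):
        # Transformation: builds a new lazy frame, evaluates nothing.
        return ToyFrame(self.plan_id + ".select", fn, parent=self)

    def collect(self):
        # Action: the registry is consulted now, not when select() was called.
        if self._parent is not None:
            return [self._compute(row) for row in self._parent.collect()]
        if self.plan_id in CACHE_REGISTRY:
            if CACHE_REGISTRY[self.plan_id] is None:
                COMPUTE_LOG.append(self.plan_id)
                CACHE_REGISTRY[self.plan_id] = self._compute()
            return CACHE_REGISTRY[self.plan_id]
        COMPUTE_LOG.append(self.plan_id)
        return self._compute()


def count_source_scans(unpersist_before_action):
    """Return how many times the source is recomputed for two actions."""
    CACHE_REGISTRY.clear()
    COMPUTE_LOG.clear()
    df = ToyFrame("src", lambda: [1, 2, 3]).cache()
    df_b = df.select(lambda x: x + 1)
    df_c = df.select(lambda x: x * 2)
    if unpersist_before_action:
        df.unpersist()  # mirrors the ordering in the report
    df_b.collect()
    df_c.collect()
    if not unpersist_before_action:
        df.unpersist()  # release the cache only after the actions ran
    return len(COMPUTE_LOG)


print(count_source_scans(True))   # 2: cache withdrawn before any action
print(count_source_scans(False))  # 1: second action reuses the cached rows
```

In terms of the reported snippet, the analogous change would presumably be to call df.unpersist() only after the write() has finished, so that the cache still exists when the join is actually evaluated.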

> unpersist() ignoring cache/persist()
> ------------------------------------
>
>                 Key: SPARK-29035
>                 URL: https://issues.apache.org/jira/browse/SPARK-29035
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.3
>         Environment: Amazon EMR - Spark 2.4.3
>            Reporter: Jose Silva
>            Priority: Major
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Calling {{unpersist()}}, even though the {{DataFrame}} is no longer used afterwards,
> removes every InMemoryTableScan from the DAG.
> Here's a simplified version of the code I'm using:
> {code}
> df = spark.read(...).where(...).cache()
> df_a = union(df.select(...), df.select(...), df.select(...))
> df_b = df.select(...)
> df_c = df.select(...)
> df_d = df.select(...)
> df.unpersist()
> join(df_a, df_b, df_c, df_d).write()
> {code}
> I've created an [album|https://imgur.com/a/c1xGq0r] with the two DAGs, with and without
> the {{unpersist()}} call.
> I call {{unpersist()}} to prevent an OOM during the join. As I understand it, even
> though all the DataFrames are derived from df, unpersisting df after doing the selects
> shouldn't cause the cache call to be ignored, right?



