spark-dev mailing list archives

From attilapiros
Subject Re: Is there any inplict RDD cache operation for query optimizations?
Date Mon, 15 Feb 2021 16:40:35 GMT

There is a good reason why the decision about caching is left to the user:
Spark does not know how the DataFrames and RDDs will be used in the future.

Think about how your program runs (at that moment it is still running):
execution is at one exact point, and when Spark reaches an action it
evaluates the corresponding Spark job, but it knows nothing about the jobs
that will come later. Cached data is only useful to a future job that
reuses it.

The user, on the other hand, does have this information, because the user
writes all the jobs.
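
To make the argument concrete, here is a minimal sketch in plain Python.
It is a toy model, not Spark's API: the class LazyDataset, its methods and
the helper run_two_jobs are made up for illustration. The toy engine, like
Spark, only sees the job triggered by the current action, so it recomputes
the data for every action unless the user asked for caching beforehand.

```python
class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not a Spark class)."""

    def __init__(self, compute):
        self._compute = compute        # function that produces the data
        self._cache_requested = False  # set by the user, never by the engine
        self._cached = None
        self.evaluations = 0           # how often the data was recomputed

    def map(self, fn):
        # Transformation: lazy, it only builds a new dataset on top of this one.
        return LazyDataset(lambda: [fn(x) for x in self._materialize()])

    def cache(self):
        # Only marks the dataset; nothing is computed until an action runs.
        self._cache_requested = True
        return self

    def _materialize(self):
        if self._cached is not None:
            return self._cached
        self.evaluations += 1
        data = self._compute()
        if self._cache_requested:
            self._cached = data
        return data

    # Actions: each one triggers a separate "job".
    def count(self):
        return len(self._materialize())

    def sum(self):
        return sum(self._materialize())


def run_two_jobs(use_cache):
    squared = LazyDataset(lambda: list(range(5))).map(lambda x: x * x)
    if use_cache:
        squared.cache()      # the user knows a second job will follow
    n = squared.count()      # job 1: the engine cannot see job 2 from here
    total = squared.sum()    # job 2: reuses the data only if it was cached
    return n, total, squared.evaluations


print(run_two_jobs(use_cache=False))  # (5, 30, 2): computed twice
print(run_two_jobs(use_cache=True))   # (5, 30, 1): computed once
```

In the toy model, as in Spark, the only party that knows at the time of
count() that sum() will follow is the person who wrote the program.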

