spark-user mailing list archives

From "Lunagariya, Dhaval " <>
Subject Don't find Skipped Stages in Spark Dataset
Date Mon, 25 Feb 2019 06:46:30 GMT
I am trying to understand Spark execution in the case of Datasets.

For RDDs, I found the following in the Spark docs:

Shuffle also generates a large number of intermediate files on disk. As of Spark 1.3, these
files are preserved until the corresponding RDDs are no longer used and are garbage collected.
This is done so the shuffle files don't need to be re-created if the lineage is re-computed.

I tried running a similar job with both an RDD and a Dataset, and I don't see skipped stages
in the Dataset execution. Is there a hint I need to add in the code to preserve the shuffle
output? That is, I want the Dataset to share shuffle files between jobs.
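
The comparison described above could look roughly like the following PySpark sketch. It is
an illustration only and not the linked code sample: the function names, the toy data, and
the choice of PySpark are all assumptions. Each function runs two actions on the same shuffled
object, so the stages of the second job can be inspected in the Spark UI.

```python
def run_rdd_twice(spark):
    """Run two actions on one shuffled RDD (hypothetical helper).

    `spark` is assumed to be an existing SparkSession.
    """
    sc = spark.sparkContext
    pairs = sc.parallelize(range(1000), 4).map(lambda x: (x % 10, 1))
    # reduceByKey introduces a shuffle boundary.
    counts = pairs.reduceByKey(lambda a, b: a + b)
    first = counts.count()   # job 1: runs the map stage and writes shuffle files
    second = counts.count()  # job 2: same RDD object; check the UI for a skipped stage
    return first, second


def run_dataset_twice(spark):
    """Run two actions on one aggregated DataFrame (hypothetical helper)."""
    df = spark.range(1000).selectExpr("id % 10 AS key")
    # The aggregation needs an exchange (shuffle) in the physical plan.
    grouped = df.groupBy("key").count()
    first = grouped.count()   # job(s) for the first action
    second = grouped.count()  # second action: compare its stages in the UI
    return first, second
```

To try it, create a local session with
SparkSession.builder.master("local[2]").getOrCreate(), call both functions with it, and
compare the Stages tab of the Spark UI for each pair of jobs.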
Code sample available here:

