spark-user mailing list archives

From Koert Kuipers <ko...@tresata.com>
Subject understanding spark shuffle file re-use better
Date Wed, 13 Jan 2021 16:38:37 GMT
Is shuffle file re-use based on identity or equality of the DataFrame?

For example, if I run the exact same code twice to load data and do transforms
(joins, aggregations, etc.), but without re-using any actual DataFrames,
will I still see skipped stages thanks to shuffle file re-use?

Thanks!
Koert