spark-dev mailing list archives

From Effi Ofer
Subject Why is shuffle data always persisted to disk?
Date Wed, 29 Mar 2017 10:19:43 GMT
Greetings,

I was wondering why Spark's shuffle always persists the shuffle data to
disk. I understand that the scheduler can use the persisted data to
truncate the lineage of the RDD graph when an existing RDD has already been
materialized as a side effect of an earlier shuffle. But that does not
explain why Spark does not keep the shuffle data in memory until memory
runs low enough to trigger victim selection and spilling. Any hints or
pointers would be appreciated.

