spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Corey Nolet <>
Subject Shuffle guarantees
Date Tue, 01 Mar 2016 19:56:19 GMT
So if I'm using reduceByKey() with a HashPartitioner, I understand that the
hashCode() of my key is used to create the underlying shuffle files.

Is anything other than hashCode() used in the shuffle files when the data
is pulled into the reducers and run through the reduce function? The reason
I'm asking is because there's a possibility of hashCode() colliding in two
different objects which end up hashing to the same number, right?

View raw message