spark-user mailing list archives

From Akhilanand <>
Subject Join selection
Date Tue, 05 Mar 2019 07:20:04 GMT

I was going through the Spark strategies class (SparkStrategies) and found that, by default,
sort merge join is preferred over shuffled hash join. The
spark.sql.join.preferSortMergeJoin setting needs to be explicitly set to false before
Spark will consider a shuffled hash join.
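A rough model of the choice described above, written as a plain-Python sketch rather than against Spark itself. It assumes Spark 2.x behaviour as I understand it from JoinSelection: a broadcast hash join is tried first; a shuffled hash join is only considered when spark.sql.join.preferSortMergeJoin is false (or the join keys cannot be sorted), the build side is small enough per partition (below autoBroadcastJoinThreshold × spark.sql.shuffle.partitions) and at least 3× smaller than the other side; sort merge join is the fallback. The class and function names are hypothetical, only the right side is treated as the build side, and details vary between Spark versions.

```python
from dataclasses import dataclass


@dataclass
class JoinConf:
    # Hypothetical stand-ins for spark.sql.join.preferSortMergeJoin,
    # spark.sql.autoBroadcastJoinThreshold and spark.sql.shuffle.partitions.
    prefer_sort_merge_join: bool = True
    auto_broadcast_join_threshold: int = 10 * 1024 * 1024  # 10 MB default
    shuffle_partitions: int = 200


def choose_equi_join(left_bytes: int, right_bytes: int, conf: JoinConf,
                     keys_orderable: bool = True) -> str:
    """Simplified model of the physical join picked for an inner equi-join,
    using the right side as the build side (assumption, Spark 2.x-like)."""
    # 1) Broadcast hash join if the build side fits under the broadcast
    #    threshold (a threshold of -1 disables broadcasting).
    if 0 <= right_bytes <= conf.auto_broadcast_join_threshold:
        return "BroadcastHashJoin"
    # 2) Shuffled hash join only when sort merge join is not preferred, each
    #    partition's hash map should fit, and the build side is much smaller;
    #    or when the keys cannot be sorted at all.
    can_build_local_hash_map = (
        right_bytes < conf.auto_broadcast_join_threshold * conf.shuffle_partitions
    )
    much_smaller = right_bytes * 3 <= left_bytes
    if (not conf.prefer_sort_merge_join and can_build_local_hash_map
            and much_smaller) or not keys_orderable:
        return "ShuffledHashJoin"
    # 3) Otherwise fall back to sort merge join.
    return "SortMergeJoin"
```

Under these assumptions, setting the flag to false does not strictly force a shuffled hash join: a 3000 MB build side joined against 10 GB still falls back to sort merge join with the default 200 shuffle partitions, because its per-partition hash maps would be too large.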

1) Why is sort merge join preferred over shuffled hash join?
2) Are there any performance implications we need to take care of when we
force shuffled hash joins?
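To make the trade-off behind both questions concrete, here is a single-partition sketch of the two algorithms in plain Python (hypothetical helper names, not Spark code). The hash join keeps every build-side row of the partition in an in-memory hash table, so its memory use grows with the build side; the sort merge join sorts both inputs by key and then merges them in one pass, which requires orderable keys. In Spark the sort step can spill to disk, which is a commonly cited reason for preferring sort merge join on large partitions, while a shuffled hash join whose build side does not fit in memory risks running out of memory.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, str]          # (join key, payload); int keys are orderable
Joined = Tuple[int, str, str]  # (key, left payload, right payload)


def hash_join(stream: List[Row], build: List[Row]) -> List[Joined]:
    """Per-partition hash join: the whole build side is held in memory."""
    table: Dict[int, List[str]] = defaultdict(list)
    for key, value in build:       # memory grows with the build side
        table[key].append(value)
    out: List[Joined] = []
    for key, value in stream:      # stream side is read once, row by row
        for build_value in table.get(key, []):
            out.append((key, value, build_value))
    return out


def sort_merge_join(left: List[Row], right: List[Row]) -> List[Joined]:
    """Per-partition sort merge join: sort both sides by key, then merge."""
    left = sorted(left, key=lambda r: r[0])
    right = sorted(right, key=lambda r: r[0])
    out: List[Joined] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == lk:
                for jj in range(j, j_end):
                    out.append((lk, left[i][1], right[jj][1]))
                i += 1
            j = j_end
    return out
```

Both functions return the same rows for the same inputs; the difference is where the cost lies, a hash table that must fit in memory versus a sort over both sides.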

