I was going through Spark's SparkStrategies class and found that, by default, sort merge join is preferred over shuffled hash join. The preferSortMergeJoin setting (spark.sql.join.preferSortMergeJoin) has to be explicitly set to false if we want to force a shuffled hash join.
1) Why is sort merge join preferred over hash join?
2) Are there any performance implications we need to take care of when we force shuffled hash joins?
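
For context, here is a minimal sketch of how I would turn the preference off and inspect the resulting plan. The app name, the local master, the table sizes, and the choice to disable broadcast joins are my own assumptions for a small reproduction and are not taken from any real job. As far as I can tell from the planner code, the setting only removes the preference for sort merge join; the planner still applies its own size checks before choosing a shuffled hash join, so the printed plan may still show a sort merge join.

```scala
import org.apache.spark.sql.SparkSession

object PreferShuffledHashJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("prefer-shuffled-hash-join-sketch") // hypothetical name
      .master("local[*]")                          // assumption: local test run
      // The setting in question: stop preferring sort merge join.
      .config("spark.sql.join.preferSortMergeJoin", "false")
      // Assumption: disable broadcast joins so the small demo table
      // is not simply broadcast instead of shuffled.
      .config("spark.sql.autoBroadcastJoinThreshold", "-1")
      .getOrCreate()

    // Two illustrative tables of very different sizes, both with an "id" column.
    val large = spark.range(0L, 1000000L)
    val small = spark.range(0L, 1000L)

    // Print the physical plan to see which join strategy was selected.
    large.join(small, "id").explain()

    spark.stop()
  }
}
```

Checking the output of explain() is the only reliable way I know to confirm which join was actually picked for a given query.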