spark-user mailing list archives

From raaggarw <>
Subject How spark decides whether to do BroadcastHashJoin or SortMergeJoin
Date Wed, 20 Jul 2016 08:07:42 GMT

How does Spark decide internally whether to use a BroadcastHashJoin or a
SortMergeJoin? Is there any way to guide that choice from outside, for
example through configuration options?
In my case, Spark plans the join as a BroadcastHashJoin, and when the join
is actually executed it waits for the broadcast of a large dataset to
complete, which results in a timeout.
I do not want to increase the timeout, i.e.
"spark.sql.broadcastTimeout". Instead, I want the join to be executed as a
SortMergeJoin. How can I enforce that?
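
One setting that may be relevant here is "spark.sql.autoBroadcastJoinThreshold": as far as I understand it, the planner only picks a broadcast join automatically when the estimated size of one side is below this threshold, and a value of -1 turns automatic broadcasting off. The following is a minimal sketch, not a confirmed fix for my job. It assumes the Spark 2.x SparkSession API (on 1.x the equivalent call would be sqlContext.setConf), and the object name, the local master and the two tiny DataFrames are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; the name and the data are illustrative only.
object DisableAutoBroadcastJoin {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("disable-auto-broadcast-join")
      .master("local[*]") // assumption: local run, just to inspect the plan
      .getOrCreate()
    import spark.implicits._

    // -1 disables automatic broadcast joins, so the planner has to fall
    // back to a non-broadcast strategy for equi-joins.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

    // Two small stand-in tables; real inputs would be the large datasets.
    val left  = Seq((1, "a"), (2, "b")).toDF("id", "leftValue")
    val right = Seq((1, "x"), (2, "y")).toDF("id", "rightValue")

    // Print the physical plan to check which join strategy was selected.
    left.join(right, "id").explain()

    spark.stop()
  }
}
```

With the threshold disabled, I would expect the printed physical plan to no longer contain a BroadcastHashJoin for this equi-join, but the exact strategy chosen can depend on the Spark version and other settings, so it is worth checking the explain() output on the real query.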

