spark-user mailing list archives

From raaggarw <raagg...@adobe.com>
Subject How spark decides whether to do BroadcastHashJoin or SortMergeJoin
Date Wed, 20 Jul 2016 08:07:42 GMT
Hi,

How does Spark decide internally when to use a BroadcastHashJoin versus a
SortMergeJoin? Is there any way to guide that choice from outside, or
through options?
In my case, when I try to do a join, Spark plans it as a
BroadcastHashJoin internally, and when the join is actually executed it
waits for the broadcast to complete (the data is large), which results in
a timeout. I do not want to increase the timeout value, i.e.
"spark.sql.broadcastTimeout". Instead, I want the join to be done via
SortMergeJoin. How can I enforce that?
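
For reference, the setting that usually governs this choice is
"spark.sql.autoBroadcastJoinThreshold": the planner considers a broadcast
join when its size estimate for one side of the join falls below this
value, and a value of -1 disables automatic broadcasting. A minimal
sketch, assuming the setting is applied through conf/spark-defaults.conf
(the file location is an assumption, and whether the planner then picks
SortMergeJoin specifically may depend on the Spark version and the join
keys):

```properties
# Assumed location: conf/spark-defaults.conf
# -1 disables automatic broadcast joins, so the planner has to pick a
# non-broadcast strategy for equi-joins (typically SortMergeJoin).
spark.sql.autoBroadcastJoinThreshold   -1
```

The same key can be passed per job as
"--conf spark.sql.autoBroadcastJoinThreshold=-1" on spark-submit, which
leaves "spark.sql.broadcastTimeout" untouched.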

Thanks
Ravi



