spark-user mailing list archives

From Tom <>
Subject Which strategy is used for broadcast variables?
Date Wed, 11 Mar 2015 20:57:05 GMT
In "Performance and Scalability of Broadcast in Spark" by Mosharaf Chowdhury,
I read that Spark uses HDFS for its broadcast variables. This seems highly
inefficient. The same paper proposes alternatives, among them
"BitTorrent Broadcast (BTB)". While studying "Learning Spark," page 105,
second paragraph about Broadcast Variables, I read "The value is sent to
each node only once, using an efficient, BitTorrent-like communication

- Is the book talking about the proposed BTB from the paper? 

- Is this currently the default? 

- If not, what is?


