spark-user mailing list archives

From Mosharaf Chowdhury <mosharafka...@gmail.com>
Subject Re: Which strategy is used for broadcast variables?
Date Thu, 12 Mar 2015 02:13:29 GMT
The current broadcast algorithm in Spark approximates the one described in
Section 5 of this paper
<http://www.mosharaf.com/wp-content/uploads/orchestra-sigcomm11.pdf>.
Broadcast time is expected to scale sub-linearly, i.e., as O(log N), where N
is the number of machines in your cluster.
We evaluated it on up to 100 machines, and it does follow O(log N) scaling.
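
To give a rough intuition for where O(log N) comes from, here is a minimal
sketch of an idealized peer-to-peer broadcast. It is a toy model, not Spark's
actual implementation: it assumes the source counts as one of the N machines,
that every machine already holding the data uploads one full copy per round,
and that blocks, bandwidth limits, and failures can be ignored. The function
name broadcast_rounds is hypothetical.

```python
import math


def broadcast_rounds(num_machines: int) -> int:
    """Rounds needed until every machine holds the data in a doubling model.

    Assumption (not from Spark): in each round, every machine that already
    has the data sends one copy to a machine that does not have it yet.
    """
    if num_machines < 1:
        raise ValueError("num_machines must be at least 1")

    holders = 1  # only the source holds the data at the start
    rounds = 0
    while holders < num_machines:
        # Each current holder serves one new machine, so holders double.
        holders = min(num_machines, holders * 2)
        rounds += 1
    return rounds


if __name__ == "__main__":
    for n in (1, 2, 10, 100, 1000):
        # In this model the round count equals ceil(log2(n)).
        print(n, broadcast_rounds(n), math.ceil(math.log2(n)))
```

In this model, going from 100 to 1000 machines adds only three rounds (7 to
10), whereas a source that serves every machine itself would need a number of
transfers proportional to N.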

--
Mosharaf Chowdhury
http://www.mosharaf.com/

On Wed, Mar 11, 2015 at 3:11 PM, Tom Hubregtsen <thubregtsen@gmail.com>
wrote:

> Thanks Mosharaf, for the quick response! Can you maybe give me some
> pointers to an explanation of this strategy? Or elaborate a bit more on it?
> Which parts are involved in which way? Where are the time penalties and how
> scalable is this implementation?
>
> Thanks again,
>
> Tom
>
> On 11 March 2015 at 16:01, Mosharaf Chowdhury <mosharafkabir@gmail.com>
> wrote:
>
>> Hi Tom,
>>
>> That's an outdated document from 4 or 5 years ago.
>>
>> Spark currently uses a BitTorrent-like mechanism that's been tuned for
>> datacenter environments.
>>
>> Mosharaf
>> ------------------------------
>> From: Tom <thubregtsen@gmail.com>
>> Sent: 3/11/2015 4:58 PM
>> To: user@spark.apache.org
>> Subject: Which strategy is used for broadcast variables?
>>
>> In "Performance and Scalability of Broadcast in Spark" by Mosharaf
>> Chowdhury
>> I read that Spark uses HDFS for its broadcast variables. This seems highly
>> inefficient. In the same paper alternatives are proposed, among which
>> "BitTorrent Broadcast (BTB)". While studying "Learning Spark," page 105,
>> second paragraph about Broadcast Variables, I read "The value is sent to
>> each node only once, using an efficient, BitTorrent-like communication
>> mechanism."
>>
>> - Is the book talking about the proposed BTB from the paper?
>>
>> - Is this currently the default?
>>
>> - If not, what is?
>>
>> Thanks,
>>
>> Tom
>>
>>
>>
>
