spark-user mailing list archives

From Tom Hubregtsen <thubregt...@gmail.com>
Subject Re: Which strategy is used for broadcast variables?
Date Wed, 11 Mar 2015 22:11:19 GMT
Thanks, Mosharaf, for the quick response! Could you point me to an
explanation of this strategy, or elaborate a bit on it? Which components are
involved, and in what way? Where are the time penalties, and how scalable is
this implementation?

Thanks again,

Tom

On 11 March 2015 at 16:01, Mosharaf Chowdhury <mosharafkabir@gmail.com>
wrote:

> Hi Tom,
>
> That's an outdated document from 4-5 years ago.
>
> Spark currently uses a BitTorrent-like mechanism that's been tuned for
> datacenter environments.
>
> Mosharaf
> ------------------------------
> From: Tom <thubregtsen@gmail.com>
> Sent: 3/11/2015 4:58 PM
> To: user@spark.apache.org
> Subject: Which strategy is used for broadcast variables?
>
> In "Performance and Scalability of Broadcast in Spark" by Mosharaf Chowdhury,
> I read that Spark uses HDFS for its broadcast variables. This seems highly
> inefficient. The same paper proposes alternatives, among them
> "BitTorrent Broadcast (BTB)". While studying "Learning Spark" (page 105,
> second paragraph on broadcast variables), I read: "The value is sent to
> each node only once, using an efficient, BitTorrent-like communication
> mechanism."
>
> - Is the book talking about the proposed BTB from the paper?
>
> - Is this currently the default?
>
> - If not, what is?
>
> Thanks,
>
> Tom
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Which-strategy-is-used-for-broadcast-variables-tp22004.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
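
To illustrate the "BitTorrent-like" idea discussed in this thread, here is a
toy simulation in plain Python. It is only a sketch under stated assumptions
and is not Spark's implementation: the function names, the block size, the
random peer selection, and the sequential arrival of executors are all
hypothetical simplifications. The point it tries to show is that once a value
is split into blocks and every node that has fetched a block can serve it to
others, the driver no longer has to send every block to every executor.

```python
import random


def split_into_blocks(payload, block_size):
    """Cut the broadcast value into fixed-size blocks (last one may be shorter)."""
    return [payload[i:i + block_size] for i in range(0, len(payload), block_size)]


def simulate_broadcast(payload, num_executors, block_size, seed=0):
    """Toy model: executors fetch each block from a random node that already has it.

    Returns (received, served):
      received maps executor name -> reassembled payload
      served   maps node name     -> number of blocks that node sent to others
    """
    rng = random.Random(seed)
    blocks = split_into_blocks(payload, block_size)

    # Initially only the driver holds every block.
    holders = {i: ["driver"] for i in range(len(blocks))}
    served = {"driver": 0}
    received = {}

    # Assumption: executors arrive one after another (real systems overlap fetches).
    for e in range(num_executors):
        name = "executor-%d" % e
        served[name] = 0

        # Fetch blocks in random order so load spreads across different holders.
        order = list(range(len(blocks)))
        rng.shuffle(order)

        got = {}
        for i in order:
            source = rng.choice(holders[i])  # any node that already has block i
            served[source] += 1
            got[i] = blocks[i]
            holders[i].append(name)          # this executor can now serve block i

        # Reassemble the value in the original block order.
        received[name] = b"".join(got[i] for i in range(len(blocks)))

    return received, served


if __name__ == "__main__":
    data = bytes(range(256)) * 64  # 16 KiB of sample data
    received, served = simulate_broadcast(data, num_executors=8, block_size=4096)
    for node, count in served.items():
        print(node, "served", count, "blocks")
```

In this model the first executor has to fetch everything from the driver, while
later executors can draw on a growing set of peers. A naive scheme in which the
driver sends the whole value to every executor would instead make the driver
the source of every block transfer.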
