spark-user mailing list archives

From Saravanan Subramanian <>
Subject Re: Maximum Size of Reference Look Up Table in Spark
Date Fri, 15 Jul 2016 15:28:45 GMT
Hello Jacek,
Have you seen any practical limitations or performance degradation when using more
than 10GB of broadcast cache?
Thanks,
Saravanan S.

    On Thursday, 14 July 2016 8:06 PM, Jacek Laskowski <> wrote:


My understanding is that the maximum size of a broadcast is
Long.MAX_VALUE (plus some more, since the data is going to be
encoded to save space, esp. for Catalyst-driven datasets).

Ad 2. Before the tasks can access the broadcast variable, it has to be sent
across the network, which may be too slow to be acceptable.
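
To put a rough number on that network cost, here is a back-of-envelope
sketch. The link speeds are assumptions chosen for illustration, not
measurements from any cluster, and the helper name transfer_seconds is
hypothetical. It ignores serialization, compression, and the way Spark
actually distributes broadcast blocks, so treat the result as an
order-of-magnitude estimate for moving one full copy over one link.

```python
# Back-of-envelope estimate of the time needed to move one full copy
# of a broadcast value over a single network link.
# All figures below are illustrative assumptions.

def transfer_seconds(size_bytes: int, bandwidth_bits_per_sec: float) -> float:
    """Ideal transfer time: no protocol overhead, no compression."""
    return size_bytes * 8 / bandwidth_bits_per_sec

GIB = 1024 ** 3
broadcast_size = 10 * GIB   # the 10GB lookup table from the question
link_1gbit = 1e9            # assumed 1 Gbit/s link
link_10gbit = 10e9          # assumed 10 Gbit/s link

t_1gbit = transfer_seconds(broadcast_size, link_1gbit)
t_10gbit = transfer_seconds(broadcast_size, link_10gbit)

print(f"1 Gbit/s : {t_1gbit:.1f} s per full copy")
print(f"10 Gbit/s: {t_10gbit:.1f} s per full copy")
```

Under these assumptions, one copy takes on the order of a minute and a
half at 1 Gbit/s, and that cost is paid before the first task on an
executor can read the value.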

Jacek Laskowski
Mastering Apache Spark
Follow me at

On Thu, Jul 14, 2016 at 11:32 PM, Saravanan Subramanian
<> wrote:
> Hello All,
> I am in the middle of designing real-time data enhancement services using
> Spark Streaming.  As part of this, I have to look up some reference data
> while processing the incoming stream.
> I have the following questions:
> 1) What is the maximum size of a lookup table / variable that can be stored
> as a broadcast variable?
> 2) What is the impact on cluster performance if I store 10GB of data in a
> broadcast variable?
> Any suggestions and thoughts are welcome.
> Thanks,
> Saravanan S.
