spark-user mailing list archives

From Prashant Sharma <scrapco...@gmail.com>
Subject Re: when to use broadcast variables
Date Fri, 02 May 2014 14:50:26 GMT
I'd like to be corrected on this, but what I am trying to say is that "small
enough" means on the order of a few hundred MB. Keep in mind that the whole
value gets shipped to every node, so it can be around a GB but not several
GBs, and even then it depends on the network too.
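
As a rough illustration of that reasoning, here is a back-of-envelope sketch
in plain Python (no Spark needed). It assumes every node receives one full
copy of the broadcast value. The function names, the 20-node cluster, and the
1 GB cutoff are made up for the example and are not Spark settings.

```python
# Back-of-envelope estimate only. Node count and sizes are hypothetical.

def broadcast_cost_mb(data_mb, num_nodes):
    """Total data moved if every node receives one full copy."""
    return data_mb * num_nodes


def is_broadcast_candidate(data_mb, limit_mb=1024):
    """Rule of thumb from this thread: around 1 GB at most, not several GB.

    limit_mb is an assumed cutoff for illustration, not a Spark parameter.
    """
    return data_mb <= limit_mb


for size_mb in (200, 1024, 4096):
    total_mb = broadcast_cost_mb(size_mb, num_nodes=20)
    verdict = "broadcast" if is_broadcast_candidate(size_mb) else "keep as RDD"
    print(size_mb, "MB x 20 nodes =", total_mb, "MB shipped ->", verdict)
```

The cutoff is only a starting point: a slow network or tight executor memory
would argue for a lower one.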

Prashant Sharma


On Fri, May 2, 2014 at 6:42 PM, Diana Carroll <dcarroll@cloudera.com> wrote:

> Anyone have any guidance on using a broadcast variable to ship data to
> workers vs. an RDD?
>
> Like, say I'm joining web logs in an RDD with user account data.  I could
> keep the account data in an RDD or if it's "small", a broadcast variable
> instead.  How small is small?  Small enough that I know it can easily fit
> in memory on a single node?  Some other guideline?
>
> Thanks!
>
> Diana
>
