spark-user mailing list archives

From Diana Carroll
Subject when to use broadcast variables
Date Fri, 02 May 2014 13:12:01 GMT
Anyone have any guidance on using a broadcast variable to ship data to
workers vs. an RDD?

Like, say I'm joining web logs in an RDD with user account data.  I could
keep the account data in an RDD or if it's "small", a broadcast variable
instead.  How small is small?  Small enough that I know it can easily fit
in memory on a single node?  Some other guideline?
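
To make the two options concrete, here is a rough sketch in plain Python (not Spark itself). The sample data and the function names `broadcast_join` and `shuffle_join` are made up for illustration. The first function mimics what I understand a broadcast variable to do: every partition of the logs gets the whole account table and looks users up locally. The second mimics an RDD-to-RDD join, where both sides are regrouped by key first.

```python
from collections import defaultdict

# Hypothetical sample data: (user_id, url) log records already split into
# partitions, plus a small user_id -> account name table.
log_partitions = [
    [(1, "/home"), (2, "/cart")],
    [(1, "/checkout"), (3, "/home")],
]
accounts = {1: "alice", 2: "bob"}


def broadcast_join(partitions, lookup):
    """Map-side join: each partition reads the full lookup table; logs stay put."""
    joined = []
    for part in partitions:
        for user_id, url in part:
            if user_id in lookup:  # inner-join semantics: unknown users are dropped
                joined.append((user_id, (url, lookup[user_id])))
    return joined


def shuffle_join(partitions, account_records):
    """Shuffle-style join: both sides are regrouped by key before matching."""
    by_key = defaultdict(lambda: ([], []))
    for part in partitions:
        for user_id, url in part:
            by_key[user_id][0].append(url)
    for user_id, name in account_records:
        by_key[user_id][1].append(name)
    return [
        (user_id, (url, name))
        for user_id, (urls, names) in by_key.items()
        for url in urls
        for name in names
    ]


if __name__ == "__main__":
    print(sorted(broadcast_join(log_partitions, accounts)))
    print(sorted(shuffle_join(log_partitions, list(accounts.items()))))
```

In PySpark terms, I believe the first variant corresponds roughly to `sc.broadcast(accounts)` read through `.value` inside a `map`, and the second to `RDD.join`. Both give the same rows; the difference is that the broadcast version needs the whole table to fit in memory on every worker, which is exactly the "how small is small" part I'm unsure about.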


