spark-user mailing list archives

From Gourav Sengupta <>
Subject Re: Any way to see the size of the broadcast variable?
Date Tue, 09 Oct 2018 16:12:49 GMT
Hi Venkat,

Do your executors have that much memory?

Gourav Sengupta

On Tue, Oct 9, 2018 at 4:44 PM V0lleyBallJunki3 <> wrote:

> Hello,
>    I have set spark.sql.autoBroadcastJoinThreshold to a very high value
> of 20 GB. I am joining a table that I am sure is below this threshold,
> yet Spark does a SortMergeJoin. If I set a broadcast hint, Spark does a
> broadcast join and the job finishes much faster. However, when run in
> production against some large tables, I run into errors. Is there a way
> to see the actual size of the table being broadcast? I wrote the table
> being broadcast to disk and it took only 32 MB in Parquet. I tried to
> cache this table in Zeppelin and run a table.count() operation, but
> nothing shows up on the Storage tab of the Spark History Server.
> org.apache.spark.util.SizeEstimator doesn't seem to give accurate
> numbers for this table either. Is there any way to figure out the size
> of this table being broadcast?
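
[Editor's note: one way to see what the planner thinks the table weighs is to read the size statistic off the optimized plan, since that is the number Spark compares against spark.sql.autoBroadcastJoinThreshold. The sketch below assumes Spark 2.3+, a running SparkSession named `spark`, and hypothetical table/DataFrame names (`small_table`, `bigDf`, `key`).]

```scala
import org.apache.spark.sql.functions.broadcast

// Hypothetical small table from the question.
val df = spark.table("small_table")

// The planner's own size estimate (in bytes) for this relation; Spark
// compares this value, not the on-disk Parquet size, against
// spark.sql.autoBroadcastJoinThreshold when deciding on a broadcast join.
val planSize = df.queryExecution.optimizedPlan.stats.sizeInBytes
println(s"Planner size estimate: $planSize bytes")

// Forcing a broadcast regardless of the estimate, as described above:
val joined = bigDf.join(broadcast(df), "key")
```

If the estimate is wildly larger than the 32 MB Parquet file, running `ANALYZE TABLE small_table COMPUTE STATISTICS` first can tighten it, since without statistics Spark may fall back to a conservative default.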
