spark-user mailing list archives

From Gourav Sengupta <gourav.sengu...@gmail.com>
Subject Re: Any way to see the size of the broadcast variable?
Date Tue, 09 Oct 2018 16:12:49 GMT
Hi Venkat,

do your executors have that much memory?

Regards,
Gourav Sengupta
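
One way to check the size that matters here is to compare the optimizer's estimate (which is what spark.sql.autoBroadcastJoinThreshold is tested against) with the in-memory size of the collected rows (roughly what the driver actually broadcasts, typically much larger than the compressed Parquet footprint). A sketch, assuming Spark 2.3+ and an existing SparkSession; the table name is a placeholder:

```scala
import org.apache.spark.util.SizeEstimator

// Hypothetical table; substitute the table being broadcast.
val smallDF = spark.table("my_small_table")

// Size estimate the optimizer compares against
// spark.sql.autoBroadcastJoinThreshold:
println(smallDF.queryExecution.optimizedPlan.stats.sizeInBytes)

// Approximate in-memory size of the deserialized rows on the driver.
// This is usually several times the on-disk Parquet size, which is
// why a 32 MB Parquet table can still blow past the threshold.
println(SizeEstimator.estimate(smallDF.collect()))
```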

On Tue, Oct 9, 2018 at 4:44 PM V0lleyBallJunki3 <venkatdabri@gmail.com>
wrote:

> Hello,
>    I have set spark.sql.autoBroadcastJoinThreshold to a very high value
> of 20 GB. I am joining a table that I am sure is below this threshold,
> but Spark still does a SortMergeJoin. If I set a broadcast hint, Spark
> does a broadcast join and the job finishes much faster. However, when
> run in production against some large tables, I run into errors. Is there
> a way to see the actual size of the table being broadcast? I wrote the
> table being broadcast to disk and it took only 32 MB in Parquet. I tried
> to cache this table in Zeppelin and run a table.count() operation, but
> nothing shows up on the Storage tab of the Spark History Server.
> spark.util.SizeEstimator doesn't seem to give accurate numbers for this
> table either. Is there any way to figure out the size of this table
> being broadcast?
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
>
