spark-user mailing list archives

From Michael Armbrust <mich...@databricks.com>
Subject Re: Broadcasting a parquet file using spark and python
Date Tue, 31 Mar 2015 18:55:02 GMT
In Spark 1.3 I would expect this to happen automatically when the Parquet
table is small (< 10 MB, configurable with
spark.sql.autoBroadcastJoinThreshold). If you are running 1.3 and not seeing
this, can you show the code you are using to create the table?
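
For reference, something along these lines should trigger the broadcast from
PySpark (a minimal sketch against what I believe are the 1.3 Python APIs; the
paths, the join column "id", and the 50 MB threshold are only illustrative):

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="broadcast-join-example")
sqlContext = SQLContext(sc)

# Raise the broadcast threshold (value is in bytes) if the smaller table is
# larger than the 10 MB default.
sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", str(50 * 1024 * 1024))

# Load both tables from Parquet.
large = sqlContext.parquetFile("hdfs:///data/large_table.parquet")
small = sqlContext.parquetFile("hdfs:///data/small_table.parquet")

# Spark SQL should plan a BroadcastHashJoin when "small" fits under the threshold.
joined = large.join(small, large.id == small.id)

# Inspect the physical plan to confirm a BroadcastHashJoin was chosen
# rather than a ShuffledHashJoin.
joined.explain()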

On Tue, Mar 31, 2015 at 3:25 AM, jitesh129 <jitesh129@gmail.com> wrote:

> How can we implement a BroadcastHashJoin in Spark with Python?
>
> My Spark SQL inner joins are taking a long time because they are being
> executed as a ShuffledHashJoin.
>
> The tables being joined are stored as Parquet files.
>
> Please help.
>
> Thanks and regards,
> Jitesh
