spark-user mailing list archives

From Jitesh chandra Mishra <jitesh...@gmail.com>
Subject Re: Broadcasting a parquet file using spark and python
Date Wed, 01 Apr 2015 04:36:55 GMT
Hi Michael,

Thanks for your response. I am running 1.2.1.

Is there any workaround to achieve the same with 1.2.1?

Thanks,
Jitesh

On Wed, Apr 1, 2015 at 12:25 AM, Michael Armbrust <michael@databricks.com>
wrote:

> In Spark 1.3 I would expect this to happen automatically when the parquet
> table is small (< 10 MB, configurable with spark.sql.autoBroadcastJoinThreshold).
> If you are running 1.3 and not seeing this, can you show the code you are
> using to create the table?
>
> On Tue, Mar 31, 2015 at 3:25 AM, jitesh129 <jitesh129@gmail.com> wrote:
>
>> How can we implement a BroadcastHashJoin in Spark with Python?
>>
>> My Spark SQL inner joins are taking a long time because they are being
>> executed as ShuffledHashJoin.
>>
>> The tables being joined are stored as Parquet files.
>>
>> Please help.
>>
>> Thanks and regards,
>> Jitesh
>>
>
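
For readers following the thread, the idea behind a broadcast hash join can be sketched in plain Python, with no Spark dependency: build a hash table from the small table, then probe it once per row of the large table, so the large side never has to be shuffled. The function name, the sample tables, and the column names below are hypothetical and only for illustration. On Spark 1.2.1, one possible workaround (an assumption, not something confirmed in this thread) would be to collect the small table to the driver, wrap the resulting lookup table with sc.broadcast(...), and run the probe step inside a flatMap over the large RDD.

```python
from collections import defaultdict


def broadcast_hash_join(large_rows, small_rows, key):
    """Inner-join two lists of dicts on `key`.

    The small side is turned into an in-memory hash table (the part that
    would be broadcast to every worker); the large side is only scanned.
    """
    # Build phase: index the small table by join key.
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)

    # Probe phase: look up each large-side row; rows without a match are dropped.
    joined = []
    for row in large_rows:
        for match in lookup.get(row[key], []):
            merged = dict(match)   # copy so the small-side row is not mutated
            merged.update(row)     # large-side columns win on name clashes
            joined.append(merged)
    return joined


# Hypothetical sample data: a large fact table and a small dimension table.
orders = [
    {"order_id": 1, "country_id": "IN"},
    {"order_id": 2, "country_id": "US"},
    {"order_id": 3, "country_id": "FR"},  # no matching country, dropped by the inner join
]
countries = [
    {"country_id": "IN", "name": "India"},
    {"country_id": "US", "name": "United States"},
]

result = broadcast_hash_join(orders, countries, "country_id")
```

This only pays off when the small table fits comfortably in memory on every worker, which is the same condition the spark.sql.autoBroadcastJoinThreshold setting mentioned above is meant to capture.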
