spark-user mailing list archives

From Patrick <titlibat...@gmail.com>
Subject Re: Broadcast Join and Inner Join giving different result on same DataFrame
Date Tue, 03 Jan 2017 14:04:41 GMT
Hi,

An update on the question above: in local[*] mode the code works fine. The
broadcast size is 200MB, but on YARN the broadcast join gives an empty
result. The SQL query plan in the UI does show BroadcastHint, though.
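
To narrow this down on the cluster, the two joins can be compared side by
side. The following is only a minimal sketch: compareJoins is a hypothetical
helper name, and it assumes the two DataFrames from the original post and the
name of the join column are passed in. It prints both physical plans and
checks whether the row counts agree; it does not explain why they might differ.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.broadcast

// Hypothetical helper (not part of Spark): runs the same equi-join with and
// without the broadcast hint and reports whether the row counts agree.
def compareJoins(transactionDF: DataFrame, productDF: DataFrame, key: String): Boolean = {
  val plainJoin  = transactionDF.join(productDF, key)
  val hintedJoin = transactionDF.join(broadcast(productDF), key)

  // Print the physical plans to see which join strategy was chosen for each.
  plainJoin.explain()
  hintedJoin.explain()

  val plainCount  = plainJoin.count()
  val hintedCount = hintedJoin.count()
  println(s"plain join rows: $plainCount, broadcast join rows: $hintedCount")

  plainCount == hintedCount
}
```

Running this once with master local[*] and once on YARN, with the same input
paths, should show whether only the hinted join changes between the two
environments.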

Thanks


On Fri, Dec 30, 2016 at 9:15 PM, titli batali <titlibatali@gmail.com> wrote:

> Hi,
>
> I have two DataFrames that share a common column, Product_Id, on which I
> have to perform a join.
>
>     val transactionDF = readCSVToDataFrame(sqlCtx: SQLContext,
> pathToReadTransactions: String, transactionSchema: StructType)
>     val productDF = readCSVToDataFrame(sqlCtx: SQLContext,
> pathToReadProduct:String, productSchema: StructType)
>
> As the transaction data is very large but the product data is small, I
> would ideally do a broadcast join in which I broadcast productDF.
>
>      val productBroadcastDF = broadcast(productDF)
>      val broadcastJoin = transactionDF.join(productBroadcastDF,
> "productId")
>
> Or simply, val innerJoin = transactionDF.join(productDF, "productId")
> should give the same result as above.
>
> But if I use the simple inner join, I get a DataFrame with the joined
> values, whereas if I use the broadcast join, I get an empty DataFrame. I
> am not able to explain this behavior. Ideally both should give the same
> result.
>
> What could have gone wrong? Has anyone faced a similar issue?
>
>
> Thanks,
> Prateek