spark-user mailing list archives

From Takeshi Yamamuro <linguin....@gmail.com>
Subject Re: What is the best way to JOIN two 10TB csv files and three 100kb files on Spark?
Date Fri, 05 Feb 2016 17:45:01 GMT
Hi,

How about using broadcast joins? Your three ~100KB tables are tiny, so
Spark can ship each of them to every executor instead of shuffling the
10TB sides for those joins:

import org.apache.spark.sql.functions.broadcast
largeDf.join(broadcast(smallDf), "joinKey")
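
A minimal end-to-end sketch, assuming a Spark 1.6 shell with the
Databricks spark-csv package on the classpath; the paths, column names,
and the "joinKey" column are placeholders for your actual schema:

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.broadcast

// sc is the SparkContext (predefined in spark-shell)
val sqlContext = new SQLContext(sc)

def readCsv(path: String) =
  sqlContext.read
    .format("com.databricks.spark.csv")  // spark-csv package
    .option("header", "true")
    .load(path)

// Two 10TB tables: far too large to broadcast, so Spark shuffles
// both sides on the join key (a sort-merge join).
val big1 = readCsv("/data/big1.csv")    // hypothetical paths
val big2 = readCsv("/data/big2.csv")

// Three ~100KB tables: small enough to ship whole to every executor,
// so the big data stays in place with no extra shuffle.
val small1 = readCsv("/data/small1.csv")
val small2 = readCsv("/data/small2.csv")
val small3 = readCsv("/data/small3.csv")

val joined = big1.join(big2, "joinKey")
  .join(broadcast(small1), "joinKey")
  .join(broadcast(small2), "joinKey")
  .join(broadcast(small3), "joinKey")

The big-big join can't avoid a shuffle, but the sort-merge join spills
to disk, so it shouldn't need the full 10TB in RAM; only the broadcast
tables have to fit in memory on each executor.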

On Sat, Feb 6, 2016 at 2:25 AM, Rex X <dnsring@gmail.com> wrote:

> Dear all,
>
> The new DataFrame API in Spark is extremely fast, but our cluster has
> limited RAM (~500GB).
>
> What is the best way to do such a big table join?
>
> Any sample code is greatly welcome!
>
>
> Best,
> Rex
>
>


-- 
---
Takeshi Yamamuro
