spark-user mailing list archives

From Rex X <>
Subject What is the best way to JOIN two 10TB csv files and three 100kb files on Spark?
Date Fri, 05 Feb 2016 17:25:20 GMT
Dear all,

Spark's new DataFrame API is extremely fast, but our cluster has limited
RAM (~500GB).

What is the best way to do such a large table join?

Any sample code would be greatly appreciated!

