spark-user mailing list archives

From: Soumya Simanta <soumya.sima...@gmail.com>
Subject: Fwd: Is there a way to load a large file from HDFS faster into Spark
Date: Thu, 08 May 2014 18:20:33 GMT
I have a Spark cluster with 3 worker nodes.


   - Workers: 3
   - Cores: 48 Total, 48 Used
   - Memory: 469.8 GB Total, 72.0 GB Used

I want to process a single compressed file (*.gz) on HDFS. The file is 1.5 GB
compressed and 11 GB uncompressed.
When I try to read the compressed file from HDFS, it takes a while (4-5
minutes) to load it into an RDD. If I use the .cache operation it takes even
longer. Is there a way to make loading the RDD from HDFS faster?
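
The load is roughly the following (a minimal sketch, assuming sc.textFile
and .cache since the exact code is not in this post; the HDFS path and app
name are placeholders):

import org.apache.spark.{SparkConf, SparkContext}

object LoadGzip {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LoadGzip"))

    // A .gz file is not splittable, so this read runs as a single task
    // on one core, regardless of how many workers are available.
    val lines = sc.textFile("hdfs:///path/to/file.gz")

    // cache() only marks the RDD for in-memory storage; the first action
    // pays the decompression cost plus the cost of storing the ~11 GB of
    // deserialized lines in executor memory.
    lines.cache()
    println(lines.count())

    sc.stop()
  }
}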

Thanks
-Soumya
