spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Soumya Simanta <>
Subject Fwd: Is there a way to load a large file from HDFS faster into Spark
Date Thu, 08 May 2014 18:20:33 GMT
I've a Spark cluster with 3 worker nodes.

   - *Workers:* 3
   - *Cores:* 48 Total, 48 Used
   - *Memory:* 469.8 GB Total, 72.0 GB Used

I want a process a single file compressed (*.gz) on HDFS. The file is 1.5GB
compressed and 11GB uncompressed.
When I try to read the compressed file from HDFS it takes a while (4-5
minutes) load it into an RDD. If I use the .cache operation it takes even
longer. Is there a way to make loading of the RDD from HDFS faster ?


View raw message