spark-user mailing list archives

From Gautham
Subject pyspark sc.textFile uses only 4 out of 32 threads per node
Date Tue, 09 Dec 2014 18:59:41 GMT
I am having an issue with pyspark launched on EC2 (using spark-ec2) with 5
r3.4xlarge machines, each with 32 threads and 240GB of RAM. When I use
sc.textFile to load data from a number of gz files, the load does not progress
as fast as expected. When I log in to a worker node and run top, I see only 4
threads at 100% CPU; the remaining 28 are idle. This is not an issue when
processing the strings after loading: at that stage all the cores are used.

Can anyone help me with this? What setting can I change to get CPU usage back
up to full during the load?
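
For anyone trying to reproduce this, here is a minimal diagnostic sketch. It
assumes the usual Spark behavior that gzip input is not splittable, so
sc.textFile creates one partition (and hence one read task) per .gz file. The
function name read_parallelism is hypothetical, and the figures in the usage
note are illustrative, not measured on the cluster above.

```python
def read_parallelism(rdd, nodes, threads_per_node):
    """Compare an RDD's partition count with the threads available.

    rdd is any RDD, for example the result of sc.textFile(...) in the
    pyspark shell. nodes and threads_per_node describe the cluster and
    are supplied by the caller (assumed values, not detected).
    """
    # Each partition is handled by one task, so the partition count is an
    # upper bound on how many threads can work on this stage at once.
    partitions = rdd.getNumPartitions()
    total_threads = nodes * threads_per_node
    return {
        "partitions": partitions,
        "total_threads": total_threads,
        "max_busy_threads": min(partitions, total_threads),
    }
```

In the pyspark shell this could be called as
read_parallelism(sc.textFile("/path/to/*.gz"), 5, 32), where the path is a
placeholder. If the result showed, say, 20 partitions, at most 20 of the 160
threads could be busy during the load, which would average 4 per node.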
