spark-user mailing list archives

From rok <rokros...@gmail.com>
Subject minimizing disk I/O
Date Thu, 13 Nov 2014 13:56:00 GMT
I'm trying to understand the disk I/O patterns for Spark -- specifically, I'd
like to reduce the number of files written during shuffle operations. A couple
of questions:

* is the amount of file I/O performed independent of the memory I allocate
for shuffles?

* if this is the case, what is the purpose of this memory and is there any
way to see how much of it is actually being used?
 
* how can I minimize the number of files being written? With 24 cores per
node, the filesystem can't handle that much simultaneous I/O well, which
limits the number of cores I can use...
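(For context on the last question: the file count depends on which shuffle manager is in use. In Spark 1.x the hash-based shuffle writes one file per map task per reduce partition, unless spark.shuffle.consolidateFiles is enabled, while the sort-based shuffle, the default as of Spark 1.1, writes one sorted data file plus one index file per map task. A back-of-the-envelope sketch of the difference -- these are the standard counts, not numbers measured on any particular cluster:)

```python
# Rough shuffle file-count estimates for Spark 1.x. The formulas below are
# the commonly cited counts, not measurements; actual behavior depends on
# the Spark version and configuration.

def hash_shuffle_files(map_tasks: int, reduce_partitions: int) -> int:
    """Hash-based shuffle: one file per (map task, reduce partition) pair."""
    return map_tasks * reduce_partitions

def sort_shuffle_files(map_tasks: int) -> int:
    """Sort-based shuffle: one data file plus one index file per map task."""
    return map_tasks * 2

# Example: a job with 1000 map tasks and 1000 reduce partitions.
print(hash_shuffle_files(1000, 1000))  # 1000000 files
print(sort_shuffle_files(1000))        # 2000 files
```

With consolidation enabled, the hash shuffle instead writes roughly one file per core per reduce partition, but switching to the sort-based shuffle reduces the count far more aggressively.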

Thanks for any insight you might have! 



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/minimizing-disk-I-O-tp18845.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


