spark-dev mailing list archives

From xing <ehomec...@gmail.com>
Subject parallelize method vs. textFile method
Date Thu, 25 Jun 2015 00:59:12 GMT
We have a large file. We used to read it in chunks and call the
parallelize method on each chunk (distData = sc.parallelize(chunk)),
then run the map/reduce chunk by chunk. Recently we read the whole
file with the textFile method instead and found the map/reduce job is
much faster. Can anybody help us understand why? We have verified that
reading the file is NOT a bottleneck.
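
For concreteness, here is a minimal sketch of the two approaches in
PySpark (the file path, chunk size, and word-count job are placeholders
for illustration, not our actual code):

from pyspark import SparkContext

sc = SparkContext(appName="parallelize-vs-textFile")

def word_count(rdd):
    # The same job in both cases: split lines into words, count them.
    return (rdd.flatMap(lambda line: line.split())
               .map(lambda w: (w, 1))
               .reduceByKey(lambda a, b: a + b))

# Old approach: read the file on the driver in chunks and parallelize
# each chunk. Each chunk's data is serialized on the driver and shipped
# to the executors, and each chunk runs as its own small job.
with open("/path/to/large_file.txt") as f:  # placeholder path
    for chunk in iter(lambda: f.readlines(64 * 1024 * 1024), []):
        distData = sc.parallelize(chunk)
        word_count(distData).collect()

# New approach: let Spark read the file itself. textFile splits the
# file into partitions that the executors read in parallel, and the
# whole map/reduce runs as a single job.
lines = sc.textFile("/path/to/large_file.txt")
word_count(lines).collect()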





