spark-dev mailing list archives

From Reynold Xin <r...@databricks.com>
Subject Re: parallelize method v.s. textFile method
Date Thu, 25 Jun 2015 01:04:07 GMT
If you read the file chunk by chunk and then use parallelize, the file is read
by a single thread on a single machine (the driver), and each chunk has to be
shipped from the driver to the executors before any work starts. textFile, by
contrast, lets the executors read the file's partitions in parallel.
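
A minimal sketch of the two patterns being compared, assuming PySpark, a local
SparkContext, and a hypothetical input path, chunk size, and map/sum job
standing in for the real workload:

from pyspark import SparkContext

sc = SparkContext(appName="parallelize-vs-textFile")
path = "/data/large_input.txt"  # hypothetical path

# Pattern 1: read on the driver, then parallelize.
# The loop below runs in a single thread on the driver; only the
# map/sum on each chunk is distributed, and every chunk must be
# serialized and shipped from the driver to the executors first.
with open(path) as f:
    for chunk in iter(lambda: f.readlines(64 * 1024 * 1024), []):
        sc.parallelize(chunk).map(lambda line: len(line)).sum()

# Pattern 2: let Spark read the file.
# textFile splits the input into partitions, and each executor reads
# its own partitions in parallel, so there is no single-threaded
# driver read and no driver-to-executor copy of the raw data.
sc.textFile(path).map(lambda line: len(line)).sum()

sc.stop()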

On Wednesday, June 24, 2015, xing <ehomecity@gmail.com> wrote:

> We have a large file. We used to read it in chunks and then call the
> parallelize method (distData = sc.parallelize(chunk)), doing the map/reduce
> chunk by chunk. Recently we read the whole file with the textFile method and
> found the map/reduce job is much faster. Can anybody help us understand why?
> We have verified that reading the file is NOT the bottleneck.
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/parallelize-method-v-s-textFile-method-tp12871.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
