spark-user mailing list archives

From Gylfi <gy...@berkeley.edu>
Subject Re: job hangs when using pipe() with reduceByKey()
Date Sun, 01 Nov 2015 08:33:05 GMT
Hi. 

What exactly is slow?
In code-base 1:
When you run persist() + count(), the result is computed and stored in RAM.
The map + reduceByKey is then done on in-memory data.

In the latter case (all in one line) you are doing both steps at the same
time, so any timing of that line covers both steps.

So are you saying that if you sum up the time for both steps in the first
code-base, it is still much faster than the latter code-base?
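
To make the comparison concrete, here is a rough plain-Python analogy, not Spark code. A lazy generator stands in for the earlier, expensive step, and a small dictionary fold stands in for map + reduceByKey. All names (expensive_stage, reduce_by_key, calls) are made up for illustration, and the work counter is only a stand-in for wall-clock time.

```python
# Plain-Python analogy of the two code-bases. No Spark is involved, and
# every name here is hypothetical.

calls = {"expensive": 0}  # counts how often the expensive step does work


def expensive_stage(records):
    """Lazy stage: nothing runs until something iterates over it."""
    for r in records:
        calls["expensive"] += 1
        yield (r % 3, r)  # (key, value) pair


def reduce_by_key(pairs):
    """Sum the values per key, standing in for map + reduceByKey."""
    totals = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0) + value
    return totals


data = range(10)

# Code-base 1: materialize first (like persist() + count()), then reduce.
cached = list(expensive_stage(data))             # step 1 pays for the expensive work
step1_calls = calls["expensive"]
result_two_step = reduce_by_key(cached)          # step 2 only reads in-memory data
step2_calls = calls["expensive"] - step1_calls   # no expensive work happens here

# Code-base 2: all in one line, both steps run together.
calls["expensive"] = 0
result_one_line = reduce_by_key(expensive_stage(data))
one_line_calls = calls["expensive"]

print(step1_calls, step2_calls, one_line_calls)
print(result_two_step == result_one_line)
```

In this sketch the second step of code-base 1 does none of the expensive work, so it is only fair to compare the one-line version against step 1 and step 2 added together.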


