spark-user mailing list archives

From Gylfi
Subject Re: job hangs when using pipe() with reduceByKey()
Date Sun, 01 Nov 2015 08:33:05 GMT

What exactly is slow?
In code-base 1:
When you run persist() + count(), the result is stored in RAM.
The map + reduceByKey is then done on in-memory data.

In the latter case (all in one line) you are doing both steps at the same
time.

So are you saying that if you sum up the time for both steps in the first
code-base, it is still much faster than the latter code-base?
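
To make the comparison concrete, here is a rough plain-Python analogy (not Spark code, and not the original poster's job). The names expensive_step and reduce_by_key are hypothetical stand-ins for the pipe() stage and for reduceByKey(). It only shows where the expensive work happens in each variant, which is why the two step timings should be summed before comparing them with the one-line version.

```python
import time
from collections import defaultdict

# Counter showing when the expensive stage actually runs.
calls = {"expensive": 0}

def expensive_step(record):
    """Hypothetical stand-in for the pipe() stage."""
    calls["expensive"] += 1
    return record.upper()

def reduce_by_key(pairs):
    """Hypothetical stand-in for reduceByKey(_ + _): sums values per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

records = ["a", "b", "a", "c", "b", "a"]

# Variant 1: materialize first (analogous to persist() + count()), then reduce.
t0 = time.perf_counter()
cached = [expensive_step(r) for r in records]   # expensive work happens here
count = len(cached)
t1 = time.perf_counter()
calls_after_step1 = calls["expensive"]
result_two_step = reduce_by_key((r, 1) for r in cached)  # in-memory data only
t2 = time.perf_counter()
calls_after_step2 = calls["expensive"]

# Variant 2: one lazy pipeline, so both steps run together.
calls["expensive"] = 0
t3 = time.perf_counter()
result_one_line = reduce_by_key((expensive_step(r), 1) for r in records)
t4 = time.perf_counter()
calls_one_line = calls["expensive"]

print("variant 1: step 1 = %.6fs, step 2 = %.6fs, total = %.6fs"
      % (t1 - t0, t2 - t1, t2 - t0))
print("variant 2: total = %.6fs" % (t4 - t3))
```

In this sketch the second step of variant 1 looks cheap only because the expensive work was already paid for in the first step. The fair comparison is the total of variant 1 against the total of variant 2.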
