spark-dev mailing list archives

From Davies Liu
Subject Re: A Comparison of Platforms for Implementing and Running Very Large Scale Machine Learning Algorithms
Date Wed, 13 Aug 2014 21:31:49 GMT
On Wed, Aug 13, 2014 at 2:16 PM, Ignacio Zendejas wrote:
> Yep, I thought it was a bogus comparison.
> I should rephrase my question as it was poorly phrased: on average, how
> much faster is Spark v. PySpark (I didn't really mean Scala v. Python)?
> I've only used Spark and don't have a chance to test this at the moment so
> if anybody has these numbers or general estimates (10x, etc), that'd be
> great.

A quick comparison using word count on a 4.3G text file (local mode):

Spark:  40 seconds
PySpark: 2 minutes and 16 seconds

So PySpark is about 3.4x slower than Spark on this test (136 seconds vs. 40 seconds).
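
For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the ratio from the two timings quoted above. The variable and function names are only illustrative, and it covers this one local-mode run, not a general benchmark.

```python
# Timings quoted in this message (word count, 4.3G text file, local mode).
spark_seconds = 40
pyspark_seconds = 2 * 60 + 16  # 2 minutes and 16 seconds


def slowdown(baseline_seconds, other_seconds):
    """Return how many times longer `other` took than `baseline`."""
    return other_seconds / baseline_seconds


ratio = slowdown(spark_seconds, pyspark_seconds)
print(f"PySpark / Spark wall-clock ratio: {ratio:.1f}x")
```

The ratio is a plain wall-clock quotient, so it says nothing about where the extra time goes or how it would change on a cluster or with a different workload.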
