spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Performance benchmarking of Spark Vs other languages
Date Tue, 03 May 2016 05:22:57 GMT
Hello,

Spark is a general framework for distributed in-memory processing. You can always write a
highly specialized piece of code that is faster than Spark, but then it can do only that one
thing, and if you need something else you will have to rewrite everything from scratch. This
is why Spark is beneficial.
In this context, your setup does not make sense: you should have at least 5 worker nodes to
make a meaningful evaluation.
Follow the recommendations in the Spark tuning guide.
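
As a concrete starting point, here is a minimal sketch of a local-mode configuration for the
machine described below (the application name, thread count, and memory figure are
illustrative assumptions, not recommendations from this thread):

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical local-mode setup for a single 30-core / 132 GB machine.
    // local[n] runs n worker threads inside one driver JVM.
    val conf = new SparkConf()
      .setAppName("ALSBenchmark")  // placeholder name
      .setMaster("local[30]")      // one thread per core; vary n to test scaling

    // In local mode the driver JVM does all the work, so its heap must be
    // sized before the JVM starts, e.g. spark-submit --driver-memory 100g;
    // setting spark.driver.memory in SparkConf here would have no effect.
    val sc = new SparkContext(conf)

Note that local[n] keeps everything in one JVM, so it never exercises the shuffle and network
paths a real cluster does; that is part of why measurements on actual worker nodes are more
meaningful.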

> On 03 May 2016, at 07:02, Abhijith Chandraprabhu <abhijithc@gmail.com> wrote:
> 
> Hello,
> 
> I am trying to find some performance figures for Spark vs. various other languages for an
> ALS-based recommender system. I am using the 20 million ratings MovieLens dataset. The test
> environment involves one big 30-core machine with 132 GB of memory. I am using the Scala
> version of the script provided here:
> http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html
> 
> I am not an expert in Spark, and I assume that varying n while invoking Spark with the
> following flag, --master local[n], is supposed to provide ideal scaling.
> 
> Initial observations didn't favour Spark by some small margins, but as I said, since I am
> not a Spark expert, I would comment only after being assured that this is the optimal way
> of running the ALS snippet.
> 
> Could the experts please help me with the optimal way to get the best timings out of
> Spark's ALS example in the mentioned environment? Thanks.
> 
> -- 
> Best regards,
> Abhijith
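
For reference, the ALS snippet on the MLlib page linked above looks roughly like the
following when adapted to the MovieLens 20M ratings file (the file path is a placeholder,
the CSV parsing assumes the dataset's userId,movieId,rating,timestamp layout with a header
row, and the hyperparameters are the guide's defaults):

    import org.apache.spark.mllib.recommendation.{ALS, Rating}
    // sc is the SparkContext configured in the earlier sketch
    // (or the one provided by spark-shell).

    // Hypothetical path to the MovieLens 20M ratings file.
    val raw = sc.textFile("ml-20m/ratings.csv")
    val header = raw.first()
    val ratings = raw.filter(_ != header).map { line =>
      val fields = line.split(',')
      Rating(fields(0).toInt, fields(1).toInt, fields(2).toDouble)
    }.cache()  // cache so the ALS iterations do not re-read the file

    // Hyperparameters as in the linked guide.
    val rank = 10
    val numIterations = 10
    val model = ALS.train(ratings, rank, numIterations, 0.01)

When benchmarking, caching the input and timing only ALS.train (after a warm-up run, since
the JVM needs time to JIT-compile) makes the numbers easier to compare across languages.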
