spark-user mailing list archives

From "Leonard Cohen" <3498363...@qq.com>
Subject Re: Scala Vs Python
Date Tue, 06 Sep 2016 10:38:38 GMT
hi spark user,


IMHO, I would write an application in the same language the underlying system was
designed in.


If working on Spark, I choose Scala.
If working on Hadoop, I choose Java.
If working on nothing, I use Python.
Why?
Because it will save my life, just kidding.




Best regards,
Leonard
------------------ Original ------------------
From:  "Luciano Resende";<luckbr1975@gmail.com>;
Send time: Tuesday, Sep 6, 2016 8:07 AM
To: "darren"<darren@ontrenet.com>; 
Cc: "Mich Talebzadeh"<mich.talebzadeh@gmail.com>; "Jakob Odersky"<jakob@odersky.com>;
"ayan guha"<guha.ayan@gmail.com>; "kant kodali"<kanth909@gmail.com>; "AssafMendelson"<assaf.mendelson@rsa.com>;
"user"<user@spark.apache.org>; 
Subject:  Re: Scala Vs Python





On Thu, Sep 1, 2016 at 3:15 PM, darren <darren@ontrenet.com> wrote:
This topic is a concern for us as well. In the data science world no one uses native Scala
or Java by choice. It's R and Python. And Python is growing. Yet in Spark, Python is third in
line for feature support, if it is supported at all.


This is why we have decoupled from Spark in our project. It's really unfortunate that the Spark
team has invested so heavily in Scala.


As for speed, it comes from horizontal scaling and throughput. When you can scale outward,
individual VM performance is less of an issue. Basic HPC principles.





You could still try to get the best of both worlds: have your data scientists write their
algorithms in Python and/or R, and let a compiler/optimizer handle the optimizations needed
to run them in a distributed fashion on a Spark cluster, leveraging some of the low-level APIs
written in Java/Scala. Take a look at Apache SystemML http://systemml.apache.org/ for more details.




-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/