spark-user mailing list archives

From 刘虓 <ipf...@gmail.com>
Subject Re: Scala Vs Python
Date Tue, 06 Sep 2016 10:48:25 GMT
Hi,
I have been using Spark SQL with Python for more than a year, from version
1.5.0 to 2.0.0. It has worked well so far, and the performance has
consistently been good, though I have not run any benchmarks yet.
I have also skimmed through the source code of the Python API: most of it
simply calls the Scala API, and nothing heavy is done in Python itself.


2016-09-06 18:38 GMT+08:00 Leonard Cohen <3498363853@qq.com>:

> Hi Spark users,
>
> IMHO, I choose the application language that matches the language the
> system was designed in.
>
> If working on Spark, I choose Scala.
> If working on Hadoop, I choose Java.
> If working on nothing, I use Python.
> Why?
> Because it will save my life, just kidding.
>
>
> Best regards,
> Leonard
> ------------------ Original ------------------
> *From: * "Luciano Resende";<luckbr1975@gmail.com>;
> *Send time:* Tuesday, Sep 6, 2016 8:07 AM
> *To:* "darren"<darren@ontrenet.com>;
> *Cc:* "Mich Talebzadeh"<mich.talebzadeh@gmail.com>; "Jakob Odersky"<
> jakob@odersky.com>; "ayan guha"<guha.ayan@gmail.com>; "kant kodali"<
> kanth909@gmail.com>; "AssafMendelson"<assaf.mendelson@rsa.com>; "user"<
> user@spark.apache.org>;
> *Subject: * Re: Scala Vs Python
>
>
>
> On Thu, Sep 1, 2016 at 3:15 PM, darren <darren@ontrenet.com> wrote:
>
>> This topic is a concern for us as well. In the data science world, no one
>> uses native Scala or Java by choice. It's R and Python, and Python is
>> growing. Yet in Spark, Python is third in line for feature support, if at all.
>>
>> This is why we have decoupled from Spark in our project. It's really
>> unfortunate that the Spark team has invested so heavily in Scala.
>>
>> As for speed, it comes from horizontal scaling and throughput. When you
>> can scale outward, individual VM performance is less of an issue. Basic HPC
>> principles.
>>
>>
> You could still try to get the best of both worlds: have your data
> scientists write their algorithms in Python and/or R, and have a
> compiler/optimizer handle the optimizations needed to run them in a
> distributed fashion on a Spark cluster, leveraging some of the low-level
> APIs written in Java/Scala. Take a look at Apache SystemML
> http://systemml.apache.org/ for more details.
>
>
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>
