spark-user mailing list archives

From Nicholas Chammas <nicholas.cham...@gmail.com>
Subject Re: Scala Vs Python
Date Fri, 02 Sep 2016 14:35:25 GMT
On Fri, Sep 2, 2016 at 3:58 AM Mich Talebzadeh <mich.talebzadeh@gmail.com>
wrote:

> I believe as we progress in time Spark is going to move away from Python. If
> you look at 2014 Databricks code examples, they were mostly in Python. Now
> they are mostly in Scala for a reason.
>

That's complete nonsense.

First off, you can find dozens and dozens of Python code examples here:
https://github.com/apache/spark/tree/master/examples/src/main/python

The Python API was added to Spark in 0.7.0
<http://spark.apache.org/news/spark-0-7-0-released.html>, back in February
of 2013, before Spark was even accepted into the Apache incubator. Since
then it's undergone major and continuous development. Though the API does lag
behind the Scala API in some areas, Python is a first-class language in Spark, and
bringing it up to parity with Scala is an explicit project goal. A quick
example off the top of my head is all the work that's going into model
import/export for Python: SPARK-11939
<https://issues.apache.org/jira/browse/SPARK-11939>

Additionally, according to the 2015 Spark Survey
<http://cdn2.hubspot.net/hubfs/438089/DataBricks_Surveys_-_Content/Spark-Survey-2015-Infographic.pdf?t=1472746902480>,
58% of Spark users use the Python API, more than any other language save
for Scala (71%). (Users can select multiple languages on the survey.)
Python users were also the third-fastest-growing "demographic" for Spark,
after Windows and Spark Streaming users.

Any notion that Spark is going to "move away from Python" is completely
contradicted by the facts.

Nick
