spark-user mailing list archives

From Jon Gregg <>
Subject Re: Starting a new Spark codebase, Python or Scala / Java?
Date Mon, 21 Nov 2016 18:58:10 GMT
Spark is written in Scala, so yes, Scala is still the strongest option.
With Scala you also get the Dataset type (compile-time type safety), which
is not available in Python.

That said, I think the Python API is a viable choice if you already use
Pandas for data science.  The Spark DataFrame and Pandas APIs are similar,
and you can convert a Spark DataFrame to a Pandas DataFrame with toPandas().
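
For what it's worth, here is a rough sketch of that conversion. It assumes
pyspark and pandas are installed and a local Spark can start; the function
name, column names, and sample rows are made up for illustration.

```python
# Sketch: move a small Spark DataFrame into Pandas and back again.
# Assumes pyspark and pandas are installed; all names and data are invented.

def spark_pandas_round_trip():
    # Imported here so the file can be loaded even where Spark is missing.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("pandas-interop-sketch")
        .getOrCreate()
    )
    try:
        sdf = spark.createDataFrame(
            [("a", 1), ("b", 2), ("a", 3)],
            ["key", "value"],
        )
        # toPandas() collects every row to the driver, so filter first
        # and keep the result small.
        pdf = sdf.filter(sdf.value > 1).toPandas()

        # From here on this is ordinary Pandas code.
        totals = pdf.groupby("key")["value"].sum().to_dict()

        # createDataFrame also accepts a Pandas DataFrame for the way back.
        back = spark.createDataFrame(pdf)
        return totals, back.count()
    finally:
        spark.stop()
```

Keep in mind that toPandas() pulls the whole result onto the driver, so it
only makes sense after the data has been filtered or aggregated down to
something that fits in memory on one machine.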

On Mon, Nov 21, 2016 at 1:51 PM, Brandon White <> wrote:

> Hello all,
> I will be starting a new Spark codebase and I would like to get opinions
> on using Python over Scala. Historically, the Scala API has always been the
> strongest interface to Spark. Is this still true? Are there still many
> benefits and additional features in the Scala API that are not available in
> the Python API? Are there any performance concerns using the Python API
> that do not exist when using the Scala API? Anything else I should know
> about?
> I appreciate any insight you have on using the Scala API over the Python
> API.
> Brandon
