spark-user mailing list archives

From Holden Karau <hol...@pigscanfly.ca>
Subject Re: Scala vs Python for ETL with Spark
Date Sat, 17 Oct 2020 15:45:57 GMT
Scala and Python each have their advantages and disadvantages with Spark. In
my experience, when performance is super important you’ll end up needing to
do some of your work in the JVM, but in many situations what matters most is
what your team and company are familiar with and the ecosystem of tooling
for your domain.

Since that can change so much between people and projects, I think arguing
about the one true language is likely to be unproductive.

We’re all here because we want Spark and more broadly open source data
tooling to succeed — let’s keep that in mind. There is far too much stress
in the world, and I know I’ve sometimes used word choices I regret,
especially this year. Let’s all take the weekend to do something we enjoy
away from Spark :)

On Sat, Oct 17, 2020 at 7:58 AM "Yuri Oleynikov (יורי אולייניקוב)"
<yurkao@gmail.com> wrote:

> It seems that this thread has turned into a holy war that has nothing to
> do with the original question. If so, it’s super disappointing.
>
> Sent from iPhone
>
> > On 17 Oct 2020, at 15:53, Molotch <magnn@kth.se> wrote:
> >
> > I would say the pros and cons of Python vs Scala come down to Spark, the
> > languages themselves, and what kind of data engineer you will get when
> > you try to hire for the different solutions.
> >
> > With PySpark you get less functionality and increased complexity with the
> > py4j Java interop compared to vanilla Spark. Why would you want that?
> > Maybe you want the Python ML tools and have a clear use case, then go for
> > it. If not, avoid the increased complexity and reduced functionality of
> > PySpark.
> >
> > Python vs Scala? Idiomatic Python is a lesson in bad programming
> > habits/ideas, there's no other way to put it. Do you really want
> > programmers enjoying coding in such a language hacking away at your
> > system?
> >
> > Scala might be far from perfect with the plethora of ways to express
> > yourself. But Python < 3.5 is not fit for anything except simple
> > scripting IMO.
> >
> > Doing exploratory data analysis in a Jupyter notebook, PySpark seems like
> > a fine idea. Coding an entire ETL library including state management, the
> > whole kitchen including the sink, Scala every day of the week.
> >
> >
> >
> > --
> > Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
> >
> > ---------------------------------------------------------------------
> > To unsubscribe e-mail: user-unsubscribe@spark.apache.org
> >
>
--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
