spark-user mailing list archives

From Marcin Tustin <marcin.tus...@bluevoyant.com.INVALID>
Subject Re: Why Spark generates Java code and not Scala?
Date Mon, 11 Nov 2019 15:27:12 GMT
Well TIL.

For those also newly informed:
https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-whole-stage-codegen.html
https://mail-archives.apache.org/mod_mbox/spark-dev/201911.mbox/browser
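
For anyone who wants to see the general idea in miniature, here is a rough
sketch: build Java source as a string at runtime, compile it, load the class,
and call it. To be clear about the assumptions: this is not Spark's code, and
it uses the JDK's built-in javax.tools compiler rather than Janino so that it
runs with no extra dependencies. The names RuntimeCodegenSketch and
GeneratedAdder are made up for illustration.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class RuntimeCodegenSketch {

    // Produce Java source text for a tiny class. The constant is baked into
    // the generated code, loosely analogous to specializing code for one query.
    static String generateSource(int constant) {
        return "public class GeneratedAdder {\n"
             + "    public static int apply(int x) { return x + " + constant + "; }\n"
             + "}\n";
    }

    // Write the generated source to a temp directory, compile it with the
    // JDK compiler (a stand-in for Janino here), load it, and invoke it.
    static int compileAndRun(int constant, int input) throws Exception {
        Path dir = Files.createTempDirectory("codegen-sketch");
        Path src = dir.resolve("GeneratedAdder.java");
        Files.write(src, generateSource(constant).getBytes(StandardCharsets.UTF_8));

        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler found; a JDK is required");
        }
        int status = compiler.run(null, null, null, "-d", dir.toString(), src.toString());
        if (status != 0) {
            throw new IllegalStateException("Compilation of generated source failed");
        }

        // Load the freshly compiled class from the temp directory and call
        // its static method via reflection.
        try (URLClassLoader loader = new URLClassLoader(new URL[]{dir.toUri().toURL()})) {
            Class<?> cls = loader.loadClass("GeneratedAdder");
            Method apply = cls.getMethod("apply", int.class);
            return (Integer) apply.invoke(null, input);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(compileAndRun(40, 2)); // prints 42
    }
}
```

In a real Spark session you can inspect the Java source that is actually
generated for a query; the first link above covers that.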


On Sun, Nov 10, 2019 at 7:57 AM Holden Karau <holden@pigscanfly.ca> wrote:

> If you look inside the code generation, we generate Java code and compile
> it with Janino. For interested folks, the conversation has moved over to
> the dev@ list.
>
> On Sat, Nov 9, 2019 at 10:37 AM Marcin Tustin
> <marcin.tustin@bluevoyant.com.invalid> wrote:
>
>> What do you mean by this? Spark is written in a combination of Scala and
>> Java, and then compiled to Java bytecode, as is typical for both Scala and
>> Java. If there's additional bytecode generation happening, it's Java
>> bytecode, because the platform runs on the JVM.
>>
>> On Sat, Nov 9, 2019 at 12:47 PM Bartosz Konieczny <
>> bartkonieczny@gmail.com> wrote:
>>
>>> Hi there,
>>>
>>> A few days ago I got an intriguing but hard-to-answer question:
>>> "Why Spark generates Java code and not Scala code?"
>>> (https://github.com/bartosz25/spark-scala-playground/issues/18)
>>>
>>> Since I'm not sure about the exact answer, I'd like to ask you to
>>> confirm or correct my thinking. I looked for the reasons in JIRA and in
>>> the research paper "Spark SQL: Relational Data Processing in Spark"
>>> (http://people.csail.mit.edu/matei/papers/2015/sigmod_spark_sql.pdf)
>>> but found nothing explaining why Java was chosen over Scala. The only
>>> ticket I found asks why Scala and not Java, but it concerns data types
>>> (https://issues.apache.org/jira/browse/SPARK-5193).
>>> That's why I'm writing here.
>>>
>>> My guesses about why Java code was chosen are:
>>> - Java runtime compiler libraries are more mature and production-ready
>>> than Scala's, or at least they were at implementation time
>>> - the Scala compiler tends to be slower than the Java compiler
>>> https://stackoverflow.com/questions/3490383/java-compile-speed-vs-scala-compile-speed
>>> - the Scala compiler seems to be more complex, so debugging and
>>> maintaining it would be harder
>>> - it was easier to represent a pure Java OO design than a mixed FP/OO
>>> design in Scala
>>> ?
>>>
>>> Thank you for your help.
>>>
>>> --
>>> Bartosz Konieczny
>>> data engineer
>>> https://www.waitingforcode.com
>>> https://github.com/bartosz25/
>>> https://twitter.com/waitingforcode
>>>
>>> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
