spark-user mailing list archives

From Jörn Franke <>
Subject Re: java vs scala for Apache Spark - is there a performance difference ?
Date Tue, 30 Oct 2018 07:30:35 GMT
Older versions of Spark did indeed perform worse with Python and R, because data had to be
converted between JVM datatypes and Python/R datatypes. This changed with the integration of
Apache Arrow (in Spark 2.3, I think). However, what you do after the conversion in those
languages can still be slower than, for instance, in Java if you do not use Spark-only
(built-in) functions. It can also be faster, e.g. if you use a Python module implemented
natively in C and no translation into C datatypes is needed.
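That last point can be illustrated without Spark at all. The following is a minimal sketch, assuming only the Python standard library: CPython's built-in sum() is itself implemented in C, so it plays the role of the "natively implemented module" here, while py_sum() is the hand-written pure-Python equivalent.

```python
import timeit

# Pure-Python summation: one interpreted loop iteration per element.
def py_sum(xs):
    total = 0
    for x in xs:
        total += x
    return total

data = list(range(1_000_000))

# Both produce the same answer ...
assert py_sum(data) == sum(data)

# ... but the C-implemented builtin typically runs several times faster,
# because it avoids per-element bytecode dispatch in the interpreter.
print("pure Python:", timeit.timeit(lambda: py_sum(data), number=5))
print("C builtin  :", timeit.timeit(lambda: sum(data), number=5))
```

The same principle is why NumPy- or pandas-style code, which stays inside natively implemented routines, can outpace hand-written Python loops.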
Scala has, in certain cases, a more elegant syntax than Java (if you do not use lambdas).
Sometimes this elegant syntax can lead to unintentionally inefficient code for which there is
a better way to express the same thing (e.g. implicit conversions, use of collection methods,
etc.). However, there are better ways; you just have to spot these issues in the source code
and address them, if needed.
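The collection-methods pitfall mentioned above has a direct analogue in Python, which serves as a language-neutral sketch of it here: chaining eager collection operations materializes a full intermediate collection at every step (as strict map/filter on a Scala List does), while a lazy pipeline (generators here; .view or .iterator in Scala) streams elements through and stops as soon as the consumer is satisfied.

```python
import itertools

data = range(1, 1_000_001)

# Eager: both intermediate lists are built in full, a million elements each,
# before we finally keep only five of them.
eager = [x for x in [y * 2 for y in data] if x % 3 == 0][:5]

# Lazy: elements flow through one at a time, and work stops after
# the first five matches are found.
lazy_pipeline = (x for x in (y * 2 for y in data) if x % 3 == 0)
lazy = list(itertools.islice(lazy_pipeline, 5))

assert eager == lazy == [6, 12, 18, 24, 30]
```

Both forms read equally "elegantly", which is exactly why this kind of inefficiency is easy to miss in review and only shows up in the source code if you go looking for it.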
So a blanket comparison between those languages does not really make sense - it always depends on the workload and on how the code is written.

> On 30.10.2018 at 07:00, akshay naidu <> wrote:
> What about Python?
> Java vs. Scala vs. Python vs. R -
> which is better?
>> On Sat, Oct 27, 2018 at 3:34 AM karan alang <> wrote:
>> Hello,
>> Is there a "performance" difference when using Java or Scala for Apache Spark?
>> I understand there are other obvious differences (less code with Scala, easier to focus on logic, etc.),
>> but with respect to performance, I think there would not be much of a difference, since both of them are JVM-based.
>> Please let me know if this is not the case.
>> Thanks!
