spark-user mailing list archives

From Dean Wampler <deanwamp...@gmail.com>
Subject Re: Java vs. Scala for Spark
Date Tue, 08 Sep 2015 15:59:38 GMT
It's true that Java 8 lambdas help. If you've read Learning Spark, where
they use Java 7, Python, and Scala for the examples, it really shows how
awful Java without lambdas is for Spark development.

Still, there are several "power tools" in Scala I would sorely miss if I
were using Java 8:

1. The REPL (interpreter): I do most of my work in the REPL, then move the
code to compiled code when I'm ready to turn it into a batch job. Even
better, use Spark Notebook <http://spark-notebook.io/>! (and on GitHub
<https://github.com/andypetrella/spark-notebook>).
2. Tuples: It's just too convenient to use tuples for schemas, return
values from functions, etc.
3. Pattern matching: This has no analog in Java, so it's hard to appreciate
it until you understand it, but see this example
<https://github.com/deanwampler/spark-workshop/blob/master/src/main/scala/sparkworkshop/InvertedIndex5b.scala>
for a taste of how concise it makes code!
4. Type inference: Spark code really shows its utility. You write a lot
less code, yet you still get type hints for what you just wrote!
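
A minimal sketch of items 2 through 4 together, using plain Scala
collections in place of RDDs so it runs without Spark. This is not the
linked InvertedIndex5b code; the object name, the method name, and the
sample documents are made up for illustration.

```scala
object PowerToolsSketch {
  // Hypothetical sample data. Each element is a (document id, text) tuple,
  // so the tuple acts as an ad hoc schema. The type Seq[(String, String)]
  // is inferred; nothing is declared.
  val docs = Seq(
    ("doc1", "spark scala spark"),
    ("doc2", "java spark")
  )

  // Builds word -> Seq of (document id, occurrence count), sorted by id.
  // Only the signature carries types; every intermediate value is inferred.
  def invertedIndex(input: Seq[(String, String)]): Map[String, Seq[(String, Int)]] =
    input
      // Pattern matching pulls the tuple apart in the parameter position.
      .flatMap { case (id, text) =>
        text.split("\\s+").toSeq.filter(_.nonEmpty).map(word => (word, id))
      }
      .groupBy { case (word, _) => word }
      .map { case (word, pairs) =>
        val counts = pairs
          .groupBy { case (_, id) => id }
          .map { case (id, hits) => (id, hits.size) }
          .toSeq
          .sortBy { case (id, _) => id }
        (word, counts)
      }

  def main(args: Array[String]): Unit =
    invertedIndex(docs).toSeq.sortBy { case (word, _) => word }.foreach {
      case (word, counts) => println(s"$word -> ${counts.mkString(", ")}")
    }
}
```

Pasting the body of the object into the REPL (item 1) is a quick way to
see the inferred type of each step.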

My $0.02.

dean


Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
<http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com

On Tue, Sep 8, 2015 at 10:28 AM, Igor Berman <igor.berman@gmail.com> wrote:

> We are using Java 7. It's much more verbose than the Java 8 or Scala
> examples. In addition, some libraries have no Java API, so you need
> to write it yourself (e.g. GraphX).
> On the other hand, Scala is not a trivial language like Java, so it
> depends on your team.
>
> On 8 September 2015 at 17:44, Bryan Jeffrey <bryan.jeffrey@gmail.com>
> wrote:
>
>> Thank you for the quick responses.  It's useful to have some insight from
>> folks already extensively using Spark.
>>
>> Regards,
>>
>> Bryan Jeffrey
>>
>> On Tue, Sep 8, 2015 at 10:28 AM, Sean Owen <sowen@cloudera.com> wrote:
>>
>>> Why would Scala vs Java performance be different Ted? Relatively
>>> speaking there is almost no runtime difference; it's the same APIs or
>>> calls via a thin wrapper. Scala/Java vs Python is a different story.
>>>
>>> Java libraries can be used in Scala. Vice-versa too, though calling
>>> Scala-generated classes can be clunky in Java. What's your concern
>>> about interoperability Jeffrey?
>>>
>>> I disagree that Java 7 vs Scala usability is sooo different, but it's
>>> certainly much more natural to use Spark in Scala. Java 8 closes a lot
>>> of the usability gap with Scala, but not all of it. Enough that it's
>>> not crazy for a Java shop to stick to Java 8 + Spark and not be at a
>>> big disadvantage.
>>>
>>> The downsides of Scala IMHO are that it provides too much: lots of
>>> nice features (closures! superb collections!), lots of rope to hang
>>> yourself too (implicits sometimes!) and some WTF features (XML
>>> literals!) Learning the good useful bits of Scala isn't hard. You can
>>> always write Scala code as much like Java as you like, I find.
>>>
>>> Scala tooling is different from Java tooling; that's an
>>> underappreciated barrier. For example I think SBT is good for
>>> development, bad for general project lifecycle management compared to
>>> Maven, but in any event still less developed. SBT/scalac are huge
>>> resource hogs, since so much of Scala is really implemented in the
>>> compiler; prepare to update your laptop to develop in Scala on your
>>> IDE of choice, and start to think about running long-running compile
>>> servers like we did in the year 2000.
>>>
>>> Still net-net I would choose Scala, FWIW.
>>>
>>> On Tue, Sep 8, 2015 at 3:07 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>> > Performance wise, Scala is by far the best choice when you use Spark.
>>> >
>>> > The cost of learning Scala is not negligible but not insurmountable
>>> > either.
>>> >
>>> > My personal opinion.
>>> >
>>> > On Tue, Sep 8, 2015 at 6:50 AM, Bryan Jeffrey <bryan.jeffrey@gmail.com>
>>> > wrote:
>>> >>
>>> >> All,
>>> >>
>>> >> We're looking at language choice in developing a simple stream
>>> >> processing application in Spark.  We've got a small set of example
>>> >> code built in Scala.  Articles like the following:
>>> >> http://www.bigdatatidbits.cc/2015/02/navigating-from-scala-to-spark-for.html
>>> >> would seem to indicate that Scala is great for use in distributed
>>> >> programming (including Spark).  However, there is a large group of
>>> >> folks that seem to feel that interoperability with other Java
>>> >> libraries leaves much to be desired, and that the cost of learning
>>> >> (yet another) language is quite high.
>>> >>
>>> >> Has anyone looked at Scala for Spark dev in an enterprise environment?
>>> >> What was the outcome?
>>> >>
>>> >> Regards,
>>> >>
>>> >> Bryan Jeffrey
>>> >
>>> >
>>>
>>
>>
>
