spark-user mailing list archives

From Cody Koeninger <c...@koeninger.org>
Subject Re: Java vs. Scala for Spark
Date Wed, 09 Sep 2015 15:00:07 GMT
Java 8 lambdas are broken to the point of near-uselessness (because of
checked exceptions and the inability to close over variables that are not
effectively final). I wouldn't use them as a deciding factor in language
choice.
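
To make those two complaints concrete, here is a minimal sketch in plain Java 8 (no Spark involved, and not code from anyone in this thread). CheckedFunction, unchecked and parsePositive are hypothetical helpers invented for illustration. A method that throws a checked exception cannot be passed directly where a java.util.function.Function is expected, so it has to be wrapped, and a lambda cannot reassign a captured local variable, so a mutable holder such as AtomicInteger stands in for it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.stream.Collectors;

public class LambdaLimits {

    // Hypothetical functional interface whose method may throw a checked exception.
    @FunctionalInterface
    interface CheckedFunction<T, R> {
        R apply(T t) throws Exception;
    }

    // Hypothetical adapter to java.util.function.Function: unchecked exceptions
    // pass through unchanged, checked ones are wrapped in a RuntimeException.
    static <T, R> Function<T, R> unchecked(CheckedFunction<T, R> f) {
        return t -> {
            try {
                return f.apply(t);
            } catch (RuntimeException e) {
                throw e;
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        };
    }

    // Hypothetical method that declares a checked exception.
    static int parsePositive(String s) throws Exception {
        int n = Integer.parseInt(s);
        if (n < 0) {
            throw new Exception("negative: " + s);
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("1", "2", "3");

        // Would not compile: Function.apply does not declare "throws Exception".
        // input.stream().map(LambdaLimits::parsePositive);

        List<Integer> parsed = input.stream()
                .map(unchecked(LambdaLimits::parsePositive))
                .collect(Collectors.toList());

        // Would not compile: lambdas may only capture locals that are
        // final or effectively final, so "total += n" is rejected.
        // int total = 0;
        // parsed.forEach(n -> total += n);

        // Workaround: capture a final reference to a mutable holder.
        AtomicInteger sum = new AtomicInteger();
        parsed.forEach(sum::addAndGet);

        System.out.println(parsed + " sum=" + sum.get());
    }
}
```

Running main prints "[1, 2, 3] sum=6". The wrapper trades compile-time exception checking for a runtime RuntimeException; it is a common workaround, but it is also exactly the kind of boilerplate the complaint is about.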

Any competent developer should be able to write reasonable Java-style Scala
after a week with a copy of "Scala for the Impatient".

On Tue, Sep 8, 2015 at 11:15 AM, Jerry Lam <chilinglam@gmail.com> wrote:

> Hi Bryan,
>
> I would choose a language based on the requirements. Choosing Scala does
> not make sense if you have a lot of dependencies on Java-based components,
> since interoperability between Java and Scala is not always obvious.
>
> I agree with the above comments that Java is much more verbose than Scala
> in many cases, if not all. However, I personally don't find verbosity to be
> a key factor in choosing a language. For the sake of argument, would you be
> discouraged if you needed to write 3 lines of Java for 1 line of Scala? I
> really don't care about the number of lines as long as I can finish the
> task within the allotted time.
>
> I believe (please correct me if I'm wrong) that all Spark functionality you
> can find in Scala is also available in Java, including MLlib, Spark SQL,
> Streaming, etc. So you won't miss any Spark features by using Java.
>
> It seems the questions should be:
> - Which language are the developers comfortable with?
> - Which components in the system will constrain the choice of language?
>
> Best Regards,
>
> Jerry
>
> On Tue, Sep 8, 2015 at 11:59 AM, Dean Wampler <deanwampler@gmail.com>
> wrote:
>
>> It's true that Java 8 lambdas help. If you've read Learning Spark, where
>> they use Java 7, Python, and Scala for the examples, it really shows how
>> awful Java without lambdas is for Spark development.
>>
>> Still, there are several "power tools" in Scala I would sorely miss using
>> Java 8:
>>
>> 1. The REPL (interpreter): I do most of my work in the REPL, then move
>> the code into a compiled project when I'm ready to turn it into a batch
>> job. Even better, use Spark Notebook <http://spark-notebook.io/>! (and on
>> GitHub <https://github.com/andypetrella/spark-notebook>).
>> 2. Tuples: It's just too convenient to use tuples for schemas, return
>> values from functions, etc., etc., etc.
>> 3. Pattern matching: This has no analog in Java, so it's hard to
>> appreciate it until you understand it, but see this example
>> <https://github.com/deanwampler/spark-workshop/blob/master/src/main/scala/sparkworkshop/InvertedIndex5b.scala>
>> for a taste of how concise it makes code!
>> 4. Type inference: Spark code really shows off its utility. It means a
>> lot less code to write, but you still get hints about the types of what
>> you just wrote!
>>
>> My $0.02.
>>
>> dean
>>
>>
>> Dean Wampler, Ph.D.
>> Author: Programming Scala, 2nd Edition
>> <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
>> Typesafe <http://typesafe.com>
>> @deanwampler <http://twitter.com/deanwampler>
>> http://polyglotprogramming.com
>>
>> On Tue, Sep 8, 2015 at 10:28 AM, Igor Berman <igor.berman@gmail.com>
>> wrote:
>>
>>> We are using Java 7. It's much more verbose than the Java 8 or Scala
>>> examples. In addition, there are libraries that have no Java API, so you
>>> need to write one yourself (e.g. GraphX).
>>> On the other hand, Scala is not a trivial language like Java, so it
>>> depends on your team.
>>>
>>> On 8 September 2015 at 17:44, Bryan Jeffrey <bryan.jeffrey@gmail.com>
>>> wrote:
>>>
>>>> Thank you for the quick responses.  It's useful to have some insight
>>>> from folks already extensively using Spark.
>>>>
>>>> Regards,
>>>>
>>>> Bryan Jeffrey
>>>>
>>>> On Tue, Sep 8, 2015 at 10:28 AM, Sean Owen <sowen@cloudera.com> wrote:
>>>>
>>>>> Why would Scala vs. Java performance be different, Ted? Relatively
>>>>> speaking there is almost no runtime difference; it's the same APIs or
>>>>> calls via a thin wrapper. Scala/Java vs Python is a different story.
>>>>>
>>>>> Java libraries can be used in Scala. Vice-versa too, though calling
>>>>> Scala-generated classes can be clunky in Java. What's your concern
>>>>> about interoperability, Jeffrey?
>>>>>
>>>>> I disagree that Java 7 vs Scala usability is sooo different, but it's
>>>>> certainly much more natural to use Spark in Scala. Java 8 closes a lot
>>>>> of the usability gap with Scala, but not all of it. Enough that it's
>>>>> not crazy for a Java shop to stick to Java 8 + Spark and not be at a
>>>>> big disadvantage.
>>>>>
>>>>> The downsides of Scala IMHO are that it provides too much: lots of
>>>>> nice features (closures! superb collections!), lots of rope to hang
>>>>> yourself too (implicits sometimes!) and some WTF features (XML
>>>>> literals!) Learning the good useful bits of Scala isn't hard. You can
>>>>> always write Scala code as much like Java as you like, I find.
>>>>>
>>>>> Scala tooling is different from Java tooling; that's an
>>>>> underappreciated barrier. For example I think SBT is good for
>>>>> development, bad for general project lifecycle management compared to
>>>>> Maven, but in any event still less developed. SBT/scalac are huge
>>>>> resource hogs, since so much of Scala is really implemented in the
>>>>> compiler; prepare to upgrade your laptop to develop in Scala on your
>>>>> IDE of choice, and start to think about running long-running compile
>>>>> servers like we did in the year 2000.
>>>>>
>>>>> Still net-net I would choose Scala, FWIW.
>>>>>
>>>>> On Tue, Sep 8, 2015 at 3:07 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>> > Performance-wise, Scala is by far the best choice when you use Spark.
>>>>> >
>>>>> > The cost of learning Scala is not negligible, but not insurmountable
>>>>> > either.
>>>>> >
>>>>> > My personal opinion.
>>>>> >
>>>>> > On Tue, Sep 8, 2015 at 6:50 AM, Bryan Jeffrey <bryan.jeffrey@gmail.com> wrote:
>>>>> >>
>>>>> >> All,
>>>>> >>
>>>>> >> We're looking at language choice in developing a simple streaming
>>>>> >> processing application in Spark. We've got a small set of example code
>>>>> >> built in Scala. Articles like the following:
>>>>> >>
>>>>> >> http://www.bigdatatidbits.cc/2015/02/navigating-from-scala-to-spark-for.html
>>>>> >> would seem to indicate that Scala is great for use in distributed
>>>>> >> programming (including Spark). However, there is a large group of folks
>>>>> >> that seem to feel that interoperability with other Java libraries is
>>>>> >> much to be desired, and that the cost of learning (yet another)
>>>>> >> language is quite high.
>>>>> >>
>>>>> >> Has anyone looked at Scala for Spark dev in an enterprise environment?
>>>>> >> What was the outcome?
>>>>> >>
>>>>> >> Regards,
>>>>> >>
>>>>> >> Bryan Jeffrey
>>>>> >
>>>>> >
>>>>>
>>>>
>>>>
>>>
>>
>
