Why would you create a class, instantiate it to store data, and then change the class every time you have to add a new element? In OOP terminology a class represents an object, and an object has state, does it not?

Purely from a data warehousing perspective, one of the fundamental principles in delivering a DW system is to ensure a Single Version of Truth, and that is what a functional way of thinking naturally supports.

We can say by extension that data analytics algorithms are well aligned with the functional way of thinking, and therefore with Scala, whereas the object-oriented way of thinking has to adapt itself to be functional. Of course we can use OOP concepts to deliver data solutions, just as we can implement OOP concepts in C.

Java is good for solving problems that require OOP rigor, and Scala mostly for problems that suit a functional way of problem solving, purely from a data processing perspective.

Those who use performance timings to compare these two languages should start coding in Machine Level Language (MLL), see the performance gains of MLL over Java, and then switch over to MLL. Of course MLL is a bit more verbose than Java, just as Java is a bit more verbose than Scala and Python, but who's complaining?

Of course, these are my personal thoughts; I may be completely wrong, and I will be grateful if someone could illustrate how.


On Wed, Jul 15, 2015 at 10:03 AM, Reinis Vicups <spark@orbit-x.de> wrote:
We have a complex application that has been running in production for a couple of months and makes heavy use of Spark in Scala.

Just to give you some insight into the complexity: our source data is not that large (only about 500'000 complex elements), but our machine learning algorithms produce more than a billion transformations and intermediate data elements.
Our current Spark/Mesos cluster consists of 120 CPUs, 190 GB RAM and plenty of HDD space.

Now regarding your question:

- Scala is simply a beautiful language in itself; that has nothing to do with Spark;

- the Spark API fits very naturally into Scala semantics because map/reduce transformations are written more or less identically for local collections and RDDs;

- as with any religious topic, there is controversial discussion about which language is better, and most of the time (I have read quite a lot of blog/forum topics on this) the argumentation is based on which religion one belongs to (e.g. Java vs Scala vs Python);

- we checked the supposed performance issues and limitations of Scala described here (http://www.infoq.com/news/2011/11/yammer-scala) by refactoring to the "best practices" described in the article, and observed a performance increase in some places and, at the same time, a performance decrease in others. Thus I would say there is no noticeable performance difference between Scala and Java in our use case (of course there are, and always will be, applications where one or the other language performs better).

Hope I could help.

On 15.07.2015 09:27, 诺铁 wrote:
I think different teams have different answers to this question. My team uses Scala and is happy with it.

On Wed, Jul 15, 2015 at 1:31 PM, Tristan Blakers <tristan@blackfrog.org> wrote:
We have had excellent results operating on RDDs using Java 8 with Lambdas. It’s slightly more verbose than Scala, but I haven’t found this an issue, and haven’t missed any functionality.
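
To give a feel for the lambda style mentioned above, here is a minimal sketch, not code from the poster's project. It uses only `java.util.stream` on a local list so that it runs without a Spark dependency; the class and method names (`LambdaSketch`, `sumOfSquares`) are made up for illustration. Spark's `JavaRDD` offers similarly named `map` and `reduce` operations, though its `reduce` takes only the combining function and no identity value.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical example class; the names are illustrative only.
class LambdaSketch {

    // Square each element (map), then add the squares together (reduce).
    static int sumOfSquares(List<Integer> xs) {
        return xs.stream()
                 .map(x -> x * x)              // transformation step
                 .reduce(0, (a, b) -> a + b);  // aggregation step, 0 is the identity
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4);
        System.out.println(sumOfSquares(data)); // prints 30
    }
}
```

The rough Scala counterpart on a local collection would be `xs.map(x => x * x).reduce(_ + _)`, which is where the "slightly more verbose" difference shows up.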

The new DataFrame API makes the Spark platform even more language agnostic.


On 15 July 2015 at 06:40, Vineel Yalamarthy <vineelyalamarthy@gmail.com> wrote:
Good question. Like you, many are in the same boat (coming from a Java background). Looking forward to responses from the community.


On Tue, Jul 14, 2015 at 2:30 PM, spark user <spark_user@yahoo.com.invalid> wrote:
Hi All 

To start a new project in Spark, which technology is good: Java 8 or Scala?

I am a Java developer. Can I start with Java 8, or do I need to learn Scala?

Which one is the better technology for quickly starting a POC project?


- su 

