Try writing this Spark Streaming idiom in Java and you'll choose Scala soon enough:

dstream.foreachRDD { rdd =>
  rdd.foreachPartition(partition => ....)
}
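
For anyone who wants to see the idiom in context, here is a minimal sketch of a complete job built around it. The socket source on localhost:9999, the object name and the println "sink" are all placeholders I made up for illustration; swap in your own source and output. The point of the pattern is that foreachRDD hands you each batch's RDD on the driver, while the body of foreachPartition runs on the executors once per partition, so any expensive setup (a database connection, say) can be shared by all records in that partition.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical example job: the name, source and sink are illustrative assumptions.
object ForeachPartitionSketch {
  def main(args: Array[String]): Unit = {
    // local[2]: one thread for the socket receiver, one for processing
    val conf = new SparkConf().setMaster("local[2]").setAppName("ForeachPartitionSketch")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Assumed source: lines of text from a local socket (e.g. `nc -lk 9999`)
    val dstream = ssc.socketTextStream("localhost", 9999)

    dstream.foreachRDD { rdd =>
      rdd.foreachPartition { partition =>
        // Per-partition setup would go here (e.g. open one connection)
        partition.foreach(record => println(record)) // placeholder sink
        // Per-partition teardown would go here (e.g. close the connection)
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Writing the same thing against the Java API means spelling out the function types for both closures, which is where the verbosity comes from.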

When deciding between Java and Scala for Spark, IMHO Scala has the upper hand. If you're concerned about readability, have a look at the Scala style guide recently open-sourced by Databricks: https://github.com/databricks/scala-style-guide (btw, I don't agree with a good part of it, but I recognize that it can keep the most complex Scala constructs out of your code).



On Thu, Mar 19, 2015 at 3:50 PM, James King <jakwebinbox@gmail.com> wrote:
Hello All,

I'm using Spark for streaming, but I'm unclear on which implementation language to use: Java, Scala or Python.

I don't know anything about Python, I'm familiar with Scala, and I have been doing Java for a long time.

I think the above shouldn't influence my decision on which language to use, because I believe the tool should fit the problem.

In terms of performance, Java and Scala are comparable. However, Java is OO and Scala is FP; I have no idea what Python is.

Without a consistent programming style, Scala code can become unreadable, but I do like the fact that it seems possible to do so much work with so much less code; that's a strong selling point for me. It could also be that the type of programming done in Spark is best implemented in Scala, as an FP language, though I'm not sure.

The question I would like your help with is: are there any other considerations I need to think about when deciding this? Are there any recommendations you can make in this regard?

Regards
jk