Just run this test yourself and check the results. While it runs, also watch a worker with top.


import random

def inside(_):
    # The RDD element is ignored; each call draws a fresh random point
    x, y = random.random(), random.random()
    return x * x + y * y < 1

def estimate_pi(num_samples):
    # sc is the SparkContext provided by the pyspark shell
    # (use range instead of xrange on Python 3)
    count = sc.parallelize(range(num_samples)).filter(inside).count()
    return 4.0 * count / num_samples
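For a quick sanity check without a Spark cluster, the same Monte Carlo estimate can be run in plain Python. The `local_estimate_pi` helper below is hypothetical, just mirroring `estimate_pi` without `sc`:

```python
import random

def local_estimate_pi(num_samples):
    # Same Monte Carlo idea as estimate_pi, but without Spark:
    # count random points in the unit square that fall inside the unit circle
    count = sum(
        1 for _ in range(num_samples)
        if random.random() ** 2 + random.random() ** 2 < 1
    )
    return 4.0 * count / num_samples

random.seed(42)
print(local_estimate_pi(100_000))  # converges toward 3.14159 as samples grow
```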



def estimatePi(numSamples: Int): Double = {
  val count = sc.parallelize(1 to numSamples).filter { _ =>
    val x = math.random
    val y = math.random
    x * x + y * y < 1
  }.count()
  4.0 * count / numSamples
}



On 06-09-2017 06:35, ayan guha wrote:
And I have just the opposite experience, i.e. I know Python but I see the market demanding more Scala :)

I think there are a few fair points on both sides:

1. Feature parity: Scala definitely wins, not only for new Spark features but also if you intend to use 3rd-party connectors (such as Azure services).

2. Performance: Scala has no serialisation overhead because its objects live in the JVM itself, but this is relevant ONLY for UDFs/custom Python functions.

3. Code complexity: Python is much faster to write, but this is more a matter of choice....

4. Data science: here Python is a first-class citizen, with almost no feature gap between the Scala and Python APIs.

IMHO, both have sweet spots....... and I would highly recommend learning Python just for the sheer fun of coding with it :)
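The serialisation point in (2) can be sketched without Spark at all. This is only an illustrative analogy (pickle standing in for the JVM-to-Python round trip a Python UDF incurs, not PySpark's actual wire format):

```python
import pickle

data = list(range(1000))

def add_one_with_roundtrip(values):
    # A Python UDF forces each value across the JVM/Python boundary:
    # serialize, run the Python function, deserialize the result
    return [pickle.loads(pickle.dumps(x)) + 1 for x in values]

def add_one_in_engine(values):
    # Built-in column expressions stay inside the engine (the JVM for Scala),
    # so no per-value serialisation happens
    return [x + 1 for x in values]

assert add_one_with_roundtrip(data) == add_one_in_engine(data)
```

Both produce the same result; the difference is purely the per-element serialisation cost, which is why sticking to built-in DataFrame functions avoids most of the Python penalty.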


On Wed, Sep 6, 2017 at 1:46 PM, Adaryl Wakefield <adaryl.wakefield@hotmail.com> wrote:

Is there any performance difference between writing your application in Python vs. Scala? I've resisted learning Python because it's an interpreted scripting language, but the market seems to be demanding Python skills.


Adaryl "Bob" Wakefield, MBA
Mass Street Analytics, LLC




Best Regards,
Ayan Guha