Just run this test yourself and check the results. While it is running, also watch one of the workers with top.

Python:

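import random

# Run in the pyspark shell, where `sc` (the SparkContext) is already defined.

def inside(p):
    # Draw a random point in the unit square; True if it falls inside the quarter circle.
    x, y = random.random(), random.random()
    return x * x + y * y < 1

def estimate_pi(num_samples):
    # xrange is Python 2; use range on Python 3.
    count = sc.parallelize(xrange(0, num_samples)).filter(inside).count()
    pi = 4.0 * count / num_samples
    return pi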

estimate_pi(1000000000)
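
To put a number on "check the results", the call can be wrapped in a small wall-clock timer. What follows is only a sketch: timed and estimate_pi_local are hypothetical helper names, not part of Spark, and estimate_pi_local is a single-process stand-in so the snippet runs without a cluster.

```python
import random
import time

def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.time()
    result = fn(*args)
    return result, time.time() - start

def estimate_pi_local(num_samples, seed=None):
    """Single-process stand-in for estimate_pi (hypothetical, no Spark involved)."""
    rng = random.Random(seed)
    # Count the random points that land inside the quarter circle.
    count = sum(1 for _ in range(num_samples)
                if rng.random() ** 2 + rng.random() ** 2 < 1)
    return 4.0 * count / num_samples

if __name__ == "__main__":
    pi, seconds = timed(estimate_pi_local, 200000, 42)
    print("pi ~ %.4f in %.3f s" % (pi, seconds))
```

In the pyspark shell the same helper would be called as timed(estimate_pi, 1000000000), and the elapsed time compared with the Scala run.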

Scala:

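// Run in spark-shell, where `sc` (the SparkContext) is already defined.
def estimatePi(numSamples: Int): Double = {
  val count = sc.parallelize(1 to numSamples).filter { _ =>
    // Draw a random point in the unit square; keep it if it falls inside the quarter circle.
    val x = math.random
    val y = math.random
    x * x + y * y < 1
  }.count()
  // Keep the arithmetic in Double to match the declared return type.
  4.0 * count / numSamples
}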

estimatePi(1000000000)

Regards

On 06-09-2017 06:35, ayan guha wrote:
And I have just the opposite experience, i.e. I know Python but I see Scala in more demand :)

I think there are a few fair points on both sides, and Scala wins:

1. Feature parity: Scala definitely wins, not only for new Spark features but also if you intend to use 3rd-party connectors (such as those for Azure services).

2. Performance: Scala has no serialisation overhead, as Scala objects live in the JVM itself. But this is relevant ONLY for UDFs/custom Python functions.

3. Code complexity: Python is much faster to code in, but this is more a matter of choice.

4. Data science: here Python is a first-class citizen, with almost no feature gap between the Scala and Python APIs.
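
To make point 2 a bit more concrete, here is a rough stand-in for the serialisation step. It does not touch Spark at all: it only times the same transformation with and without a pickle round trip on the way in and on the way out, roughly what happens when records have to cross from one process to another. The helper names (round_trip, time_it, double_all, double_all_with_round_trip) are made up for illustration, and the timings say nothing about any particular cluster.

```python
import pickle
import time

def round_trip(batch):
    """Serialise a batch to bytes and back again."""
    return pickle.loads(pickle.dumps(batch, protocol=2))

def time_it(fn, *args):
    """Call fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.time()
    result = fn(*args)
    return result, time.time() - start

def double_all(batch):
    """The actual work: a trivial per-record function."""
    return [x * 2 for x in batch]

def double_all_with_round_trip(batch):
    """Same work, but records are pickled on the way in and on the way out."""
    return round_trip(double_all(round_trip(batch)))

if __name__ == "__main__":
    batch = list(range(1000000))
    direct, t_direct = time_it(double_all, batch)
    shipped, t_shipped = time_it(double_all_with_round_trip, batch)
    assert direct == shipped  # identical results, only the cost differs
    print("direct: %.3f s, with round trip: %.3f s" % (t_direct, t_shipped))
```

When the per-record function is this cheap, the round trip is extra work on top of it, which is why the overhead matters mainly for custom Python functions and not for code that stays on the built-in operations.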

IMHO, both have their sweet spots, and I would highly recommend learning Python just for the sheer fun of coding in it :)

best
Ayan

On Wed, Sep 6, 2017 at 1:46 PM, Adaryl Wakefield <adaryl.wakefield@hotmail.com> wrote:

Is there any performance difference between writing your application in Python vs. Scala? I’ve resisted learning Python because it’s an interpreted scripting language, but the market seems to be demanding Python skills.

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685

www.massstreet.net

www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

--
Best Regards,
Ayan Guha