spark-user mailing list archives

From David Figueroa <davidfigue...@gmail.com>
Subject spark.python.worker.reuse not working as expected
Date Thu, 26 Apr 2018 13:25:41 GMT
Given this code block:

import os
from pyspark.sql import SparkSession

def return_pid(_):
    yield os.getpid()

spark = SparkSession.builder.getOrCreate()

pids = set(spark.sparkContext.range(32).mapPartitions(return_pid).collect())
print(pids)

pids = set(spark.sparkContext.range(32).mapPartitions(return_pid).collect())
print(pids)

I was expecting the same Python process IDs to be printed twice; instead,
completely different Python process IDs are printed each time.

spark.python.worker.reuse is true by default, but this unexpected
behavior still occurs even when spark.python.worker.reuse=true is set explicitly.
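For comparison, the behavior I expected is what a plain Python process pool does: a single pool keeps its worker processes alive between map calls, so the same PIDs show up in both rounds. This is only an analogy to Spark's worker reuse, not the Spark mechanism itself; the pool size and task count below are arbitrary:

```python
import os
from multiprocessing import Pool

def return_pid(_):
    # Each task reports the PID of the worker process that ran it.
    return os.getpid()

def pid_sets(n_workers=4, n_tasks=32):
    # One Pool keeps its worker processes alive across map calls,
    # so the second round runs on the same (reused) processes.
    with Pool(n_workers) as pool:
        first = set(pool.map(return_pid, range(n_tasks)))
        second = set(pool.map(return_pid, range(n_tasks)))
    return first, second

if __name__ == "__main__":
    first, second = pid_sets()
    print(first)
    print(second)
```

With Spark I expected the analogous outcome: no new Python worker PIDs after the first job, yet the observed PIDs change between jobs.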
