spark-user mailing list archives

From Sidney Feiner <>
Subject [PySpark] - running processes
Date Mon, 03 Jul 2017 11:53:03 GMT
In my Spark Streaming application, I need to build a graph from a file, and initializing
that graph takes between 5 and 10 seconds.

So I tried initializing it once per executor, so that the cost would be paid only once.

After running the application, I noticed that it is initialized many more times than once per
executor, each time with a different process ID (every process has its own logger).

Doesn't every executor have its own JVM and its own process? Or is that only relevant when
I develop in JVM languages like Scala/Java? Do executors in PySpark spawn new processes for
new tasks?
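Here is a small diagnostic sketch for the question above. In PySpark the executor is a JVM, but Python functions run in separate Python worker processes that the executor starts, roughly one per concurrently running task. That would be consistent with seeing several process IDs per executor. Tagging each partition with its host and PID makes this visible. `tag_with_pid` is a hypothetical helper, not a Spark API.

```python
import os
import socket


def tag_with_pid(records):
    """Tag every record in a partition with the host and PID of the Python process handling it."""
    host = socket.gethostname()
    pid = os.getpid()
    for record in records:
        yield host, pid, record
```

For example, `rdd.mapPartitions(tag_with_pid).map(lambda t: (t[0], t[1])).distinct().collect()` would return the distinct (host, PID) pairs that processed the data. More than one PID per host indicates more than one Python process there, possibly belonging to several executors on that host.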

And if they do, how can I make sure that my graph object is really initialized only once?
Thanks :)

Sidney Feiner / SW Developer
M: +972.528197720 / Skype: sidney.feiner.startapp

