spark-user mailing list archives

From Rahul Palamuttam
Subject Support of other languages?
Date Tue, 08 Sep 2015 02:54:02 GMT
I wanted to know more about how Spark supports R and Python, with respect to
what gets copied into the language environments.

To clarify:

I know that PySpark uses Py4J sockets to pass pickled Python functions
between the JVM and the Python daemons. However, I wanted to know how it
passes the data from the JVM into the daemon environment. I assume it has to
copy the data over into the new environment, since Python can't exactly
operate in JVM heap space (or can it?).
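
To make the copy in question concrete, here is a minimal sketch of what I
assume happens: records are serialized to bytes on one side and rebuilt as new
objects on the other. This is not Spark's actual code; jvm_side_serialize and
python_worker are hypothetical names, and plain pickle stands in for whatever
serializer Spark really uses.

```python
import pickle


def jvm_side_serialize(records):
    """Hypothetical stand-in for the JVM side: turn a partition into bytes."""
    return pickle.dumps(list(records), protocol=2)


def python_worker(payload, func):
    """Hypothetical stand-in for a Python daemon: rebuild objects, apply func."""
    records = pickle.loads(payload)  # fresh objects allocated on the Python heap
    return [func(r) for r in records]


partition = [[1, 2], [3, 4]]
payload = jvm_side_serialize(partition)

# The worker never touches `partition` itself, only the bytes it was sent.
result = python_worker(payload, sum)

# Deserializing again shows the data is a second copy, not a shared view.
copied = pickle.loads(payload)
```

In this sketch the worker only ever sees the payload, so every record exists
twice while the function runs: once in the sender and once in the receiver.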

I have the same question with respect to SparkR, though I'm not completely
familiar with how native R code is passed around through the worker JVMs.

The primary question I wanted to ask is: does Spark make a second copy of
the data so that language-specific daemons can operate on it? What are some of
the other limitations encountered when offering multi-language
support, whether in performance or in general software architecture?
With Python in particular, the result of a collect operation must first be
written to disk and then read back by the Python driver process.
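
The collect round trip described above can be sketched roughly as follows.
This only illustrates the write-to-disk-then-read-back pattern as I understand
it, not Spark's implementation; write_results_to_disk and
read_results_from_disk are hypothetical helpers, and the temporary file and
pickle framing are assumptions.

```python
import os
import pickle
import tempfile


def write_results_to_disk(records):
    """Hypothetical producer side: dump each record to a temporary file."""
    fd, path = tempfile.mkstemp(suffix=".pkl")
    with os.fdopen(fd, "wb") as f:
        for record in records:
            pickle.dump(record, f)
    return path


def read_results_from_disk(path):
    """Hypothetical driver side: read records back until end of file."""
    records = []
    with open(path, "rb") as f:
        while True:
            try:
                records.append(pickle.load(f))
            except EOFError:  # pickle.load raises EOFError on an exhausted file
                break
    os.remove(path)  # the temporary file is only a hand-off buffer
    return records


path = write_results_to_disk([("a", 1), ("b", 2)])
collected = read_results_from_disk(path)
```

Under these assumptions the collected data passes through a serialize step, a
disk write, a disk read, and a deserialize step before the driver can use it.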

I would appreciate any insight on this, and would like to know whether any
work is happening in this area.

Thank you,

Rahul Palamuttam  
