spark-user mailing list archives

From Nagaraj Chandrashekar <nchandrashe...@innominds.com>
Subject Re: Support of other languages?
Date Wed, 09 Sep 2015 01:40:35 GMT
Hi Rahul, 

I may not have the exact answer you're looking for, but my thoughts are
given below.

I have worked with HP Vertica and R via UDFs (User Defined Functions). I
don't have any experience with SparkR so far, but I would expect it to
follow a similar route.

UDFs are packaged as external shared libraries that implement some
analytics procedure. These procedures run within the Vertica process
context, operating directly on data stored in Vertica's own data
structures, so badly written UDF code can slow down the entire process.
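To make the in-process point concrete, here is a toy sketch in plain Python
(purely illustrative; `engine_scan` and `my_udf` are made-up names, not
Vertica's actual UDx API, which is C++/R): an in-process UDF is just a
function the engine calls directly on rows it already holds, so nothing is
copied, but the UDF shares the engine's fate.

```python
# Illustrative sketch only -- NOT Vertica's actual UDx API.
# An in-process UDF runs inside the engine's own address space:
# no data copy is needed, but a slow or crashing UDF stalls or
# takes down the whole engine process.

def engine_scan(rows, udf):
    # The "engine" applies the user's function row by row,
    # directly on its own in-memory data structures.
    return [udf(row) for row in rows]

def my_udf(row):
    # A user-supplied analytics procedure.
    return row["price"] * row["qty"]

rows = [{"price": 2.0, "qty": 3}, {"price": 5.0, "qty": 1}]
print(engine_scan(rows, my_udf))  # [6.0, 5.0]
```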

You can refer to the following URLs for further reading on HP Vertica and R
integration:

https://my.vertica.com/docs/5.0/HTML/Master/15713.htm
https://www.vertica.com/tag/vertica-2/page/8/ (see the "A Deeper Dive on
Vertica and R" section)

Cheers
Nagaraj C

Learn And Share! It's Big Data.

On 9/8/15, 8:24 AM, "Rahul Palamuttam" <rahulpalamut@gmail.com> wrote:

>Hi, 
>I wanted to know more about how Spark supports R and Python, with respect
>to
>what gets copied into the language environments.
>
>To clarify :
>
>I know that PySpark utilizes Py4J sockets to pass pickled Python functions
>between the JVM and the Python daemons. However, I wanted to know how it
>passes the data from the JVM into the daemon environment. I assume it has
>to copy the data over into the new environment, since Python can't exactly
>operate in JVM heap space (or can it?).
>
>I had the same question with respect to SparkR, though I'm not completely
>familiar with how it passes native R code around through the worker
>JVMs.
>
>The primary question I wanted to ask is: does Spark make a second copy of
>the data so that language-specific daemons can operate on it? What are
>some of the other limitations encountered when offering multi-language
>support, whether in performance or in general software architecture? With
>Python in particular, the result of a collect operation must first be
>written to disk and then read back by the Python driver process.
>
>Would appreciate any insight on this, and if there is any work happening
>in
>this area.
>
>Thank you,
>
>Rahul Palamuttam  
>
>
>
>
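On the PySpark side of Rahul's question: yes, crossing a process boundary
forces a copy, since the Python worker cannot read JVM heap memory. A toy
sketch of the idea (single process, plain pickle, all names made up -- the
real PySpark uses cloudpickle so that closures and lambdas serialize too,
and ships the bytes over sockets to separate Python worker processes):

```python
import pickle

# Simplified, in-process sketch of the idea behind shipping work to a
# Python daemon. Function names here are illustrative, not Spark's API.

def double(x):
    # Module-level so plain pickle can serialize it by reference;
    # PySpark uses cloudpickle precisely because plain pickle
    # cannot handle arbitrary closures.
    return x * 2

def driver_send(func, partition):
    # Both the function and the data are serialized to bytes:
    # crossing a process boundary means copying out of the
    # sender's address space.
    return pickle.dumps((func, partition))

def worker_run(payload):
    # The worker deserializes: a second copy of the data now
    # exists in the worker's memory.
    func, partition = pickle.loads(payload)
    return [func(x) for x in partition]

payload = driver_send(double, [1, 2, 3])
print(worker_run(payload))  # [2, 4, 6]
```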

