spark-dev mailing list archives

From Justin Uang <justin.u...@gmail.com>
Subject Re: how can I write a language "wrapper"?
Date Tue, 30 Jun 2015 02:04:45 GMT
My guess is that if you are just wrapping the Spark SQL APIs, you can get
away without reimplementing a lot of the complexities in PySpark, like
storing everything in RDDs as pickled byte arrays, pipelining RDDs, doing
aggregations and joins in the Python interpreters, etc.

Since the canonical representation of objects in Spark SQL is in Scala/the
JVM, you're effectively just proxying calls to the Java side. The only
tricky thing is UDFs, which naturally need to run in an interpreter of the
wrapper language. I'm currently thinking of redesigning the UDFs to be sent
in a language-agnostic data format like protobufs or msgpack, so that every
language wrapper just needs to implement a simple protocol: read those
records in, transform them, then write them back out in the same
language-agnostic format.
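A minimal sketch of what such a worker loop might look like, using
newline-delimited JSON over a pair of streams as a stand-in for the
language-agnostic format (the proposal above mentions protobufs or msgpack
instead); the function name and record shape here are hypothetical, not part
of any actual Spark protocol:

```python
import json
import sys


def run_udf_worker(udf, instream, outstream):
    """Implement the wrapper side of the proposed UDF protocol:
    read one serialized record per line, apply the wrapper-language
    UDF to it, and write the result back in the same format."""
    for line in instream:
        record = json.loads(line)        # decode language-agnostic input
        result = udf(record)             # run the wrapper-language UDF
        outstream.write(json.dumps(result) + "\n")  # re-encode the output


if __name__ == "__main__":
    # Example wrapper-language UDF: uppercase a string column.
    run_udf_worker(lambda row: {"name": row["name"].upper()},
                   sys.stdin, sys.stdout)
```

Any wrapper language that can parse and emit the chosen format could run
this loop; the JVM side would only need to serialize rows out and
deserialize results back, without knowing anything about the interpreter on
the other end.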

On Mon, Jun 29, 2015 at 6:39 AM Daniel Darabos <
daniel.darabos@lynxanalytics.com> wrote:

> Hi Vasili,
> It so happens that the entire SparkR code was merged to Apache Spark in a
> single pull request. So you can see at once all the required changes in
> https://github.com/apache/spark/pull/5096. It's 12,043 lines and took
> more than 20 people about a year to write as I understand it.
>
> On Mon, Jun 29, 2015 at 10:33 AM, Vasili I. Galchin <vigalchin@gmail.com>
> wrote:
>
>> Shivaram,
>>
>>     Vis-a-vis Haskell support, I am reading DataFrame.R,
>> SparkRBackend*, context.R, et al. Am I headed in the correct
>> direction? Yes or no, please give me more guidance. Thank you.
>>
>> Kind regards,
>>
>> Vasili
>>
>>
>>
>> On Tue, Jun 23, 2015 at 1:46 PM, Shivaram Venkataraman
>> <shivaram@eecs.berkeley.edu> wrote:
>> > Every language has its own quirks / features -- so I don't think there
>> > exists a document on how to go about doing this for a new language. The
>> > closest write-up I know of is the wiki page on PySpark internals,
>> > https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals,
>> > written by Josh Rosen -- it covers some of the issues, like closure
>> > capture, serialization, and JVM communication, that you'll need to
>> > handle for a new language.
>> >
>> > Thanks
>> > Shivaram
>> >
>> > On Tue, Jun 23, 2015 at 1:35 PM, Vasili I. Galchin <vigalchin@gmail.com>
>> > wrote:
>> >>
>> >> Hello,
>> >>
>> >>       I want to add language support for another language (other than
>> >> Scala, Java, et al.). Where is the documentation that explains how to
>> >> provide support for a new language?
>> >>
>> >> Thank you,
>> >>
>> >> Vasili
>> >
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>> For additional commands, e-mail: dev-help@spark.apache.org
>>
>>
>
