spark-user mailing list archives

From Xinh Huynh <xinh.hu...@gmail.com>
Subject Re: Using R code as part of a Spark Application
Date Wed, 29 Jun 2016 16:53:25 GMT
There is some new SparkR functionality coming in Spark 2.0, such as
"dapply". You could use SparkR to load a Parquet file and then run "dapply"
to apply a function to each partition of a DataFrame.

Info about loading Parquet file:
http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc1-docs/sparkr.html#from-data-sources

API doc for "dapply":
http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc1-docs/api/R/index.html
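
A minimal sketch of that approach (the Parquet path and column names below are
made up; read.df, structType/structField, and dapply are from the Spark 2.0
SparkR API):

library(SparkR)
sparkR.session()

# Load the Parquet file into a Spark DataFrame
df <- read.df("/path/to/data.parquet", source = "parquet")

# dapply needs the output schema declared up front
schema <- structType(structField("value", "double"),
                     structField("doubled", "double"))

# The function runs once per partition; x arrives as a local R data.frame,
# so any plain R syntax works inside it
result <- dapply(df, function(x) {
  x$doubled <- x$value * 2
  x
}, schema)

head(result)

This assumes the DataFrame really has a numeric "value" column; adjust the
schema and function to match your data.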

Xinh

On Wed, Jun 29, 2016 at 6:54 AM, sujeet jog <sujeet.jog@gmail.com> wrote:

> Try Spark's pipeRDDs: you can invoke the R script from pipe() and push the
> data you want processed to the Rscript's stdin.
>
>
> On Wed, Jun 29, 2016 at 7:10 PM, Gilad Landau <Gilad.Landau@clicktale.com>
> wrote:
>
>> Hello,
>>
>>
>>
>> I want to use R code as part of a Spark application (the same way I would
>> do with Scala/Python). I want to be able to run R syntax as a map
>> function on a big Spark DataFrame loaded from a Parquet file.
>>
>> Is this even possible, or is the only way to use R as part of RStudio's
>> orchestration of our Spark cluster?
>>
>>
>>
>> Thanks for the help!
>>
>>
>>
>> Gilad
>>
>>
>>
>
>
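
For the pipe() route suggested above, the driver side (Scala/Python) would call
something like rdd.pipe("Rscript transform.R"); here is a minimal sketch of the
R script side ("transform.R" is a hypothetical name, and numeric input lines
are assumed). Spark writes each element of a partition as one line to the
script's stdin and turns each line the script prints to stdout into an element
of the result RDD:

#!/usr/bin/env Rscript
# Read elements line by line from stdin, transform, write results to stdout
con <- file("stdin", open = "r")
while (length(line <- readLines(con, n = 1)) > 0) {
  value <- as.numeric(line)        # assumes each input line is a number
  cat(value * 2, "\n", sep = "")   # one output line per input element
}
close(con)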
