spark-user mailing list archives

From oppokui <oppo...@gmail.com>
Subject Re: Support R in Spark
Date Thu, 18 Sep 2014 14:49:10 GMT
Shivaram, 

As far as I know, SparkR uses the rJava package. On a worker node, Spark code executes R code
by launching an R process and sending/receiving byte arrays.
I have a question about when the R process is launched: is there one R process per worker
process, per executor thread, or per RDD being processed?

Thanks and Regards.

Kui  

> On Sep 6, 2014, at 5:53 PM, oppokui <oppokui@gmail.com> wrote:
> 
> Cool! That is very good news. Can’t wait for it.
> 
> Kui 
> 
>> On Sep 5, 2014, at 1:58 AM, Shivaram Venkataraman <shivaram@eecs.berkeley.edu> wrote:
>> 
>> Thanks Kui. SparkR is a pretty young project, but there are a bunch of
>> things we are working on. One of the main features is to expose a data
>> frame API (https://sparkr.atlassian.net/browse/SPARKR-1) and we will
>> be integrating this with Spark's MLlib.  At a high level this will
>> allow R users to use a familiar API but make use of MLlib's efficient
>> distributed implementation. This is the same strategy used in Python
>> as well.
>> 
>> Also we do hope to merge SparkR with mainline Spark -- we have a few
>> features to complete before that and plan to shoot for integration by
>> Spark 1.3.
>> 
>> Thanks
>> Shivaram
>> 
>> On Wed, Sep 3, 2014 at 9:24 PM, oppokui <oppokui@gmail.com> wrote:
>>> Thanks, Shivaram.
>>> 
>>> No specific use case yet. We are trying to use R in our project because our
>>> data scientists all know R. We were concerned about how R handles massive
>>> data. Spark does better in the big data area, and Spark ML focuses on
>>> predictive analysis, so we are wondering whether we can combine R and
>>> Spark. We tried SparkR and it is pretty easy to use, but we didn’t see
>>> any feedback on this package from industry. It would be better if the Spark
>>> team supported R just like Scala/Java/Python.
>>> 
>>> Another question: if MLlib will re-implement all the well-known data mining
>>> algorithms in Spark, then what is the purpose of using R?
>>> 
>>> Another option for us is H2O, which supports R natively. H2O is
>>> friendlier to data scientists. I saw that H2O can also work on Spark
>>> (Sparkling Water).  Is it better than using SparkR?
>>> 
>>> Thanks and Regards.
>>> 
>>> Kui
>>> 
>>> 
>>> On Sep 4, 2014, at 1:47 AM, Shivaram Venkataraman
>>> <shivaram@eecs.berkeley.edu> wrote:
>>> 
>>> Hi
>>> 
>>> Do you have a specific use-case where SparkR doesn't work well? We'd love
>>> to hear more about use-cases and features that can be improved with SparkR.
>>> 
>>> Thanks
>>> Shivaram
>>> 
>>> 
>>> On Wed, Sep 3, 2014 at 3:19 AM, oppokui <oppokui@gmail.com> wrote:
>>>> 
>>>> Does the Spark ML team have a plan to support R scripts natively? There is
>>>> a SparkR project, but it is not from the Spark team. Spark ML uses
>>>> netlib-java to talk to native Fortran routines, or uses NumPy, so why not
>>>> try to use R in some way?
>>>> 
>>>> R has a lot of useful packages. If the Spark ML team can include R support,
>>>> it will be very powerful.
>>>> 
>>>> Any comments?
>>>> 
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>>> For additional commands, e-mail: user-help@spark.apache.org
>>>> 
>>> 
>>> 
> 