spark-dev mailing list archives

From "Sun, Rui" <rui....@intel.com>
Subject RE: SparkR package path
Date Thu, 24 Sep 2015 02:28:42 GMT
SparkR is not a standalone R package: it is the R API of Spark and has to co-operate with a
matching version of Spark. Publishing it on CRAN alone would not make things easier for R
users, because they would still need to download a matching Spark distribution, unless we
publish a bundled SparkR package on CRAN (packaged together with Spark). Is that desirable?
Actually, normal users who are not developers do not need to download the Spark source,
build it, and install the SparkR package. They just need to download a Spark distribution
and then use SparkR.

For using SparkR in RStudio, there is documentation at https://github.com/apache/spark/tree/master/R
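
For example, a session along these lines should work (the SPARK_HOME value below is a
placeholder for wherever the distribution was unpacked):

    # Point R at an unpacked Spark distribution (placeholder path).
    Sys.setenv(SPARK_HOME = "/path/to/spark-1.5.0-bin-hadoop2.6")

    # The distribution ships the built SparkR package under SPARK_HOME/R/lib,
    # so add that to the library search path instead of installing from source.
    .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))

    library(SparkR)
    sc <- sparkR.init(master = "local[*]")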



From: Hossein [mailto:falaki@gmail.com]
Sent: Thursday, September 24, 2015 1:42 AM
To: shivaram@eecs.berkeley.edu
Cc: Sun, Rui; dev@spark.apache.org
Subject: Re: SparkR package path

Yes, I think publishing SparkR on CRAN can significantly expand the reach of both SparkR and
Spark itself to a larger community of data scientists (and statisticians).

I have been getting questions about how to use SparkR in RStudio. Most of these folks have a
Spark cluster and wish to talk to it from RStudio. While that is a bigger task, a first
step could be not requiring them to download the Spark source and run a script named install-dev.sh.
I filed SPARK-10776 to track this.


--Hossein

On Tue, Sep 22, 2015 at 7:21 PM, Shivaram Venkataraman <shivaram@eecs.berkeley.edu>
wrote:
As Rui says, it would be good to understand the use case we want to
support (supporting CRAN installs could be one, for example). I don't
think it should be very hard to do, as the RBackend itself doesn't use
the R source files. RRDD does use them, and the value comes from
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/r/RUtils.scala#L29
AFAIK -- so we could introduce a new config flag that can be used for
this new mode (see the sketch below).

Thanks
Shivaram
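
As a rough sketch of how such a flag might be set from the R side (the property name
spark.r.sparkr.package.dir is made up here purely to illustrate the idea; the real name
would be settled in the patch):

    # Load SparkR first so path.package() can find it, then hand the
    # discovered package directory to the JVM as an ordinary Spark property.
    library(SparkR)
    sc <- sparkR.init(
      master = "local[*]",
      sparkEnvir = list(spark.r.sparkr.package.dir = path.package("SparkR"))
    )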

On Mon, Sep 21, 2015 at 8:15 PM, Sun, Rui <rui.sun@intel.com>
wrote:
> Hossein,
>
>
>
> Any strong reason to download and install the SparkR source package separately
> from the Spark distribution?
>
> An R user can simply download the Spark distribution, which contains the
> SparkR source and binary packages, and directly use sparkR. No need to
> install the SparkR package at all.
>
>
>
> From: Hossein [mailto:falaki@gmail.com]
> Sent: Tuesday, September 22, 2015 9:19 AM
> To: dev@spark.apache.org
> Subject: SparkR package path
>
>
>
> Hi dev list,
>
>
>
> The SparkR backend assumes the SparkR source files are located under
> "SPARK_HOME/R/lib/". This directory is created by running R/install-dev.sh.
> This setting makes sense for Spark developers, but if an R user downloads
> and installs the SparkR source package, the source files are going to be
> placed in a different location.
>
>
>
> In the R runtime it is easy to find the location of package files using
> path.package("SparkR"). But we need to make some changes to the R backend
> and/or spark-submit so that the JVM process learns the locations of worker.R,
> daemon.R, and shell.R from the R runtime (see the sketch after this quoted
> message).
>
>
>
> Do you think this change is feasible?
>
>
>
> Thanks,
>
> --Hossein
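
As a minimal sketch of the R-runtime side of Hossein's point (the worker/ and profile/
layout shown below matches where the current source tree installs these scripts, i.e.
R/pkg/inst/worker/ and R/pkg/inst/profile/):

    # Once SparkR is loaded, R can report where the package actually landed,
    # whether it was built by install-dev.sh or installed like a normal package.
    library(SparkR)
    pkg_dir <- path.package("SparkR")

    # The scripts the JVM needs live inside the installed package:
    worker <- file.path(pkg_dir, "worker", "worker.R")
    daemon <- file.path(pkg_dir, "worker", "daemon.R")
    shell  <- file.path(pkg_dir, "profile", "shell.R")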
