spark-dev mailing list archives

From Hossein <fal...@gmail.com>
Subject Re: SparkR package path
Date Thu, 24 Sep 2015 18:36:51 GMT
Requiring users to download the entire Spark distribution just to connect to a
remote cluster (which is already running Spark) seems like overkill. Even for
most Spark users who download the Spark source, it is very unintuitive that
they need to run a script named "install-dev.sh" before they can run SparkR.
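As a rough sketch of what I mean (the SPARKR_PACKAGE_DIR variable and the
resolve_sparkr_dir helper below are hypothetical illustrations, not an existing
Spark mechanism): spark-submit could prefer a package location reported by the
R runtime and fall back to the SPARK_HOME/R/lib layout that install-dev.sh
produces:

```shell
# Hypothetical sketch: how spark-submit might resolve the SparkR package
# directory. SPARKR_PACKAGE_DIR is an assumed override (e.g. populated from
# path.package("SparkR") in the R runtime), not a real Spark flag.
resolve_sparkr_dir() {
  if [ -n "${SPARKR_PACKAGE_DIR:-}" ]; then
    # An installed SparkR package reported its own location.
    echo "$SPARKR_PACKAGE_DIR"
  else
    # Fall back to the layout created by R/install-dev.sh.
    echo "$SPARK_HOME/R/lib/SparkR"
  fi
}
```

In a real implementation the override would come from the R side itself, for
example the value of path.package("SparkR") mentioned later in this thread.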

--Hossein

On Wed, Sep 23, 2015 at 7:28 PM, Sun, Rui <rui.sun@intel.com> wrote:

> SparkR is not a standalone R package: it is actually the R API of Spark and
> needs to co-operate with a matching version of Spark. So exposing it on CRAN
> would not, by itself, make life easier for R users, since they would still
> need to download a matching Spark distribution, unless we publish a bundled
> SparkR package to CRAN (packaged together with Spark). Is that desirable?
> Actually, normal users who are not developers are not required to download
> the Spark source, build it, and install the SparkR package. They just need
> to download a Spark distribution and then use SparkR.
>
>
>
> For using SparkR in RStudio, there is documentation at
> https://github.com/apache/spark/tree/master/R
>
>
>
>
>
>
>
> *From:* Hossein [mailto:falaki@gmail.com]
> *Sent:* Thursday, September 24, 2015 1:42 AM
> *To:* shivaram@eecs.berkeley.edu
> *Cc:* Sun, Rui; dev@spark.apache.org
> *Subject:* Re: SparkR package path
>
>
>
> Yes, I think exposing SparkR in CRAN can significantly expand the reach of
> both SparkR and Spark itself to a larger community of data scientists (and
> statisticians).
>
>
>
> I have been getting questions about how to use SparkR in RStudio. Most of
> these folks have a Spark cluster and wish to talk to it from RStudio. While
> that is a bigger task, for now a first step could be not requiring them to
> download the Spark source and run a script named install-dev.sh. I filed
> SPARK-10776 to track this.
>
>
>
>
> --Hossein
>
>
>
> On Tue, Sep 22, 2015 at 7:21 PM, Shivaram Venkataraman <
> shivaram@eecs.berkeley.edu> wrote:
>
> As Rui says, it would be good to understand the use case we want to
> support (supporting CRAN installs could be one, for example). I don't
> think it should be very hard to do, as the RBackend itself doesn't use
> the R source files. RRDD does use them, and the value comes from
>
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/r/RUtils.scala#L29
> AFAIK -- so we could introduce a new config flag that can be used for
> this new mode.
>
> Thanks
> Shivaram
>
>
> On Mon, Sep 21, 2015 at 8:15 PM, Sun, Rui <rui.sun@intel.com> wrote:
> > Hossein,
> >
> >
> >
> > Any strong reason to download and install the SparkR source package
> > separately from the Spark distribution?
> >
> > An R user can simply download the Spark distribution, which contains the
> > SparkR source and binary package, and directly use sparkR. There is no
> > need to install the SparkR package at all.
> >
> >
> >
> > From: Hossein [mailto:falaki@gmail.com]
> > Sent: Tuesday, September 22, 2015 9:19 AM
> > To: dev@spark.apache.org
> > Subject: SparkR package path
> >
> >
> >
> > Hi dev list,
> >
> >
> >
> > The SparkR backend assumes SparkR source files are located under
> > "SPARK_HOME/R/lib/". This directory is created by running
> > R/install-dev.sh.
> > That setting makes sense for Spark developers, but if an R user downloads
> > and installs the SparkR source package, the source files are going to be
> > placed in different locations.
> >
> >
> >
> > In the R runtime it is easy to find the location of package files using
> > path.package("SparkR"). But we need to make some changes to the R backend
> > and/or spark-submit so that the JVM process learns the locations of
> > worker.R, daemon.R, and shell.R from the R runtime.
> >
> >
> >
> > Do you think this change is feasible?
> >
> >
> >
> > Thanks,
> >
> > --Hossein
>
>
>
