spark-user mailing list archives

From Christopher Nguyen <...@adatao.com>
Subject Re: Support R in Spark
Date Sat, 06 Sep 2014 20:50:56 GMT
Hi Kui, sorry about that. That link you mentioned is probably the one for
the products. We don't have one pointing from adatao.com to ddf.io; maybe
we'll add it.

As for access to the code base itself, I think the team has already created
a GitHub repo for it, and should open it up within a few weeks. There's
some debate about whether to put out the implementation with Shark
dependencies now, or the SparkSQL one, which has somewhat limited
functionality and is not as well tested.

I'll check and ping when this is opened up.

The license is Apache.

Sent while mobile. Please excuse typos etc.
On Sep 6, 2014 1:39 PM, "oppokui" <oppokui@gmail.com> wrote:

> Thanks, Christopher. I saw it before, it is amazing. Last time I tried to
> download it from adatao, but got no response after filling in the form. How
> can I download it or its source code? What is the license?
>
> Kui
>
>
> On Sep 6, 2014, at 8:08 PM, Christopher Nguyen <ctn@adatao.com> wrote:
>
> Hi Kui,
>
> DDF (open sourced) also aims to do something similar, adding RDBMS idioms,
> and is already implemented on top of Spark.
>
> One philosophy is that the DDF API aggressively hides the notion of
> parallel datasets, exposing only (mutable) tables to users, on which they
> can apply R and other familiar data mining/machine learning idioms, without
> having to know about the distributed representation underneath. Now, you
> can get to the underlying RDDs if you want to, simply by asking for them.
>
> This was launched at the July Spark Summit. See
> http://spark-summit.org/2014/talk/distributed-dataframe-ddf-on-apache-spark-simplifying-big-data-for-the-rest-of-us
> .
>
> Sent while mobile. Please excuse typos etc.
> On Sep 4, 2014 1:59 PM, "Shivaram Venkataraman" <
> shivaram@eecs.berkeley.edu> wrote:
>
>> Thanks Kui. SparkR is a pretty young project, but there are a bunch of
>> things we are working on. One of the main features is to expose a data
>> frame API (https://sparkr.atlassian.net/browse/SPARKR-1) and we will
>> be integrating this with Spark's MLlib. At a high level this will
>> allow R users to use a familiar API but make use of MLlib's efficient
>> distributed implementation. This is the same strategy used in Python
>> as well.
>>
>> Also we do hope to merge SparkR with mainline Spark -- we have a few
>> features to complete before that and plan to shoot for integration by
>> Spark 1.3.
>>
>> Thanks
>> Shivaram
>>
>> On Wed, Sep 3, 2014 at 9:24 PM, oppokui <oppokui@gmail.com> wrote:
>> > Thanks, Shivaram.
>> >
>> > No specific use case yet. We are trying to use R in our project because
>> > our data scientists all know R. Our concern is how R handles large data
>> > sets. Spark does a better job with big data, and Spark ML focuses on
>> > predictive analysis, so we are wondering whether we can combine R and
>> > Spark. We tried SparkR and it is pretty easy to use, but we haven't seen
>> > any feedback on this package from industry. It would be better if the
>> > Spark team supported R just like Scala/Java/Python.
>> >
>> > Another question: if MLlib re-implements all the well-known data mining
>> > algorithms in Spark, what is the purpose of using R?
>> >
>> > Another option for us is H2O, which supports R natively. H2O is
>> > friendlier to data scientists. I saw that H2O can also work on Spark
>> > (Sparkling Water). Is it better than using SparkR?
>> >
>> > Thanks and Regards.
>> >
>> > Kui
>> >
>> >
>> > On Sep 4, 2014, at 1:47 AM, Shivaram Venkataraman
>> > <shivaram@eecs.berkeley.edu> wrote:
>> >
>> > Hi
>> >
>> > Do you have a specific use-case where SparkR doesn't work well? We'd
>> > love to hear more about use-cases and features that can be improved
>> > with SparkR.
>> >
>> > Thanks
>> > Shivaram
>> >
>> >
>> > On Wed, Sep 3, 2014 at 3:19 AM, oppokui <oppokui@gmail.com> wrote:
>> >>
>> >> Does the Spark ML team plan to support R scripts natively? There is a
>> >> SparkR project, but it is not from the Spark team. Spark ML uses
>> >> netlib-java to talk to native Fortran routines, or uses NumPy, so why
>> >> not try to use R in some way?
>> >>
>> >> R has a lot of useful packages. If the Spark ML team can include R
>> >> support, it will be very powerful.
>> >>
>> >> Any comment?
>> >>
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> >> For additional commands, e-mail: user-help@spark.apache.org
>> >>
>> >
>> >
>>
>
