Hi Kui, sorry about that. The link you mentioned is probably the one for our products. We don't have a link from adatao.com to ddf.io yet; maybe we'll add one.

As for access to the code base itself, I think the team has already created a GitHub repo for it and should open it up within a few weeks. There's some debate about whether to release the implementation with Shark dependencies now, or the SparkSQL-based one, which has somewhat limited functionality and is not as well tested.

I'll check and ping when this is opened up.

The license is Apache.

Sent while mobile. Please excuse typos etc.

On Sep 6, 2014 1:39 PM, "oppokui" <oppokui@gmail.com> wrote:
Thanks, Christopher. I saw it before; it is amazing. Last time I tried to download it from adatao, but got no response after filling in the form. How can I download it or its source code? What is the license?


On Sep 6, 2014, at 8:08 PM, Christopher Nguyen <ctn@adatao.com> wrote:

Hi Kui,

DDF (open sourced) also aims to do something similar, adding RDBMS idioms, and is already implemented on top of Spark.

One philosophy is that the DDF API aggressively hides the notion of parallel datasets, exposing only (mutable) tables to users. Users can apply R and other familiar data mining/machine learning idioms to these tables without having to know about the distributed representation underneath. That said, you can still get to the underlying RDDs if you want to, simply by asking for them.

This was launched at the July Spark Summit. See http://spark-summit.org/2014/talk/distributed-dataframe-ddf-on-apache-spark-simplifying-big-data-for-the-rest-of-us


On Sep 4, 2014 1:59 PM, "Shivaram Venkataraman" <shivaram@eecs.berkeley.edu> wrote:
Thanks Kui. SparkR is a pretty young project, but there are a bunch of
things we are working on. One of the main features is to expose a data
frame API (https://sparkr.atlassian.net/browse/SPARKR-1), and we will
be integrating this with Spark's MLlib. At a high level, this will
allow R users to use a familiar API while making use of MLlib's efficient
distributed implementation. This is the same strategy used in Python.

Also we do hope to merge SparkR with mainline Spark -- we have a few
features to complete before that and plan to shoot for integration by
Spark 1.3.


On Wed, Sep 3, 2014 at 9:24 PM, oppokui <oppokui@gmail.com> wrote:
> Thanks, Shivaram.
> No specific use case yet. We are trying to use R in our project, as our
> data scientists all know R. Our concern is how R handles large volumes of
> data. Spark does a better job in the big data area, and Spark ML is focused
> on predictive analysis, so we are wondering whether we can combine R and
> Spark. We tried SparkR and it is pretty easy to use, but we haven't seen any
> industry feedback on this package. It would be better if the Spark team
> supported R just like Scala/Java/Python.
> Another question: if MLlib will re-implement all the well-known data mining
> algorithms in Spark, then what is the purpose of using R?
> Another option for us is H2O, which supports R natively. H2O is friendlier
> to data scientists. I saw that H2O can also run on Spark (Sparkling
> Water). Is it better than using SparkR?
> Thanks and Regards.
> Kui
> On Sep 4, 2014, at 1:47 AM, Shivaram Venkataraman
> <shivaram@eecs.berkeley.edu> wrote:
> Hi
> Do you have a specific use case where SparkR doesn't work well? We'd love
> to hear more about use cases and features that can be improved in SparkR.
> Thanks
> Shivaram
> On Wed, Sep 3, 2014 at 3:19 AM, oppokui <oppokui@gmail.com> wrote:
>> Does the Spark ML team plan to support R scripts natively? There is a
>> SparkR project, but it is not from the Spark team. Spark ML uses netlib-java
>> to talk to native Fortran routines, or uses NumPy; why not use R in a
>> similar way?
>> R has a lot of useful packages. If the Spark ML team included R support,
>> it would be very powerful.
>> Any comments?
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> For additional commands, e-mail: user-help@spark.apache.org
