spark-user mailing list archives

From Evo Eftimov <>
Subject R "on spark"
Date Sat, 27 Jun 2015 11:33:08 GMT
I had a look at the new R "on Spark" API / feature in Spark 1.4.0.

For those "skilled in the art" (of R and distributed computing) it will be
immediately clear that "ON" is a marketing ploy and that what it actually
is, is "TO", i.e. Spark 1.4.0 offers an INTERFACE from R TO DATA stored in
Spark in a distributed fashion, plus some distributed queries which can be
initiated FROM R and run on that data within Spark. These are essentially
certain types of SQL-style queries.

In order to deserve the "ON" label, SparkR has to be able to run ON Spark
most of the statistical analysis and machine learning algorithms found in
the R engine. This is absolutely not the case at the moment.

As an example of the type of solution/architecture I am referring to, you
can review Revolution Analytics (recently acquired by Microsoft) and some
other open source frameworks for running R ON distributed clusters.

