spark-user mailing list archives

From "Sun, Rui" <rui....@intel.com>
Subject RE: Do existing R packages work with SparkR data frames
Date Wed, 23 Dec 2015 07:21:10 GMT
Hi, Lan,

Generally, existing R packages that operate on R data frames cannot be used transparently
with SparkR DataFrames. Typically, the algorithms have to be rewritten against the SparkR
DataFrame API.

collect() copies the data from a SparkR DataFrame into a local R data.frame. Since a SparkR
DataFrame is a distributed data set, you typically call methods of the SparkR DataFrame API
to process its data in a distributed fashion and, once the result is small enough to fit in
the memory of the local machine, collect it for local processing.
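
A minimal sketch of that workflow follows. It assumes the Spark 2.0+ entry point
sparkR.session(); on the 1.x releases current when this was written, you would instead
start with sparkR.init() and sparkRSQL.init() and pass the resulting sqlContext to
createDataFrame(). The built-in `faithful` data set, the filter threshold, and the call
to lm() are only illustrative stand-ins for your own data and for whichever local R
package you want to use.

```r
library(SparkR)

# Start a local Spark session (Spark 2.0+ API; an assumption, see the note above).
sparkR.session(master = "local[*]")

# Distribute R's built-in `faithful` data set as a SparkR DataFrame.
df <- as.DataFrame(faithful)

# Shrink the data with distributed operations first.
long_waits <- filter(df, df$waiting > 70)
subset_df  <- select(long_waits, "eruptions", "waiting")

# Only now copy the reduced result into local memory as a plain data.frame.
local_df <- collect(subset_df)

# From here on, ordinary R packages apply; base R's lm() is used as an example.
fit <- lm(eruptions ~ waiting, data = local_df)
print(coef(fit))

sparkR.session.stop()
```

Everything after collect() runs on a single machine, so the filtering and column selection
should happen before it, not after.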

From: Duy Lan Nguyen [mailto:ndlan2k@gmail.com]
Sent: Wednesday, December 23, 2015 5:50 AM
To: user@spark.apache.org
Subject: Do existing R packages work with SparkR data frames

Hello,

Is it possible for existing R machine learning packages that work with R data frames, such
as bnlearn, to work with SparkR data frames? Or do I need to convert SparkR data frames to
R data frames? Is "collect" the function to do the conversion, or is there another way?

Many Thanks,
Lan