spark-user mailing list archives

From sujeet jog <>
Subject Run External R script from Spark
Date Mon, 21 Mar 2016 06:09:44 GMT

I have been working on a POC for some time-series work. I'm using Python
because I need Spark Streaming, and SparkR does not yet have a Spark
Streaming front end. A couple of the algorithms I want to use are not yet
available in the Spark-TS package, so I'm thinking of invoking an external
R script for the algorithm part and passing the data from Spark to the R
script via pipeRdd (RDD.pipe).
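For context, PySpark's `RDD.pipe(command)` starts the given command once per partition, writes each element to its stdin as `str(element)` followed by a newline, and returns the lines the command prints as a new RDD of strings. The sketch below imitates that per-partition behaviour locally, assuming no Spark installation. `pipe_partition`, `pipe_rdd` and `CHILD` are hypothetical names, plain lists stand in for partitions, and a small Python one-liner stands in for the R script. On a cluster the equivalent call would be something like `rdd.pipe("Rscript forecast.R")`, where `forecast.R` is a hypothetical script that has to be present on every worker.

```python
import shlex
import subprocess
import sys
import threading


def pipe_partition(records, command):
    """Local imitation of what RDD.pipe does for ONE partition.

    Each record is written to the child's stdin as str(record) plus a newline;
    each line the child prints becomes one output element (always a string).
    """
    proc = subprocess.Popen(shlex.split(command),
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def feed():
        # Write from a separate thread so a child that prints while it is
        # still reading cannot deadlock us on a full pipe buffer.
        for rec in records:
            proc.stdin.write((str(rec).rstrip("\n") + "\n").encode("utf-8"))
        proc.stdin.close()

    writer = threading.Thread(target=feed)
    writer.start()
    # Strip the line terminator (and a Windows "\r", if any) from each line.
    out = [line.rstrip(b"\r\n").decode("utf-8") for line in proc.stdout]
    writer.join()
    if proc.wait() != 0:
        # Comparable to calling RDD.pipe with checkCode=True.
        raise RuntimeError(f"{command!r} exited with code {proc.returncode}")
    return out


def pipe_rdd(partitions, command):
    """Apply pipe_partition to every partition (a list of lists here).

    Spark would instead run one child process per partition task, in
    parallel across executors; this sketch runs them one after another.
    """
    return [pipe_partition(part, command) for part in partitions]


# Hypothetical stand-in for "Rscript forecast.R": a Python child that reads
# one number per line from stdin and prints twice its value.
CHILD = (shlex.quote(sys.executable) + " -c "
         + shlex.quote("import sys; [print(float(l) * 2) for l in sys.stdin]"))

print(pipe_rdd([[1.0, 2.5], [4]], CHILD))  # [['2.0', '5.0'], ['8.0']]
```

Note that the results come back as strings: every element crosses the process boundary as text in both directions, which is where the serialization overhead of this approach comes from.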

What I want to understand is whether something like this can be used in a
production deployment, since passing the data through an R script would
mean a lot of serialization and would not really use Spark's power for
parallel processing.
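To make the serialization cost concrete: with `RDD.pipe`, every value is formatted as text on the Spark side, parsed again by the script, and the script's output comes back as text. One way to reduce per-record overhead, sketched here as an assumption rather than something established in this thread, is to send one whole time series per line so each line is one unit of work for the R script. `encode_series` and `decode_result` are hypothetical helpers, and the `id,v1,v2,...` line format is an assumed convention that the R script would have to follow; series ids must not contain commas or newlines.

```python
def encode_series(series_id, values):
    """Turn one time series into a single text line: "id,v1,v2,...".

    repr() of a Python float round-trips exactly, so no precision is lost,
    but every value still costs a format here and a parse on the R side.
    Assumes series_id contains no commas or newlines.
    """
    return ",".join([series_id] + [repr(float(v)) for v in values])


def decode_result(line):
    """Parse a line printed by the external script back into (id, floats).

    Assumes the script answers in the same "id,v1,v2,..." format.
    """
    series_id, *fields = line.rstrip("\n").split(",")
    return series_id, [float(f) for f in fields]


# This string is what pipe() would write to the script's stdin for one record.
line = encode_series("sensor-1", [1, 2.5, 0.1 + 0.2])
print(line)  # sensor-1,1.0,2.5,0.30000000000000004
print(decode_result(line))
```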

Has anyone used this kind of workaround (Spark -> pipeRdd -> R script)?

