spark-dev mailing list archives

From Felix Cheung <>
Subject Re: Spark R - Loading Third Party R Library in YARN Executors
Date Wed, 17 Aug 2016 11:16:40 GMT
When you call library(), you are calling the library-loading function in native R. As of now it does not support HDFS, but there are several packages out there that might help.

Another approach is a prefetch/installation mechanism: call an HDFS command to download the R package from HDFS onto the worker node first.
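A minimal sketch of that prefetch step, assuming the package directory lives at a hypothetical HDFS path and the hdfs client is on each worker's PATH (the paths and directory names below are illustrative, not from this thread):

```shell
#!/bin/sh
# Prefetch an installed R package tree from HDFS into a local library
# directory, then load it from there. library() can only read a local
# filesystem path, so the copy has to happen before the load.
LOCAL_RLIB=/tmp/rlibs          # placeholder local library directory
mkdir -p "$LOCAL_RLIB"

# Copy the package directory (one directory per installed package) out of HDFS.
hdfs dfs -get hdfs:///share/r-packages/BreakoutDetection "$LOCAL_RLIB/"

# Point library() at the local copy via lib.loc.
Rscript -e 'library("BreakoutDetection", lib.loc = "/tmp/rlibs")'
```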

From: Senthil Kumar <>
Sent: Wednesday, August 17, 2016 2:23 AM
Subject: Spark R - Loading Third Party R Library in YARN Executors
To: Senthil kumar <>, <>, <>, <>

Hi All, we are using the Spark 1.6 R library. Below is our code, which loads the third-party library:

library("BreakoutDetection", lib.loc = "hdfs://xxxxxx/BreakoutDetection/")
library("BreakoutDetection", lib.loc = "//xxxxxx/BreakoutDetection/")

When I execute the code in local mode, the SparkR code works fine without any issue. If I submit the job to the cluster, it ends up with this error:

error in evaluating the argument 'X' in selecting a method for function 'lapply': Error in
library("BreakoutDetection", lib.loc = "hdfs://xxxxxxx/BreakoutDetection/") :
  no library trees found in 'lib.loc'
Calls: f ... lapply -> FUN -> mainProcess -> angleValid -> library

Can't we read libraries in R as below?
library("BreakoutDetection", lib.loc = "hdfs://xxxxxx/BreakoutDetection/")

If not, what is another way to solve this problem?

Since our cluster has close to 2500 nodes, we can't copy the third-party libs to all nodes. Copying to all DataNodes is not good practice either.
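One commonly used alternative on YARN (not mentioned in this thread) is to ship the packages through the distributed cache with spark-submit --archives, so YARN localizes the archive only on the nodes that actually run executors, rather than pre-installing on all 2500 nodes. A sketch with placeholder file and job names:

```shell
# Build a tarball of locally installed R packages (source path is illustrative).
tar -czf rlibs.tar.gz -C /usr/local/lib/R/site-library BreakoutDetection

# Ship the tarball with the job; on YARN, the "#rlibs" suffix makes the
# archive available unpacked under the alias "rlibs" in each executor
# container's working directory.
spark-submit --master yarn \
  --archives rlibs.tar.gz#rlibs \
  my_sparkr_job.R

# Inside the job, load from the localized copy:
#   library("BreakoutDetection", lib.loc = "rlibs")
```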

Can someone help me here with how to load R libs from HDFS, or suggest another way?

