spark-user mailing list archives

From "Sun, Rui" <rui....@intel.com>
Subject RE: SparkR job with >200 tasks hangs when calling from web server
Date Mon, 02 Nov 2015 07:53:30 GMT
I guess this is not related to SparkR, but to something wrong in Spark Core.

Could you try your application logic within spark-shell (you would have to use the Scala
DataFrame API) instead of the SparkR shell, to see if this issue still happens?
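
For example, roughly like this in spark-shell (a sketch; it assumes spark-shell was started
with --packages com.databricks:spark-csv_2.10:1.2.0 so that the CSV source is on the
classpath, and that sqlContext is the one provided by the shell):

// Same logic as the SparkR snippet below, including the repartition to 200
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .load("/opt/Spark/flights.csv")
  .repartition(200)
df.registerTempTable("flights")
val deps = sqlContext.sql("SELECT dep FROM flights").collect()

If the job also hangs at around 200 tasks there, that would point at Spark Core rather
than SparkR.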

-----Original Message-----
From: rporcio [mailto:rporcio@gmail.com] 
Sent: Friday, October 30, 2015 11:09 PM
To: user@spark.apache.org
Subject: SparkR job with >200 tasks hangs when calling from web server

Hi,

I have a web server which can execute R code using SparkR.
The R session is created with the Rscript init.R command, where the init.R file contains
a SparkR initialization section:

library(SparkR, lib.loc = paste("/opt/Spark/spark-1.5.1-bin-hadoop2.6", "R", "lib", sep = "/"))
sc <<- sparkR.init(master = "local[4]", appName = "TestR",
                   sparkHome = "/opt/Spark/spark-1.5.1-bin-hadoop2.6",
                   sparkPackages = "com.databricks:spark-csv_2.10:1.2.0")
sqlContext <<- sparkRSQL.init(sc)

I have the example R code below that I want to execute (flights.csv comes from the SparkR examples):

df <- read.df(sqlContext, "/opt/Spark/flights.csv",
              source = "com.databricks.spark.csv", header = "true")
registerTempTable(df, "flights")
depDF <- sql(sqlContext, "SELECT dep FROM flights")
deps <- collect(depDF)

If I run this code, it executes successfully. When I check the Spark UI, I see that the
corresponding job has only 2 tasks.

But if I change the first line to

df <- repartition(read.df(sqlContext, "/opt/Spark/flights.csv",
                          source = "com.databricks.spark.csv", header = "true"), 200)

and execute the R code again, the corresponding job has 202 tasks, of which it successfully
finishes some (e.g. 132/202) but then hangs forever.

If I check the stderr of the executor, I can see that it can't communicate with the driver:

15/10/30 15:34:24 WARN AkkaRpcEndpointRef: Error sending message [message = Heartbeat(0,
[Lscala.Tuple2;@36834e15,BlockManagerId(0, 192.168.178.198, 7092))] in 1 attempts
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [30 seconds].
This timeout is controlled by spark.rpc.askTimeout

I tried changing the memory settings (e.g. giving the driver 4g) and the Akka and timeout
settings, but with no luck.
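
To illustrate the kind of timeout change I mean, here is a sketch in Scala (these are
plain SparkConf keys, so the same keys apply however the context is created; the values
are only examples, not the exact ones I tried):

import org.apache.spark.{SparkConf, SparkContext}

// Illustrative values only: raise the ask timeout named in the log above,
// and the general network timeout it falls back to.
val conf = new SparkConf()
  .setMaster("local[4]")
  .setAppName("TestR")
  .set("spark.rpc.askTimeout", "120s")
  .set("spark.network.timeout", "300s")
val sc = new SparkContext(conf)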

Executing the same code (with the repartition part) from plain R finishes successfully, so
I assume the problem is somehow related to the web server, but I can't figure it out.

I'm using CentOS.

Can someone give me some advice on what I should try?

Thanks

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SparkR-job-with-200-tasks-hangs-when-calling-from-web-server-tp25237.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

