spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Shane Lee <shane_y_...@yahoo.com.INVALID>
Subject SparkR error when repartition is called
Date Tue, 09 Aug 2016 03:35:17 GMT
Hi All,
I am trying out SparkR 2.0 and have run into an issue with repartition. 
Here is the R code (essentially a port of the pi-calculating scala example in the spark package)
that can reproduce the behavior:
schema <- structType(structField("input", "integer"),     structField("output", "integer"))
library(magrittr)

len = 3000data.frame(n = 1:len) %>%    as.DataFrame %>%    SparkR:::repartition(10L)
%>% dapply(., function (df) { library(plyr) ddply(df, .(n), function (y)
 { data.frame(z =  { x1 = runif(1) * 2 - 1 y1 = runif(1) * 2 - 1 z = x1 * x1 + y1 * y1 if
(z < 1) { 1L } else { 0L } }) }) } , schema ) %>%  SparkR:::summarize(total = sum(.$output))
%>% collect * 4 / len
For me it runs fine as long as len is less than 5000, otherwise it errors out with the following
message:
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :   org.apache.spark.SparkException:
Job aborted due to stage failure: Task 6 in stage 56.0 failed 4 times, most recent failure:
Lost task 6.3 in stage 56.0 (TID 899, LARBGDV-VM02): org.apache.spark.SparkException: R computation
failed with Error in readBin(con, raw(), stringLen, endian = "big") :   invalid 'n' argumentCalls:
<Anonymous> -> readBinExecution halted at org.apache.spark.api.r.RRunner.compute(RRunner.scala:108)
at org.apache.spark.sql.execution.r.MapPartitionsRWrapper.apply(MapPartitionsRWrapper.scala:59)
at org.apache.spark.sql.execution.r.MapPartitionsRWrapper.apply(MapPartitionsRWrapper.scala:29)
at org.apache.spark.sql.execution.MapPartitionsExec$$anonfun$6.apply(objects.scala:178) at
org.apache.spark.sql.execution.MapPartitionsExec$$anonfun$6.apply(objects.scala:175) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:784)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$
If the repartition call is removed, it runs fine again, even with very large len.
After looking through the documentations and searching the web, I can't seem to find any clues
how to fix this. Anybody has seen similary problem?
Thanks in advance for your help.
Shane

Mime
View raw message