I can’t reproduce your issue with len=10000 in local mode.
Could you give more environment information?
On Aug 9, 2016, at 11:35, Shane Lee <shane_y_lee@yahoo.com.INVALID> wrote:

Hi All,

I am trying out SparkR 2.0 and have run into an issue with repartition

Here is the R code (essentially a port of the pi-calculating scala example in the spark package) that can reproduce the behavior:

schema <- structType(structField("input", "integer"), 
    structField("output", "integer"))

library(magrittr)

len = 3000
data.frame(n = 1:len) %>%
    as.DataFrame %>%
    SparkR:::repartition(10L) %>%
dapply(., function (df)
{
library(plyr)
ddply(df, .(n), function (y)
{
data.frame(z = 
{
x1 = runif(1) * 2 - 1
y1 = runif(1) * 2 - 1
z = x1 * x1 + y1 * y1
if (z < 1)
{
1L
}
else
{
0L
}
})
})
}
, schema
) %>% 
SparkR:::summarize(total = sum(.$output)) %>% collect * 4 / len

For me it runs fine as long as len is less than 5000, otherwise it errors out with the following message:

Error in invokeJava(isStatic = TRUE, className, methodName, ...) : 
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 6 in stage 56.0 failed 4 times, most recent failure: Lost task 6.3 in stage 56.0 (TID 899, LARBGDV-VM02): org.apache.spark.SparkException: R computation failed with
 Error in readBin(con, raw(), stringLen, endian = "big") : 
  invalid 'n' argument
Calls: <Anonymous> -> readBin
Execution halted
at org.apache.spark.api.r.RRunner.compute(RRunner.scala:108)
at org.apache.spark.sql.execution.r.MapPartitionsRWrapper.apply(MapPartitionsRWrapper.scala:59)
at org.apache.spark.sql.execution.r.MapPartitionsRWrapper.apply(MapPartitionsRWrapper.scala:29)
at org.apache.spark.sql.execution.MapPartitionsExec$$anonfun$6.apply(objects.scala:178)
at org.apache.spark.sql.execution.MapPartitionsExec$$anonfun$6.apply(objects.scala:175)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:784)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$

If the repartition call is removed, it runs fine again, even with very large len.

After looking through the documentations and searching the web, I can't seem to find any clues how to fix this. Anybody has seen similary problem?

Thanks in advance for your help.

Shane