after a couple of tests, I find that if I use:

    val result = model.predict(prdctpairs)
    result.map(x => x.user+","+x.product+","+x.rating).saveAsTextFile(output)

it always fails with the above error, and the exception trace looks recursive.

but if I do:

    val result = model.predict(prdctpairs)
    result.cache()
    result.map(x => x.user+","+x.product+","+x.rating).saveAsTextFile(output)

it succeeds.

could anyone help explain why the cache() call is necessary?
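My guess (not confirmed) is that it is related to how the task gets serialized: Java serialization walks object graphs recursively, so a deep enough chain of objects, like a long RDD lineage, can overflow the stack, and cache() may cut that chain short. A minimal, Spark-free sketch of that mechanism, with hypothetical names (Node, DeepGraphDemo):

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// A linked chain of serializable objects, standing in for a deep
// dependency graph. Case classes are Serializable by default.
case class Node(value: Int, next: Node)

object DeepGraphDemo {
  // Returns true if the chain serializes, false on StackOverflowError.
  def serialize(root: Node): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(root)
      true
    } catch {
      case _: StackOverflowError => false
    }

  def main(args: Array[String]): Unit = {
    // Build the chains iteratively so construction itself cannot overflow.
    val shallow = (1 to 100).foldLeft(null: Node)((n, i) => Node(i, n))
    val deep    = (1 to 1000000).foldLeft(null: Node)((n, i) => Node(i, n))
    println(s"shallow chain serializes: ${serialize(shallow)}")
    // With a default-sized JVM stack, the deep chain typically fails here,
    // because writeObject recurses once per link.
    println(s"deep chain serializes: ${serialize(deep)}")
  }
}
```

Whether the deep chain actually overflows depends on the JVM stack size (-Xss), so the exact depth at which it fails will vary.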

thanks



On Fri, May 9, 2014 at 6:45 PM, phoenix bai <mingzhibai@gmail.com> wrote:
Hi all,

My spark code is running on yarn-standalone.

the last three lines of the code are as below:

    val result = model.predict(prdctpairs)
    result.map(x => x.user+","+x.product+","+x.rating).saveAsTextFile(output)
    sc.stop()

the same code sometimes runs successfully and gives the right result, while from time to time it throws a StackOverflowError and fails.

and I don't have a clue how I should debug it.

below is the error (the start and end portions, to be exact):


14-05-09 17:55:51 INFO [spark-akka.actor.default-dispatcher-17] MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 44 to spark@rxxxxxx43.mc10.site.net:43885
14-05-09 17:55:51 INFO [spark-akka.actor.default-dispatcher-17] MapOutputTrackerMaster: Size of output statuses for shuffle 44 is 148 bytes
14-05-09 17:55:51 INFO [spark-akka.actor.default-dispatcher-35] MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 45 to spark@rxxxxxx43.mc10.site.net:43885
14-05-09 17:55:51 INFO [spark-akka.actor.default-dispatcher-35] MapOutputTrackerMaster: Size of output statuses for shuffle 45 is 453 bytes
14-05-09 17:55:51 INFO [spark-akka.actor.default-dispatcher-20] MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 44 to spark@rxxxxxx43.mc10.site.net:56767
14-05-09 17:55:51 INFO [spark-akka.actor.default-dispatcher-29] MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 45 to spark@rxxxxxx43.mc10.site.net:56767
14-05-09 17:55:51 INFO [spark-akka.actor.default-dispatcher-29] MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 44 to spark@rxxxxxx43.mc10.site.net:49879
14-05-09 17:55:51 INFO [spark-akka.actor.default-dispatcher-29] MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 45 to spark@rxxxxxx43.mc10.site.net:49879
14-05-09 17:55:51 INFO [spark-akka.actor.default-dispatcher-17] TaskSetManager: Starting task 946.0:17 as TID 146 on executor 6: rxxxxx15.mc10.site.net (PROCESS_LOCAL)
14-05-09 17:55:51 INFO [spark-akka.actor.default-dispatcher-17] TaskSetManager: Serialized task 946.0:17 as 6414 bytes in 0 ms
14-05-09 17:55:51 WARN [Result resolver thread-0] TaskSetManager: Lost TID 133 (task 946.0:4)
14-05-09 17:55:51 WARN [Result resolver thread-0] TaskSetManager: Loss was due to java.lang.StackOverflowError
java.lang.StackOverflowError
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)

............................................

at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
14-05-09 17:55:51 INFO [spark-akka.actor.default-dispatcher-5] TaskSetManager: Starting task 946.0:4 as TID 147 on executor 6: rxxxx15.mc10.site.net (PROCESS_LOCAL)
14-05-09 17:55:51 INFO [spark-akka.actor.default-dispatcher-5] TaskSetManager: Serialized task 946.0:4 as 6414 bytes in 0 ms
14-05-09 17:55:51 WARN [Result resolver thread-1] TaskSetManager: Lost TID 139 (task 946.0:10)
14-05-09 17:55:51 INFO [Result resolver thread-1] TaskSetManager: Loss was due to java.lang.StackOverflowError [duplicate 1]
14-05-09 17:55:51 INFO [spark-akka.actor.default-dispatcher-5] CoarseGrainedSchedulerBackend: Executor 4 disconnected, so removing it
14-05-09 17:55:51 ERROR [spark-akka.actor.default-dispatcher-5] YarnClusterScheduler: Lost executor 4 on rxxxxx01.mc10.site.net: remote Akka client disassociated
14-05-09 17:55:51 INFO [spark-akka.actor.default-dispatcher-5] TaskSetManager: Re-queueing tasks for 4 from TaskSet 992.0

has anyone had a similar issue?
Or anyone could provide a clue about where I should start looking?

thanks in advance!