spark-user mailing list archives

From Andy Sloane <andy.slo...@gmail.com>
Subject getPreferredLocations race condition in spark 1.6.0?
Date Wed, 02 Mar 2016 23:46:27 GMT
We are seeing something that looks a lot like a regression from Spark 1.2.
When we run jobs from multiple threads, we get a crash somewhere inside
getPreferredLocations, similar to the one that was fixed in SPARK-4454.
Except now it's inside
org.apache.spark.MapOutputTrackerMaster.getLocationsWithLargestOutputs
instead of in DAGScheduler directly.

I tried Spark 1.2 post-SPARK-4454 (before that patch it's only slightly
flaky), 1.4.1, and 1.5.2, and all are fine. 1.6.0 crashes immediately on our
threaded test case, though once in a while it passes.

The stack trace is huge, but starts like this:

Caused by: java.lang.NullPointerException: null
    at org.apache.spark.MapOutputTrackerMaster.getLocationsWithLargestOutputs(MapOutputTracker.scala:406)
    at org.apache.spark.MapOutputTrackerMaster.getPreferredLocationsForShuffle(MapOutputTracker.scala:366)
    at org.apache.spark.rdd.ShuffledRDD.getPreferredLocations(ShuffledRDD.scala:92)
    at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:257)
    at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:257)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.preferredLocations(RDD.scala:256)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1545)

The full trace is available here:
https://gist.github.com/andy256/97611f19924bbf65cf49

Does this ring any bells? I will attempt to nail down the commit with git
bisect next.

Thanks
-Andy
