spark-issues mailing list archives

From "Sergei Lebedev (JIRA)" <>
Subject [jira] [Created] (SPARK-22227) DiskBlockManager.getAllBlocks could fail if called during shuffle
Date Mon, 09 Oct 2017 16:43:00 GMT
Sergei Lebedev created SPARK-22227:

             Summary: DiskBlockManager.getAllBlocks could fail if called during shuffle
                 Key: SPARK-22227
             Project: Spark
          Issue Type: Bug
          Components: Block Manager
    Affects Versions: 2.2.0
            Reporter: Sergei Lebedev
            Priority: Minor

{{DiskBlockManager.getAllBlocks}} assumes that the directories managed by the block manager
only contain files corresponding to "valid" block IDs, i.e. those parsable via {{BlockId.apply}}.
This is not always the case, as demonstrated by the following snippet:

{code:scala}
import org.apache.spark.{SparkConf, SparkContext, SparkEnv}

object GetAllBlocksFailure {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf())

    // Poll the disk block manager concurrently with the shuffle below.
    new Thread {
      override def run(): Unit = {
        while (true) {
          SparkEnv.get.blockManager.diskBlockManager.getAllBlocks()
        }
      }
    }.start()

    sc.range(1, 65536, numSlices = 10)
        .map(x => (x % 4096, x))
        .reduceByKey { _ + _ }
        .count()
  }
}
{code}

We have a thread computing the number of bytes occupied by the block manager on-disk, and it
frequently crashes because this assumption is violated. The relevant part of the stack trace:

{noformat}
2017-10-06 11:20:14,287 ERROR  org.apache.spark.util.SparkUncaughtExceptionHandler: Uncaught
exception in thread Thread[CoarseGrainedExecutorBackend-stop-executor,5,main]
java.lang.IllegalStateException: Unrecognized BlockId:
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:73)
        at scala.collection.TraversableLike$
{noformat}
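One possible mitigation is to make the listing tolerant of unparsable file names (e.g. in-flight {{temp_shuffle_*}} files written during a shuffle) instead of throwing. The sketch below is illustrative, not Spark's actual code: {{parseBlockId}} and its patterns are simplified stand-ins for {{BlockId.apply}}, and {{getAllBlocks}} here takes a plain list of file names rather than scanning directories.

```scala
import scala.util.Try

object SafeGetAllBlocks {
  // Simplified stand-in for BlockId.apply: accepts a couple of known
  // block-id patterns and throws for anything else, like the real parser.
  private val ShuffleBlock = "shuffle_(\\d+)_(\\d+)_(\\d+)".r
  private val RddBlock     = "rdd_(\\d+)_(\\d+)".r

  def parseBlockId(name: String): String = name match {
    case ShuffleBlock(_, _, _) | RddBlock(_, _) => name
    case _ => throw new IllegalStateException(s"Unrecognized BlockId: $name")
  }

  // Defensive variant: silently skip files whose names do not parse,
  // rather than failing the whole listing.
  def getAllBlocks(fileNames: Seq[String]): Seq[String] =
    fileNames.flatMap(n => Try(parseBlockId(n)).toOption)

  def main(args: Array[String]): Unit = {
    val files = Seq("shuffle_0_1_0", "temp_shuffle_6a8e5786", "rdd_2_3")
    // The temp shuffle file is dropped instead of crashing the caller.
    println(SafeGetAllBlocks.getAllBlocks(files).mkString(","))
  }
}
```

With this approach a monitoring thread racing against an active shuffle sees a best-effort snapshot of the valid blocks instead of an {{IllegalStateException}}.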

This message was sent by Atlassian JIRA
