spark-user mailing list archives

From Milos Nikolic
Subject TorrentBroadcast + persist = bug
Date Mon, 20 Jan 2014 11:22:22 GMT

I think there is a bug with TorrentBroadcast in the latest release (0.8.1). The problem is
that even a simple job (e.g., rdd.count) hangs waiting for some tasks to finish. Here is how
to reproduce the problem:

1) Configure Spark such that node X is the master and also one of the workers (e.g., 5 nodes
=> 5 workers and 1 master)
2) Activate TorrentBroadcast
3) Use the Kryo serializer (the problem occurs more often than with the Java serializer)
4) Read some file from HDFS, persist RDD, and call count

In almost 80% of runs (~50% with the Java serializer), the count job hangs waiting for two
tasks on node X to finish. The problem *does not* appear if: 1) I separate the master from
the worker nodes, or 2) I use HttpBroadcast, or 3) I do not persist the RDD.

The code is below.

  import org.apache.spark.SparkContext

  def main(args: Array[String]): Unit = {
    // These properties must be set before the SparkContext is created.
    System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    System.setProperty("spark.kryo.registrator", "test.MyRegistrator")
    System.setProperty("spark.broadcast.factory", "org.apache.spark.broadcast.TorrentBroadcastFactory")
    val sc = new SparkContext(...)
    val file = "hdfs://server:9000/user/xxx/Test.out"  // ~750MB
    // persist() added to match step 4; without it the hang does not occur.
    val rdd = sc.textFile(file).persist()
    println("Counting: " + rdd.count)
  }
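
For comparison, workaround 2) above amounts to changing only the broadcast factory property. Here is a minimal sketch, assuming the same system-property style of configuration as in the repro. The object BroadcastFactoryConfig and the helper useTorrent are hypothetical names for illustration, not part of Spark. The snippet itself does not need Spark on the classpath.

```scala
object BroadcastFactoryConfig {
  // Fully qualified names of the two broadcast factories discussed above.
  val HttpFactory    = "org.apache.spark.broadcast.HttpBroadcastFactory"
  val TorrentFactory = "org.apache.spark.broadcast.TorrentBroadcastFactory"

  // Hypothetical helper: selects the broadcast implementation and returns
  // the class name that was set. Call it before new SparkContext(...).
  def useTorrent(enabled: Boolean): String = {
    val factory = if (enabled) TorrentFactory else HttpFactory
    System.setProperty("spark.broadcast.factory", factory)
    factory
  }

  def main(args: Array[String]): Unit = {
    // Fall back to HttpBroadcast, i.e. workaround 2).
    useTorrent(enabled = false)
    println(System.getProperty("spark.broadcast.factory"))
  }
}
```

Calling useTorrent(enabled = true) instead restores the configuration that triggers the hang, so the two setups can be compared without touching the rest of the job.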

Best regards,