Oh I see, you are defining your own RDD & Partition types, and you had a bug where partition.index did not line up with the partition's slot in rdd.getPartitions.  Is that correct?
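
For reference, the scheduler relies on the invariant that the Partition stored at slot i of the array returned by getPartitions has index == i. A minimal sketch of a custom RDD that satisfies it (MyPartition and MyRDD are made-up names for illustration):

import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical partition type: its index must equal the slot it
// occupies in the array returned by getPartitions.
class MyPartition(override val index: Int) extends Partition

class MyRDD(sc: SparkContext, numPartitions: Int) extends RDD[Int](sc, Nil) {
  // Ids are unique and 0-based, so partitions(i).index == i holds.
  override def getPartitions: Array[Partition] =
    (0 until numPartitions).map(i => new MyPartition(i): Partition).toArray

  override def compute(split: Partition, context: TaskContext): Iterator[Int] =
    Iterator.single(split.index)
}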

On Thu, Aug 13, 2015 at 2:40 AM, Akhil Das <akhil@sigmoidanalytics.com> wrote:
I figured it out, and these are my findings:

-> It enters an infinite loop when there's a duplicate partition id.
-> It enters an infinite loop when the partition ids start from 1 rather than 0

Something like the following piece of code (in getPartitions()) can reproduce it:

// Deliberately builds two copies of each partition id, and the ids
// start at 1 instead of 0: both conditions that trigger the loop.
val total_partitions = 4
// Two passes over 1..total_partitions write 2 * total_partitions entries,
// so the array must be sized accordingly.
val partitionsArray: Array[Partition] = Array.ofDim[Partition](total_partitions * 2)

var i = 0

for (outer <- 0 to 1) {
  for (partition <- 1 to total_partitions) {
    partitionsArray(i) = new DeadLockPartitions(partition)
    i = i + 1
  }
}

partitionsArray
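
For contrast, here is a sketch of the shape that does not trigger the loop: one partition per slot, with unique, 0-based ids that match the array indices (assuming DeadLockPartitions just wraps the id):

val total_partitions = 4
// Ids 0..3 line up with their array slots, as the scheduler expects.
val partitionsArray: Array[Partition] =
  Array.tabulate[Partition](total_partitions)(i => new DeadLockPartitions(i))

partitionsArray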



Thanks
Best Regards

On Wed, Aug 12, 2015 at 10:57 PM, Imran Rashid <irashid@cloudera.com> wrote:
yikes.

Was this a one-time thing, or does it happen consistently?  Can you turn on debug logging for o.a.s.scheduler?  (Dunno if it will help, but maybe ...)
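
One way to do that programmatically with the log4j 1.x that Spark 1.x ships (an assumption; adjust if your logging setup differs) is to set the level from the driver before running the job:

import org.apache.log4j.{Level, Logger}

// Enable DEBUG output for the whole org.apache.spark.scheduler package
// (DAGScheduler, TaskSetManager, ...) for the current JVM.
Logger.getLogger("org.apache.spark.scheduler").setLevel(Level.DEBUG)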

On Tue, Aug 11, 2015 at 8:59 AM, Akhil Das <akhil@sigmoidanalytics.com> wrote:
Hi

My Spark job (running in local[*] with Spark 1.4.1) reads data from a thrift server. (I created a custom RDD; it computes the partitions in the getPartitions() call, and in compute() the iterator's hasNext returns records from these partitions.) count() and foreach() work fine and return the correct number of records. But whenever there is a shuffle map stage (like reduceByKey etc.), all the tasks execute properly but then it enters an infinite loop saying:

  15/08/11 13:05:54 INFO DAGScheduler: Resubmitting ShuffleMapStage 1 (map at FilterMain.scala:59) because some of its tasks had failed: 03

Here's the complete stack trace: http://pastebin.com/hyK7cG8S
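
To make the symptom concrete, the pattern looks like this (a self-contained illustration; parallelize stands in for the custom thrift-backed RDD):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("repro"))
val rdd = sc.parallelize(Seq("a", "b", "a"))

rdd.count()                                        // no shuffle: returned the right count
rdd.foreach(println)                               // no shuffle: worked fine
rdd.map(w => (w, 1)).reduceByKey(_ + _).collect()  // shuffle map stage: this is where the loop appeared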

What could be the root cause of this problem? I looked it up and bumped into this closed JIRA (which is very, very old)




Thanks
Best Regards