spark-user mailing list archives

From Akhil Das <akhil@sigmoidanalytics.com>
Subject Re: Spark runs into an Infinite loop even if the tasks are completed successfully
Date Fri, 14 Aug 2015 06:42:58 GMT
Yep, and it works fine for operations which do not involve any shuffle
(like foreach, count, etc.), but those which involve a shuffle end up in
an infinite loop. Spark should somehow indicate this instead of looping
forever.
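
For anyone defining a custom RDD, a quick way to catch this up front is to
check the invariant before running any shuffle. A minimal sketch
(checkPartitions is a hypothetical helper, not a Spark API):

import org.apache.spark.rdd.RDD

// Fail fast if a custom RDD's partition indices don't match their
// positions in getPartitions (they must be 0, 1, 2, ... with no
// duplicates).
def checkPartitions(rdd: RDD[_]): Unit = {
  rdd.partitions.zipWithIndex.foreach { case (p, i) =>
    require(p.index == i, s"partition at slot $i reports index ${p.index}")
  }
}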

Thanks
Best Regards

On Thu, Aug 13, 2015 at 11:37 PM, Imran Rashid <irashid@cloudera.com> wrote:

> oh I see, you are defining your own RDD & Partition types, and you had a
> bug where partition.index did not line up with the partition's slot in
> rdd.getPartitions.  Is that correct?
>
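> A minimal sketch of an RDD that satisfies that invariant (MyRDD and
> MyPartition are hypothetical names, for illustration only):
>
> import org.apache.spark.{Partition, SparkContext, TaskContext}
> import org.apache.spark.rdd.RDD
>
> // Each partition's index equals its slot in getPartitions.
> class MyPartition(override val index: Int) extends Partition
>
> class MyRDD(sc: SparkContext, n: Int) extends RDD[Int](sc, Nil) {
>   override def getPartitions: Array[Partition] =
>     (0 until n).map(i => new MyPartition(i): Partition).toArray
>
>   override def compute(split: Partition, ctx: TaskContext): Iterator[Int] =
>     Iterator(split.index)
> }
>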
> On Thu, Aug 13, 2015 at 2:40 AM, Akhil Das <akhil@sigmoidanalytics.com>
> wrote:
>
>> I figured that out, and these are my findings:
>>
>> -> It enters an infinite loop when there's a duplicate partition id.
>>
>> -> It enters an infinite loop when the partition ids start from 1
>> rather than 0.
>>
>>
>> Something like this piece of code (in getPartitions()) can reproduce it:
>>
>> // Deliberately broken: yields eight partitions whose ids run 1..4
>> // twice -- duplicate ids, and no id 0.
>> val total_partitions = 4
>> val partitionsArray: Array[Partition] =
>>   Array.ofDim[Partition](2 * total_partitions) // room for both passes
>>
>> var i = 0
>>
>> for (outer <- 0 to 1) {
>>   for (partition <- 1 to total_partitions) {
>>     partitionsArray(i) = new DeadLockPartitions(partition)
>>     i = i + 1
>>   }
>> }
>>
>> partitionsArray
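>>
>> For contrast, a sketch of an indexing the scheduler accepts (assuming
>> DeadLockPartitions(i) reports i as its index): use each id exactly
>> once, starting from 0:
>>
>> val fixedArray: Array[Partition] =
>>   (0 until total_partitions)
>>     .map(i => new DeadLockPartitions(i): Partition)
>>     .toArray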
>>
>>
>>
>>
>> Thanks
>> Best Regards
>>
>> On Wed, Aug 12, 2015 at 10:57 PM, Imran Rashid <irashid@cloudera.com>
>> wrote:
>>
>>> yikes.
>>>
>>> Was this a one-time thing, or does it happen consistently?  Can you
>>> turn on debug logging for o.a.s.scheduler?  (dunno if it will help, but
>>> maybe ...)
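>>>
>>> For reference, one way to do that is to add this line to
>>> conf/log4j.properties:
>>>
>>>   log4j.logger.org.apache.spark.scheduler=DEBUG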
>>>
>>> On Tue, Aug 11, 2015 at 8:59 AM, Akhil Das <akhil@sigmoidanalytics.com>
>>> wrote:
>>>
>>>> Hi
>>>>
>>>> My Spark job (running in local[*] with Spark 1.4.1) reads data from a
>>>> Thrift server through a custom RDD: getPartitions() computes the
>>>> partitions, and compute()'s iterator returns the records of each
>>>> partition. count() and foreach() work fine and return the correct
>>>> number of records, but whenever there is a shuffle map stage (like
>>>> reduceByKey etc.), all the tasks execute properly and then it enters
>>>> an infinite loop saying:
>>>>
>>>>
>>>>   15/08/11 13:05:54 INFO DAGScheduler: Resubmitting ShuffleMapStage 1
>>>>   (map at FilterMain.scala:59) because some of its tasks had failed: 0, 3
>>>>
>>>>
>>>> Here's the complete stack trace: http://pastebin.com/hyK7cG8S
>>>>
>>>> What could be the root cause of this problem? I looked it up and only
>>>> bumped into this closed (and very old) JIRA:
>>>> <https://issues.apache.org/jira/browse/SPARK-583>
>>>>
>>>>
>>>>
>>>>
>>>> Thanks
>>>> Best Regards
>>>>
>>>
>>>
>>
>
