spark-user mailing list archives

From Shangyu Luo <lsy...@gmail.com>
Subject Re: cluster hangs for no apparent reason
Date Sun, 03 Nov 2013 18:47:58 GMT
Hi Walrus,
Thank you for sharing the solution to your problem. I think I have run into a
similar problem before (i.e., one machine is working while the others are
idle); I just waited for a long time and the program eventually continued
processing. I am not sure how your program filters an RDD by a locally
stored set. If the set is a parameter of a function, I assume it would be
copied to all worker nodes. But it is good that you solved your problem
with a broadcast variable and that the running time seems reasonable!
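
For reference, here is a minimal sketch of filtering an RDD by a broadcast
set, as I understand the approach. It is not Walrus's actual code: the object
name, the helper filterByAllowed, the sample data, and the "local[2]" master
are all made up for illustration. A set captured in a function closure is
serialized along with each task, while a broadcast variable is shipped to each
worker once and read through .value.

```scala
import org.apache.spark.SparkContext

object BroadcastFilterSketch {
  // Hypothetical helper: keep only the records that appear in `allowed`.
  def filterByAllowed(sc: SparkContext,
                      records: Seq[String],
                      allowed: Set[String]): Array[String] = {
    // Ship the set to the workers once, instead of with every task closure.
    val allowedBc = sc.broadcast(allowed)
    sc.parallelize(records)
      .filter(r => allowedBc.value.contains(r)) // read the broadcast copy on the worker
      .collect()
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master with two threads, just to try the sketch out.
    val sc = new SparkContext("local[2]", "BroadcastFilterSketch")
    try {
      val kept = filterByAllowed(sc, Seq("a", "b", "c", "d"), Set("a", "c"))
      println(kept.mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

In a real job the RDD would come from the cluster rather than from
sc.parallelize, and collect() would only be appropriate if the filtered result
is small enough to fit on the driver.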


2013/11/3 Walrus theCat <walrusthecat@gmail.com>

> Hi Shangyu,
>
> Thanks for responding.  This is a refactor of other code that isn't
> completely scalable because it pulls stuff to the driver.  This code keeps
> everything on the cluster.  I left it running for 7 hours, and the log just
> froze.  I checked ganglia, and only one machine's CPU seemed to be doing
> anything.  The last output on the log left my code at a spot where it is
> filtering an RDD by a locally stored set.  No error was thrown.  I thought
> that was OK based on the example code, but just in case, I changed it so
> it's a broadcast variable.  The un-refactored code (that pulls all the data
> to the driver from time to time) runs in minutes.  I've never had the
> problem before of the log just getting non-responsive, and was wondering if
> anyone knew of any heuristics I could check.
>
> Thank you
>
>
> On Sat, Nov 2, 2013 at 2:55 PM, Shangyu Luo <lsyurd@gmail.com> wrote:
>
>> Yes, I think so. The running time depends on what work you are doing and
>> how large it is.
>>
>>
>> 2013/11/1 Walrus theCat <walrusthecat@gmail.com>
>>
>>> That's what I thought, too.  So is it not "hanging", just recalculating
>>> for a very long time?  The log stops updating and it just gives the output
>>> I posted.  If there are any suggestions as to parameters to change, or any
>>> other data, it would be appreciated.
>>>
>>> Thank you, Shangyu.
>>>
>>>
>>> On Fri, Nov 1, 2013 at 11:31 AM, Shangyu Luo <lsyurd@gmail.com> wrote:
>>>
>>>> I think the missing parents may not be abnormal. From my understanding,
>>>> when a Spark task cannot find its parent, it can use some metadata to find
>>>> the result of its parent or recalculate its parent's value. Imagine that,
>>>> in a loop, a Spark task tries to find some value from the last iteration's
>>>> result.
>>>>
>>>>
>>>> 2013/11/1 Walrus theCat <walrusthecat@gmail.com>
>>>>
>>>>> Are there heuristics to check when the scheduler says it is "missing
>>>>> parents" and just hangs?
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Oct 31, 2013 at 4:56 PM, Walrus theCat <walrusthecat@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm not sure what's going on here.  My code seems to be working thus
>>>>>> far (map at SparkLR:90 completed.)  What can I do to help the scheduler
>>>>>> out here?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> 13/10/31 02:10:13 INFO scheduler.DAGScheduler: Completed ShuffleMapTask(10, 211)
>>>>>> 13/10/31 02:10:13 INFO scheduler.DAGScheduler: Stage 10 (map at SparkLR.scala:90) finished in 0.923 s
>>>>>> 13/10/31 02:10:13 INFO scheduler.DAGScheduler: looking for newly runnable stages
>>>>>> 13/10/31 02:10:13 INFO scheduler.DAGScheduler: running: Set(Stage 11)
>>>>>> 13/10/31 02:10:13 INFO scheduler.DAGScheduler: waiting: Set(Stage 9, Stage 8)
>>>>>> 13/10/31 02:10:13 INFO scheduler.DAGScheduler: failed: Set()
>>>>>> 13/10/31 02:10:16 INFO scheduler.DAGScheduler: Missing parents for Stage 9: List(Stage 11)
>>>>>> 13/10/31 02:10:16 INFO scheduler.DAGScheduler: Missing parents for Stage 8: List(Stage 9)
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> --
>>>>
>>>> Shangyu, Luo
>>>> Department of Computer Science
>>>> Rice University
>>>>
>>>> --
>>>> Not Just Think About It, But Do It!
>>>> --
>>>> Success is never final.
>>>> --
>>>> Losers always whine about their best
>>>>
>>>
>>>
>>
>>
>
>


-- 
--

Shangyu, Luo
Department of Computer Science
Rice University

--
Not Just Think About It, But Do It!
--
Success is never final.
--
Losers always whine about their best
