asterixdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Till Westmann" <ti...@apache.org>
Subject Re: Requesting node X to shutdown to ensure failure?
Date Sun, 30 Sep 2018 20:15:54 GMT
Yes, and after 10 * 8000 ms = 80s have passed without a heartbeat the 
node is considered dead. This seems to align with the log messages:

21:51:54.203 [Worker:ClusterController] INFO 
org.apache.hyracks.control.cc.cluster.NodeManager - Requesting node 2 to 
shutdown to ensure failure
21:51:54.206 [Worker:ClusterController] INFO 
org.apache.hyracks.control.cc.cluster.NodeManager - Request to shutdown 
failed node 2 succeeded. false positive heartbeat miss indication
21:51:54.206 [Worker:ClusterController] INFO 
org.apache.hyracks.control.cc.cluster.NodeManager - 2 considered dead. 
Last heartbeat received 83799ms ago. Max miss period: 80000ms

So that looks consistent.
The question seems to be (unless I missed something) why no heartbeats 
were received for 80s.

And it also looks like this happened for more than one node at the same 
time. Are these nodes running on a single box or on multiple boxes?

Cheers,
Till

On 30 Sep 2018, at 13:03, Taewoo Kim wrote:

> P.S. We set the heartbeat setting as follows.
>
> [cc]
> heartbeat.max.misses=10
> heartbeat.period=8000
>
> Best,
> Taewoo
>
> On Sun, Sep 30, 2018 at 12:50 PM, Taewoo Kim <wangsaeu@gmail.com> 
> wrote:
>
>> Hi Till,
>>
>> These files are indeed the same log file. I just extracted one line 
>> from
>> the original log and placed the line at the beginning. At 9:50 pm, 
>> the CC
>> initiates a task cluster and all NCs were executing. Node 2 sent the
>> taskcomplete message, and somehow the CC decides to shut down the 
>> node 2.
>> That's my understanding.
>>
>> Best,
>> Taewoo
>>
>> On Sun, Sep 30, 2018 at 12:40 PM, Till Westmann <tillw@apache.org> 
>> wrote:
>>
>>> Hi Taewoo,
>>>
>>> I’m little confused as we seem to have 2 cc logs.
>>>
>>> The "Requesting node 2 to shutdown …" message is logged, when 
>>> nodes are
>>> considered to be dead as they have missed too many heartbeats. And 
>>> you can
>>> see in the later logs that indeed more than 80s passed without 
>>> receiving a
>>> heartbeat.
>>> Could it be that the machines were too busy to send/receive 
>>> heartbeats?
>>>
>>> Cheers,
>>> Till
>>>
>>> On 30 Sep 2018, at 10:52, Taewoo Kim wrote:
>>>
>>> Hi devs,
>>>>
>>>> I saw the following log messages in the cc.log of the Cloudberrry
>>>> cluster.
>>>> Yesterday around 9:50 pm, a task cluster was created and executed. 
>>>> And
>>>> around 9:51 pm, the cluster shows the following message. In the 
>>>> nc.log of
>>>> the node 2, there were no noticeable log messages. I wonder what 
>>>> could
>>>> cause this situation.
>>>>
>>>> -- cc.log at 9:51 pm
>>>> *21:51:54.203 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.cc.cluster.NodeManager - Requesting 
>>>> node 2
>>>> to
>>>> shutdown to ensure failure*
>>>>
>>>> *-- nc-2.log around 9:50 pm*
>>>>
>>>> 21:50:30.314 [Worker:2] INFO
>>>>  org.apache.hyracks.control.nc.work.StartTasksWork - input: 0: 
>>>> CDID:2
>>>> 21:50:30.375 [Worker:2] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> NotifyTaskCompleteWork:TAID:TID:ANID:ODID:7:0:3:0
>>>> 21:50:30.382 [Worker:2] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> NotifyTaskCompleteWork:TAID:TID:ANID:ODID:9:0:3:0
>>>> 21:50:30.390 [Worker:2] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> NotifyTaskCompleteWork:TAID:TID:ANID:ODID:9:0:2:0
>>>> 21:50:30.391 [Worker:2] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> NotifyTaskCompleteWork:TAID:TID:ANID:ODID:7:0:2:0
>>>> 21:50:30.392 [Worker:2] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> NotifyTaskCompleteWork:TAID:TID:ANID:ODID:1:1:3:0
>>>> 21:50:30.392 [Worker:2] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> NotifyTaskCompleteWork:TAID:TID:ANID:ODID:1:1:2:0
>>>> 21:50:30.398 [Worker:2] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> NotifyTaskCompleteWork:TAID:TID:ANID:ODID:1:1:3:0
>>>> 21:50:30.398 [Worker:2] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> NotifyTaskCompleteWork:TAID:TID:ANID:ODID:1:1:2:0
>>>> 22:02:30.290 [JVM exit thread] INFO  
>>>> org.apache.hyracks.util.ExitUtil -
>>>> JVM
>>>> exiting with status 0; bye!
>>>>
>>>>
>>>> -- *cc.log from 9:51 pm*
>>>>
>>>> 21:50:30.313 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.cc.executor.ActivityClusterPlanner - 
>>>> Built 1
>>>> Task Clusters
>>>> 21:50:30.313 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.cc.executor.ActivityClusterPlanner - 
>>>> Tasks:
>>>> [TID:ANID:ODID:1:1:0, TID:ANID:ODID:1:1:1, TID:ANID:ODID:1:1:2,
>>>> TID:ANID:ODID:1:1:3, TID:ANID:ODID:1:1:4, TID:ANID:ODID:1:1:5,
>>>> TID:ANID:ODID:1:1:6, TID:ANID:ODID:1:1:7, TID:ANID:ODID:1:1:8,
>>>> TID:ANID:ODID:1:1:9, TID:ANID:ODID:2:0:0, TID:ANID:ODID:2:0:1,
>>>> TID:ANID:ODID:2:0:2, TID:ANID:ODID:2:0:3, TID:ANID:ODID:2:0:4,
>>>> TID:ANID:ODID:2:0:5, TID:ANID:ODID:2:0:6, TID:ANID:ODID:2:0:7,
>>>> TID:ANID:ODID:2:0:8, TID:ANID:ODID:2:0:9, TID:ANID:ODID:9:0:0,
>>>> TID:ANID:ODID:9:0:1, TID:ANID:ODID:9:0:2, TID:ANID:ODID:9:0:3,
>>>> TID:ANID:ODID:9:0:4, TID:ANID:ODID:9:0:5, TID:ANID:ODID:9:0:6,
>>>> TID:ANID:ODID:9:0:7, TID:ANID:ODID:9:0:8, TID:ANID:ODID:9:0:9]
>>>> 21:50:30.313 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.cc.executor.JobExecutor - Runnable TC 
>>>> roots:
>>>> [TC:[TID:ANID:ODID:1:1:0, TID:ANID:ODID:1:1:1, TID:ANID:ODID:1:1:2,
>>>> TID:ANID:ODID:1:1:3, TID:ANID:ODID:1:1:4, TID:ANID:ODID:1:1:5,
>>>> TID:ANID:ODID:1:1:6, TID:ANID:ODID:1:1:7, TID:ANID:ODID:1:1:8,
>>>> TID:ANID:ODID:1:1:9, TID:ANID:ODID:2:0:0, TID:ANID:ODID:2:0:1,
>>>> TID:ANID:ODID:2:0:2, TID:ANID:ODID:2:0:3, TID:ANID:ODID:2:0:4,
>>>> TID:ANID:ODID:2:0:5, TID:ANID:ODID:2:0:6, TID:ANID:ODID:2:0:7,
>>>> TID:ANID:ODID:2:0:8, TID:ANID:ODID:2:0:9, TID:ANID:ODID:9:0:0,
>>>> TID:ANID:ODID:9:0:1, TID:ANID:ODID:9:0:2, TID:ANID:ODID:9:0:3,
>>>> TID:ANID:ODID:9:0:4, TID:ANID:ODID:9:0:5, TID:ANID:ODID:9:0:6,
>>>> TID:ANID:ODID:9:0:7, TID:ANID:ODID:9:0:8, TID:ANID:ODID:9:0:9]],
>>>> inProgressTaskClusters: []
>>>> 21:50:30.366 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [3[JID:0.700:TAID:TID:ANID:ODID:7:0:5:0]
>>>> 21:50:30.374 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [5[JID:0.700:TAID:TID:ANID:ODID:7:0:9:0]
>>>> 21:50:30.376 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [2[JID:0.700:TAID:TID:ANID:ODID:7:0:3:0]
>>>> 21:50:30.376 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [5[JID:0.700:TAID:TID:ANID:ODID:7:0:8:0]
>>>> 21:50:30.377 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [1[JID:0.700:TAID:TID:ANID:ODID:7:0:0:0]
>>>> 21:50:30.378 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [3[JID:0.700:TAID:TID:ANID:ODID:7:0:4:0]
>>>> 21:50:30.381 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [1[JID:0.700:TAID:TID:ANID:ODID:7:0:1:0]
>>>> 21:50:30.382 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [5[JID:0.701:TAID:TID:ANID:ODID:9:0:8:0]
>>>> 21:50:30.383 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [2[JID:0.701:TAID:TID:ANID:ODID:9:0:3:0]
>>>> 21:50:30.384 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [3[JID:0.701:TAID:TID:ANID:ODID:9:0:4:0]
>>>> 21:50:30.386 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [3[JID:0.701:TAID:TID:ANID:ODID:9:0:5:0]
>>>> 21:50:30.386 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [5[JID:0.701:TAID:TID:ANID:ODID:9:0:9:0]
>>>> 21:50:30.389 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [4[JID:0.701:TAID:TID:ANID:ODID:9:0:6:0]
>>>> 21:50:30.389 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [1[JID:0.701:TAID:TID:ANID:ODID:9:0:0:0]
>>>> 21:50:30.392 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [2[JID:0.701:TAID:TID:ANID:ODID:9:0:2:0]
>>>> 21:50:30.392 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [4[JID:0.700:TAID:TID:ANID:ODID:7:0:7:0]
>>>> 21:50:30.392 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [4[JID:0.700:TAID:TID:ANID:ODID:7:0:6:0]
>>>> 21:50:30.394 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [1[JID:0.701:TAID:TID:ANID:ODID:9:0:1:0]
>>>> 21:50:30.395 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [2[JID:0.700:TAID:TID:ANID:ODID:7:0:2:0]
>>>> 21:50:30.395 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [4[JID:0.701:TAID:TID:ANID:ODID:9:0:7:0]
>>>> 21:50:30.396 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [1[JID:0.700:TAID:TID:ANID:ODID:1:1:1:0]
>>>> 21:50:30.396 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [1[JID:0.701:TAID:TID:ANID:ODID:1:1:0:0]
>>>> 21:50:30.396 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [1[JID:0.700:TAID:TID:ANID:ODID:1:1:0:0]
>>>> 21:50:30.396 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [1[JID:0.701:TAID:TID:ANID:ODID:1:1:1:0]
>>>> 21:50:30.397 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [5[JID:0.700:TAID:TID:ANID:ODID:1:1:8:0]
>>>> 21:50:30.398 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [3[JID:0.700:TAID:TID:ANID:ODID:1:1:4:0]
>>>> 21:50:30.398 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [3[JID:0.700:TAID:TID:ANID:ODID:1:1:5:0]
>>>> 21:50:30.398 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [3[JID:0.701:TAID:TID:ANID:ODID:1:1:4:0]
>>>> 21:50:30.398 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [3[JID:0.701:TAID:TID:ANID:ODID:1:1:5:0]
>>>> 21:50:30.399 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [4[JID:0.700:TAID:TID:ANID:ODID:1:1:6:0]
>>>> 21:50:30.399 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [2[JID:0.700:TAID:TID:ANID:ODID:1:1:3:0]
>>>> 21:50:30.399 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [2[JID:0.700:TAID:TID:ANID:ODID:1:1:2:0]
>>>> 21:50:30.399 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [5[JID:0.700:TAID:TID:ANID:ODID:1:1:9:0]
>>>> 21:50:30.399 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [5[JID:0.701:TAID:TID:ANID:ODID:1:1:8:0]
>>>> 21:50:30.399 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [5[JID:0.701:TAID:TID:ANID:ODID:1:1:9:0]
>>>> 21:50:30.402 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [4[JID:0.700:TAID:TID:ANID:ODID:1:1:7:0]
>>>> 21:50:30.402 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [4[JID:0.701:TAID:TID:ANID:ODID:1:1:6:0]
>>>> 21:50:30.402 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [4[JID:0.701:TAID:TID:ANID:ODID:1:1:7:0]
>>>> 21:50:30.402 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [3[JID:0.700:TAID:TID:ANID:ODID:2:0:5:0]
>>>> 21:50:30.402 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [3[JID:0.700:TAID:TID:ANID:ODID:2:0:4:0]
>>>> 21:50:30.402 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [3[JID:0.700:TAID:TID:ANID:ODID:2:0:5:0]
>>>> 21:50:30.402 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [3[JID:0.700:TAID:TID:ANID:ODID:2:0:4:0]
>>>> 21:50:30.404 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [2[JID:0.701:TAID:TID:ANID:ODID:1:1:3:0]
>>>> 21:50:30.404 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.common.work.WorkQueue - Executing:
>>>> TaskComplete: [2[JID:0.701:TAID:TID:ANID:ODID:1:1:2:0]
>>>> *21:51:54.203 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.cc.cluster.NodeManager - Requesting 
>>>> node 2
>>>> to
>>>> shutdown to ensure failure*
>>>>
>>>> 21:51:54.206 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.cc.cluster.NodeManager - Request to 
>>>> shutdown
>>>> failed node 2 succeeded. false positive heartbeat miss indication
>>>> 21:51:54.206 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.cc.cluster.NodeManager - 2 considered 
>>>> dead.
>>>> Last heartbeat received 83799ms ago. Max miss period: 80000ms
>>>> 21:51:54.206 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.cc.cluster.NodeManager - Requesting 
>>>> node 5
>>>> to
>>>> shutdown to ensure failure
>>>> 21:51:54.206 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.cc.cluster.NodeManager - Request to 
>>>> shutdown
>>>> failed node 5 succeeded. false positive heartbeat miss indication
>>>> 21:51:54.206 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.cc.cluster.NodeManager - 5 considered 
>>>> dead.
>>>> Last heartbeat received 83806ms ago. Max miss period: 80000ms
>>>> 21:51:54.207 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.cc.cluster.NodeManager - Requesting 
>>>> node 3
>>>> to
>>>> shutdown to ensure failure
>>>> 21:51:54.207 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.cc.cluster.NodeManager - Request to 
>>>> shutdown
>>>> failed node 3 succeeded. false positive heartbeat miss indication
>>>> 21:51:54.207 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.cc.cluster.NodeManager - 3 considered 
>>>> dead.
>>>> Last heartbeat received 83804ms ago. Max miss period: 80000ms
>>>> 21:51:54.207 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.cc.cluster.NodeManager - Requesting 
>>>> node 4
>>>> to
>>>> shutdown to ensure failure
>>>> 21:51:54.207 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.cc.cluster.NodeManager - Request to 
>>>> shutdown
>>>> failed node 4 succeeded. false positive heartbeat miss indication
>>>> 21:51:54.207 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.cc.cluster.NodeManager - 4 considered 
>>>> dead.
>>>> Last heartbeat received 83804ms ago. Max miss period: 80000ms
>>>> 21:51:54.207 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.cc.work.RemoveDeadNodesWork - Number of
>>>> affected jobs: 3
>>>> 21:51:54.216 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.cc.executor.JobExecutor - Abort map for 
>>>> job:
>>>> JID:0.1: {1=[TAID:TID:ANID:ODID:0:0:0:0, 
>>>> TAID:TID:ANID:ODID:3:0:0:0,
>>>> TAID:TID:ANID:ODID:3:0:1:0]}
>>>> 21:51:54.216 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.cc.executor.JobExecutor - Aborting:
>>>> [TAID:TID:ANID:ODID:0:0:0:0, TAID:TID:ANID:ODID:3:0:0:0,
>>>> TAID:TID:ANID:ODID:3:0:1:0] at 1
>>>> 21:51:54.217 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.cc.partitions.PartitionMatchMaker - 
>>>> Removing
>>>> uncommitted partitions: []
>>>> 21:51:54.217 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.cc.partitions.PartitionMatchMaker - 
>>>> Removing
>>>> partition requests: []
>>>> 21:51:54.218 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.cc.executor.JobExecutor - aborting 
>>>> doomed
>>>> task
>>>> clusters
>>>> 21:51:54.218 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.cc.executor.JobExecutor - number of 
>>>> doomed
>>>> task
>>>> clusters found = 0
>>>> 21:51:54.218 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.cc.executor.JobExecutor - Runnable TC 
>>>> roots:
>>>> [TC:[TID:ANID:ODID:0:0:0, TID:ANID:ODID:3:0:0, TID:ANID:ODID:3:0:1,
>>>> TID:ANID:ODID:3:0:2, TID:ANID:ODID:3:0:3, TID:ANID:ODID:3:0:4,
>>>> TID:ANID:ODID:3:0:5, TID:ANID:ODID:3:0:6, TID:ANID:ODID:3:0:7,
>>>> TID:ANID:ODID:3:0:8, TID:ANID:ODID:3:0:9]], inProgressTaskClusters: 
>>>> []
>>>> 21:51:54.218 [Worker:ClusterController] ERROR
>>>> org.apache.hyracks.control.cc.executor.JobExecutor - Unexpected 
>>>> failure.
>>>> Aborting job JID:0.1
>>>> org.apache.hyracks.api.exceptions.HyracksException: Node 2 not live
>>>>         at
>>>> org.apache.hyracks.control.cc.executor.JobExecutor.assignLoc
>>>> ation(JobExecutor.java:473)
>>>> ~[hyracks-control-cc-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT]
>>>>         at
>>>> org.apache.hyracks.control.cc.executor.JobExecutor.assignTas
>>>> kLocations(JobExecutor.java:365)
>>>> ~[hyracks-control-cc-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT]
>>>>         at
>>>> org.apache.hyracks.control.cc.executor.JobExecutor.startRunn
>>>> ableTaskClusters(JobExecutor.java:245)
>>>> ~[hyracks-control-cc-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT]
>>>>         at
>>>> org.apache.hyracks.control.cc.executor.JobExecutor.startRunn
>>>> ableActivityClusters(JobExecutor.java:209)
>>>> ~[hyracks-control-cc-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT]
>>>>         at
>>>> org.apache.hyracks.control.cc.executor.JobExecutor.notifyNod
>>>> eFailures(JobExecutor.java:731)
>>>> [hyracks-control-cc-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT]
>>>>         at
>>>> org.apache.hyracks.control.cc.work.RemoveDeadNodesWork.run(R
>>>> emoveDeadNodesWork.java:60)
>>>> [hyracks-control-cc-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT]
>>>>         at
>>>> org.apache.hyracks.control.common.work.WorkQueue$WorkerThrea
>>>> d.run(WorkQueue.java:127)
>>>> [hyracks-control-common-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT]
>>>> 21:51:54.218 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.cc.executor.JobExecutor - Abort map for 
>>>> job:
>>>> JID:0.700: {}
>>>> 21:51:54.218 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.cc.partitions.PartitionMatchMaker - 
>>>> Removing
>>>> uncommitted partitions: []
>>>> 21:51:54.218 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.cc.partitions.PartitionMatchMaker - 
>>>> Removing
>>>> partition requests: []
>>>> 21:51:54.218 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.cc.executor.JobExecutor - aborting 
>>>> doomed
>>>> task
>>>> clusters
>>>> 21:51:54.219 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.cc.executor.JobExecutor - number of 
>>>> doomed
>>>> task
>>>> clusters found = 0
>>>> 21:51:54.219 [Worker:ClusterController] INFO
>>>>  org.apache.hyracks.control.cc.executor.JobExecutor - Abort map for 
>>>> job:
>>>> JID:0.700: {1=[TAID:TID:ANID:ODID:2:0:0:0, 
>>>> TAID:TID:ANID:ODID:2:0:1:0]}
>>>>
>>>> Best,
>>>> Taewoo
>>>>
>>>
>>

Mime
View raw message