spark-dev mailing list archives

From qingyang li <liqingyang1...@gmail.com>
Subject Re: task always lost
Date Wed, 02 Jul 2014 09:44:25 GMT
The executor keeps getting removed.

Someone else has encountered the same issue:
https://groups.google.com/forum/#!topic/spark-users/-mYn6BF-Y5Y

-------------
14/07/02 17:41:16 INFO storage.BlockManagerMasterActor: Trying to remove executor 20140616-104524-1694607552-5050-26919-1 from BlockManagerMaster.
14/07/02 17:41:16 INFO storage.BlockManagerMaster: Removed 20140616-104524-1694607552-5050-26919-1 successfully in removeExecutor
14/07/02 17:41:16 DEBUG spark.MapOutputTrackerMaster: Increasing epoch to 10
14/07/02 17:41:16 INFO scheduler.DAGScheduler: Host gained which was in lost list earlier: bigdata001
14/07/02 17:41:16 DEBUG scheduler.TaskSchedulerImpl: parentName: , name: TaskSet_0, runningTasks: 0
14/07/02 17:41:16 DEBUG scheduler.TaskSchedulerImpl: parentName: , name: TaskSet_0, runningTasks: 0
14/07/02 17:41:16 INFO scheduler.TaskSetManager: Starting task 0.0:0 as TID 12 on executor 20140616-143932-1694607552-5050-4080-3: bigdata004 (NODE_LOCAL)
14/07/02 17:41:16 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as 10785 bytes in 1 ms
14/07/02 17:41:16 INFO scheduler.TaskSetManager: Starting task 0.0:1 as TID 13 on executor 20140616-104524-1694607552-5050-26919-3: bigdata002 (NODE_LOCAL


2014-07-02 12:01 GMT+08:00 qingyang li <liqingyang1985@gmail.com>:

> Also, this one appears in the warning log:
>
> E0702 11:35:08.869998 17840 slave.cpp:2310] Container
> 'af557235-2d5f-4062-aaf3-a747cb3cd0d1' for executor
> '20140616-104524-1694607552-5050-26919-1' of framework
> '20140702-113428-1694607552-5050-17766-0000' failed to start: Failed to
> fetch URIs for container 'af557235-2d5f-4062-aaf3-a747cb3cd0d1': exit
> status 32512
>
>
> 2014-07-02 11:46 GMT+08:00 qingyang li <liqingyang1985@gmail.com>:
>
> Here is the log:
>>
>> E0702 10:32:07.599364 14915 slave.cpp:2686] Failed to unmonitor container
>> for executor 20140616-104524-1694607552-5050-26919-1 of framework
>> 20140702-102939-1694607552-5050-14846-0000: Not monitored
>>
>>
>> 2014-07-02 1:45 GMT+08:00 Aaron Davidson <ilikerps@gmail.com>:
>>
>> Can you post the logs from any of the dying executors?
>>>
>>>
>>> On Tue, Jul 1, 2014 at 1:25 AM, qingyang li <liqingyang1985@gmail.com>
>>> wrote:
>>>
>>> > I am using Mesos 0.19 and Spark 0.9.0. The Mesos cluster is started, but
>>> > when I use spark-shell to submit a job, the tasks are always lost. Here
>>> > is the log:
>>> > ----------
>>> > 14/07/01 16:24:27 INFO DAGScheduler: Host gained which was in lost list earlier: bigdata005
>>> > 14/07/01 16:24:27 INFO TaskSetManager: Starting task 0.0:1 as TID 4042 on executor 20140616-143932-1694607552-5050-4080-2: bigdata005 (PROCESS_LOCAL)
>>> > 14/07/01 16:24:27 INFO TaskSetManager: Serialized task 0.0:1 as 1570 bytes in 0 ms
>>> > 14/07/01 16:24:28 INFO TaskSetManager: Re-queueing tasks for 20140616-104524-1694607552-5050-26919-1 from TaskSet 0.0
>>> > 14/07/01 16:24:28 WARN TaskSetManager: Lost TID 4041 (task 0.0:0)
>>> > 14/07/01 16:24:28 INFO DAGScheduler: Executor lost: 20140616-104524-1694607552-5050-26919-1 (epoch 3427)
>>> > 14/07/01 16:24:28 INFO BlockManagerMasterActor: Trying to remove executor 20140616-104524-1694607552-5050-26919-1 from BlockManagerMaster.
>>> > 14/07/01 16:24:28 INFO BlockManagerMaster: Removed 20140616-104524-1694607552-5050-26919-1 successfully in removeExecutor
>>> > 14/07/01 16:24:28 INFO TaskSetManager: Re-queueing tasks for 20140616-143932-1694607552-5050-4080-2 from TaskSet 0.0
>>> > 14/07/01 16:24:28 WARN TaskSetManager: Lost TID 4042 (task 0.0:1)
>>> > 14/07/01 16:24:28 INFO DAGScheduler: Executor lost: 20140616-143932-1694607552-5050-4080-2 (epoch 3428)
>>> > 14/07/01 16:24:28 INFO BlockManagerMasterActor: Trying to remove executor 20140616-143932-1694607552-5050-4080-2 from BlockManagerMaster.
>>> > 14/07/01 16:24:28 INFO BlockManagerMaster: Removed 20140616-143932-1694607552-5050-4080-2 successfully in removeExecutor
>>> > 14/07/01 16:24:28 INFO DAGScheduler: Host gained which was in lost list earlier: bigdata005
>>> > 14/07/01 16:24:28 INFO DAGScheduler: Host gained which was in lost list earlier: bigdata001
>>> > 14/07/01 16:24:28 INFO TaskSetManager: Starting task 0.0:1 as TID 4043 on executor 20140616-143932-1694607552-5050-4080-2: bigdata005 (PROCESS_LOCAL)
>>> > 14/07/01 16:24:28 INFO TaskSetManager: Serialized task 0.0:1 as 1570 bytes in 0 ms
>>> > 14/07/01 16:24:28 INFO TaskSetManager: Starting task 0.0:0 as TID 4044 on executor 20140616-104524-1694607552-5050-26919-1: bigdata001 (PROCESS_LOCAL)
>>> > 14/07/01 16:24:28 INFO TaskSetManager: Serialized task 0.0:0 as 1570 bytes in 0 ms
>>> >
>>> >
>>> > It seems someone else has also encountered this problem:
>>> >
>>> > http://mail-archives.apache.org/mod_mbox/incubator-mesos-dev/201305.mbox/%3C201305161047069952830@nfs.iscas.ac.cn%3E
>>> >
>>>
>>
>>
>
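
The Mesos warning quoted above reports "exit status 32512" for the failed URI fetch. That number looks like a raw POSIX wait status rather than a plain exit code: on Linux the exit code sits in bits 8-15, so 32512 = 127 * 256 decodes to exit code 127, which a POSIX shell uses for "command not found". A minimal Python sketch of that decoding follows; the helper name `decode_wait_status` is hypothetical, and the sketch deliberately ignores the stopped/continued cases.

```python
def decode_wait_status(status):
    """Decode a raw POSIX wait status using the common Linux bit layout.

    Simplified sketch: stopped/continued statuses are not handled.
    Returns ("exited", exit_code) or ("signaled", signal_number).
    """
    term_signal = status & 0x7F  # low 7 bits: terminating signal, 0 if the process exited normally
    if term_signal == 0:
        return ("exited", (status >> 8) & 0xFF)  # bits 8-15 hold the exit code
    return ("signaled", term_signal)


kind, code = decode_wait_status(32512)
print(kind, code)  # exited 127
```

Under that reading, the shell used by the Mesos fetcher probably could not find a command it tried to run (for example a `hadoop` binary when fetching an `hdfs://` URI). This is an assumption; the log alone does not confirm which command was missing.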
