spark-dev mailing list archives

From qingyang li <liqingyang1...@gmail.com>
Subject Re: task always lost
Date Wed, 02 Jul 2014 03:46:33 GMT
Here is the log:

E0702 10:32:07.599364 14915 slave.cpp:2686] Failed to unmonitor container for executor 20140616-104524-1694607552-5050-26919-1 of framework 20140702-102939-1694607552-5050-14846-0000: Not monitored


2014-07-02 1:45 GMT+08:00 Aaron Davidson <ilikerps@gmail.com>:

> Can you post the logs from any of the dying executors?
>
>
> On Tue, Jul 1, 2014 at 1:25 AM, qingyang li <liqingyang1985@gmail.com>
> wrote:
>
> > I am using Mesos 0.19 and Spark 0.9.0. The Mesos cluster is started, but
> > when I use spark-shell to submit a job, the tasks are always lost. Here is
> > the log:
> > ----------
> > 14/07/01 16:24:27 INFO DAGScheduler: Host gained which was in lost list earlier: bigdata005
> > 14/07/01 16:24:27 INFO TaskSetManager: Starting task 0.0:1 as TID 4042 on executor 20140616-143932-1694607552-5050-4080-2: bigdata005 (PROCESS_LOCAL)
> > 14/07/01 16:24:27 INFO TaskSetManager: Serialized task 0.0:1 as 1570 bytes in 0 ms
> > 14/07/01 16:24:28 INFO TaskSetManager: Re-queueing tasks for 20140616-104524-1694607552-5050-26919-1 from TaskSet 0.0
> > 14/07/01 16:24:28 WARN TaskSetManager: Lost TID 4041 (task 0.0:0)
> > 14/07/01 16:24:28 INFO DAGScheduler: Executor lost: 20140616-104524-1694607552-5050-26919-1 (epoch 3427)
> > 14/07/01 16:24:28 INFO BlockManagerMasterActor: Trying to remove executor 20140616-104524-1694607552-5050-26919-1 from BlockManagerMaster.
> > 14/07/01 16:24:28 INFO BlockManagerMaster: Removed 20140616-104524-1694607552-5050-26919-1 successfully in removeExecutor
> > 14/07/01 16:24:28 INFO TaskSetManager: Re-queueing tasks for 20140616-143932-1694607552-5050-4080-2 from TaskSet 0.0
> > 14/07/01 16:24:28 WARN TaskSetManager: Lost TID 4042 (task 0.0:1)
> > 14/07/01 16:24:28 INFO DAGScheduler: Executor lost: 20140616-143932-1694607552-5050-4080-2 (epoch 3428)
> > 14/07/01 16:24:28 INFO BlockManagerMasterActor: Trying to remove executor 20140616-143932-1694607552-5050-4080-2 from BlockManagerMaster.
> > 14/07/01 16:24:28 INFO BlockManagerMaster: Removed 20140616-143932-1694607552-5050-4080-2 successfully in removeExecutor
> > 14/07/01 16:24:28 INFO DAGScheduler: Host gained which was in lost list earlier: bigdata005
> > 14/07/01 16:24:28 INFO DAGScheduler: Host gained which was in lost list earlier: bigdata001
> > 14/07/01 16:24:28 INFO TaskSetManager: Starting task 0.0:1 as TID 4043 on executor 20140616-143932-1694607552-5050-4080-2: bigdata005 (PROCESS_LOCAL)
> > 14/07/01 16:24:28 INFO TaskSetManager: Serialized task 0.0:1 as 1570 bytes in 0 ms
> > 14/07/01 16:24:28 INFO TaskSetManager: Starting task 0.0:0 as TID 4044 on executor 20140616-104524-1694607552-5050-26919-1: bigdata001 (PROCESS_LOCAL)
> > 14/07/01 16:24:28 INFO TaskSetManager: Serialized task 0.0:0 as 1570 bytes in 0 ms
> >
> >
> > It seems someone else has also encountered this problem:
> >
> > http://mail-archives.apache.org/mod_mbox/incubator-mesos-dev/201305.mbox/%3C201305161047069952830@nfs.iscas.ac.cn%3E
> >
>
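
For anyone following this thread: the quoted driver log repeats the same lose/regain cycle for each executor, so it can help to tally the "Executor lost" events per executor ID before digging into the slave logs. Below is a minimal sketch, assuming the driver log keeps the message format shown in the quote. The helper name `count_lost_executors` and the `sample` text are illustrative only and not part of Spark or Mesos; the code tolerates mail quote markers and wrapped lines.

```python
import re
from collections import Counter

# Pattern for the DAGScheduler message as it appears in the quoted driver log:
#   "Executor lost: <executor id> (epoch <n>)"
EXECUTOR_LOST = re.compile(r"Executor lost:\s+(\S+)\s+\(epoch\s+(\d+)\)")

# Leading mail quote markers such as "> > " at the start of a line.
QUOTE_PREFIX = re.compile(r"^(?:>\s?)+")


def count_lost_executors(log_text):
    """Count 'Executor lost' events per executor ID (hypothetical helper)."""
    # Drop quote markers, then join lines so wrapped messages match again.
    lines = [QUOTE_PREFIX.sub("", line).strip() for line in log_text.splitlines()]
    joined = " ".join(lines)
    return Counter(executor_id for executor_id, _epoch in EXECUTOR_LOST.findall(joined))


# Two entries copied from the quoted log, still wrapped and quoted.
sample = """\
> > 14/07/01 16:24:28 INFO DAGScheduler: Executor lost:
> > 20140616-104524-1694607552-5050-26919-1 (epoch 3427)
> > 14/07/01 16:24:28 INFO DAGScheduler: Executor lost:
> > 20140616-143932-1694607552-5050-4080-2 (epoch 3428)
"""

for executor_id, lost_count in count_lost_executors(sample).most_common():
    print(executor_id, lost_count)
```

Running the same function over the full driver log should show whether every executor is being lost repeatedly or only some of them, which narrows down which slave's executor logs to look at.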
