spark-user mailing list archives

From Saiph Kappa <saiph.ka...@gmail.com>
Subject Re: Unable to use more than 1 executor for spark streaming application with YARN
Date Wed, 17 Jun 2015 16:47:45 GMT
How can I get more information regarding this exception?

On Wed, Jun 17, 2015 at 1:17 AM, Saiph Kappa <saiph.kappa@gmail.com> wrote:

> Hi,
>
> I am running a simple Spark Streaming application on Hadoop 2.7.0/YARN
> (master: yarn-client) with 2 executors on different machines. However,
> while the app is running, I can see in the app web UI (Executors tab) that
> only 1 executor keeps completing tasks over time; the other executor only
> works and completes tasks for a few seconds. From the logs I can see an
> exception arising, though it is not clear what went wrong.
>
> Here is the yarn-nodemanager log:
> «
> 2015-06-17 00:29:50,967 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
> Starting resource-monitoring for container_1434391147618_0007_01_000003
> 2015-06-17 00:29:50,977 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
> Memory usage of ProcessTree 30553 for container-id
> container_1434391147618_0007_01_000003: 286.5 MB of 3 GB physical memory
> used; 2.7 GB of 6.3 GB virtual memory used
> 2015-06-17 00:29:53,991 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
> Memory usage of ProcessTree 30553 for container-id
> container_1434391147618_0007_01_000003: 463.7 MB of 3 GB physical memory
> used; 2.7 GB of 6.3 GB virtual memory used
> 2015-06-17 00:29:57,009 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
> Memory usage of ProcessTree 30553 for container-id
> container_1434391147618_0007_01_000003: 465.7 MB of 3 GB physical memory
> used; 2.7 GB of 6.3 GB virtual memory used
> 2015-06-17 00:30:00,024 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
> Memory usage of ProcessTree 30553 for container-id
> container_1434391147618_0007_01_000003: 467.6 MB of 3 GB physical memory
> used; 2.7 GB of 6.3 GB virtual memory used
> 2015-06-17 00:30:03,032 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
> Memory usage of ProcessTree 30553 for container-id
> container_1434391147618_0007_01_000003: 474.0 MB of 3 GB physical memory
> used; 2.7 GB of 6.3 GB virtual memory used
> 2015-06-17 00:30:06,041 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
> Memory usage of ProcessTree 30553 for container-id
> container_1434391147618_0007_01_000003: 480.2 MB of 3 GB physical memory
> used; 2.7 GB of 6.3 GB virtual memory used
> 2015-06-17 00:30:09,053 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
> Memory usage of ProcessTree 30553 for container-id
> container_1434391147618_0007_01_000003: 540.9 MB of 3 GB physical memory
> used; 2.7 GB of 6.3 GB virtual memory used
> 2015-06-17 00:30:12,068 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
> Memory usage of ProcessTree 30553 for container-id
> container_1434391147618_0007_01_000003: 550.9 MB of 3 GB physical memory
> used; 2.7 GB of 6.3 GB virtual memory used
> 2015-06-17 00:30:15,075 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
> Memory usage of ProcessTree 30553 for container-id
> container_1434391147618_0007_01_000003: 551.1 MB of 3 GB physical memory
> used; 2.7 GB of 6.3 GB virtual memory used
> 2015-06-17 00:30:18,090 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
> Memory usage of ProcessTree 30553 for container-id
> container_1434391147618_0007_01_000003: 558.7 MB of 3 GB physical memory
> used; 2.7 GB of 6.3 GB virtual memory used
> 2015-06-17 00:30:20,157 WARN
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit
> code from container container_1434391147618_0007_01_000003 is : 1
> 2015-06-17 00:30:20,157 WARN
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
> Exception from container-launch with container ID:
> container_1434391147618_0007_01_000003 and exit code: 1
> ExitCodeException exitCode=1:
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
>         at org.apache.hadoop.util.Shell.run(Shell.java:456)
>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
>         at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-06-17 00:30:20,157 INFO
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from
> container-launch.
> 2015-06-17 00:30:20,157 INFO
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id:
> container_1434391147618_0007_01_000003
> 2015-06-17 00:30:20,157 INFO
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 1
> 2015-06-17 00:30:20,157 INFO
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace:
> ExitCodeException exitCode=1:
> 2015-06-17 00:30:20,157 INFO
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
> org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
> 2015-06-17 00:30:20,157 INFO
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
> org.apache.hadoop.util.Shell.run(Shell.java:456)
> 2015-06-17 00:30:20,157 INFO
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
> 2015-06-17 00:30:20,158 INFO
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
> 2015-06-17 00:30:20,158 INFO
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
> 2015-06-17 00:30:20,158 INFO
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
> 2015-06-17 00:30:20,158 INFO
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
> java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 2015-06-17 00:30:20,158 INFO
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 2015-06-17 00:30:20,158 INFO
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 2015-06-17 00:30:20,158 INFO
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
> java.lang.Thread.run(Thread.java:745)
> 2015-06-17 00:30:20,158 WARN
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
> Container exited with a non-zero exit code 1
> 2015-06-17 00:30:20,158 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
> Container container_1434391147618_0007_01_000003 transitioned from RUNNING
> to EXITED_WITH_FAILURE
> 2015-06-17 00:30:20,158 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
> Cleaning up container container_1434391147618_0007_01_000003
> 2015-06-17 00:30:20,178 INFO
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
> Deleting absolute path :
> /tmp/hadoop-myuser/nm-local-dir/usercache/myuser/appcache/application_1434391147618_0007/container_1434391147618_0007_01_000003
> 2015-06-17 00:30:20,178 WARN
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=myuser
> OPERATION=Container Finished - Failed   TARGET=ContainerImpl
> RESULT=FAILURE  DESCRIPTION=Container failed with state:
> EXITED_WITH_FAILURE    APPID=application_1434391147618_0007
> CONTAINERID=container_1434391147618_0007_01_000003
> 2015-06-17 00:30:20,178 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
> Container container_1434391147618_0007_01_000003 transitioned from
> EXITED_WITH_FAILURE to DONE
> 2015-06-17 00:30:20,179 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
> Removing container_1434391147618_0007_01_000003 from application
> application_1434391147618_0007
> 2015-06-17 00:30:20,179 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got
> event CONTAINER_STOP for appId application_1434391147618_0007
> 2015-06-17 00:30:20,500 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
> Application application_1434391147618_0007 transitioned from RUNNING to
> APPLICATION_RESOURCES_CLEANINGUP
> 2015-06-17 00:30:20,501 INFO
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
> Deleting absolute path :
> /tmp/hadoop-myuser/nm-local-dir/usercache/myuser/appcache/application_1434391147618_0007
> 2015-06-17 00:30:20,501 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got
> event APPLICATION_STOP for appId application_1434391147618_0007
> 2015-06-17 00:30:20,501 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
> Application application_1434391147618_0007 transitioned from
> APPLICATION_RESOURCES_CLEANINGUP to FINISHED
> 2015-06-17 00:30:20,501 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler:
> Scheduling Log Deletion for application: application_1434391147618_0007,
> with delay of 10800 seconds
> »
>
> Not sure if it is relevant, but in the output of the application I keep
> getting this message:
> «15/06/17 00:29:53 INFO ShuffledDStream: Time 1434497393000 ms is invalid
> as zeroTime is 1434497391000 ms and slideDuration is 4000 ms and difference
> is 2000 ms»
>
> I'm using Spark 1.3.2.
>
> Any ideas what might be happening?
>
> Thanks.
>
>
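The ShuffledDStream line quoted above appears to report an arithmetic check rather than an error. Judging from the wording of the message, a batch time is treated as valid only if it falls after zeroTime and its offset from zeroTime is a whole multiple of slideDuration. Here 1434497393000 - 1434497391000 = 2000 ms, which is not a multiple of the 4000 ms slide. The sketch below reproduces that arithmetic in plain Java. The class `SlideCheck` and method `isTimeValid` are hypothetical names, and the rule is inferred from the log text, not taken from Spark's public API.

```java
// Hypothetical helper: reproduces the arithmetic implied by the
// "Time ... is invalid as zeroTime is ... and slideDuration is ..." log line.
// Assumption: a time is valid only if it is after zeroTime and its offset
// from zeroTime is an exact multiple of the slide duration.
public class SlideCheck {

    /** Returns true if batchTimeMs lies on the slide grid that starts at zeroTimeMs. */
    public static boolean isTimeValid(long batchTimeMs, long zeroTimeMs, long slideDurationMs) {
        if (batchTimeMs <= zeroTimeMs) {
            return false; // times at or before the start are never valid
        }
        long difference = batchTimeMs - zeroTimeMs;
        return difference % slideDurationMs == 0; // must land exactly on a slide boundary
    }

    public static void main(String[] args) {
        // Values copied from the quoted log message
        long time = 1434497393000L;
        long zeroTime = 1434497391000L;
        long slide = 4000L;
        System.out.println("difference = " + (time - zeroTime) + " ms, valid = "
                + isTimeValid(time, zeroTime, slide));
    }
}
```

Running `main` prints `difference = 2000 ms, valid = false`. If the inferred rule is right, every other 2000 ms offset would be skipped for a 4000 ms slide, so this INFO line on its own may not indicate a failure.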
