samza-dev mailing list archives

From Chris Riccomini <criccom...@apache.org>
Subject Re: Samza on Yarn
Date Fri, 13 Mar 2015 00:19:30 GMT
Hey Shekar,

Yes, this is definitely a classpath issue. The pastebin you sent does not
include any of the samza-core/samza-yarn/scala JARs. This is rather
strange, since you said you put the JARs in this path:

  /home/hadoop/hadoop-2.5.2/share/hadoop/hdfs/lib/

And I do see *other* JARs listed with this path. Are you sure you put the
Samza JARs on *all* machines, not just the RM machine? The description of
yarn.application.classpath in yarn-default.xml says:

CLASSPATH for YARN applications. A comma-separated list of CLASSPATH
entries. When this value is empty, the following default CLASSPATH for YARN
applications would be used. For Linux: $HADOOP_CONF_DIR,
$HADOOP_COMMON_HOME/share/hadoop/common/*,
$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
$HADOOP_YARN_HOME/share/hadoop/yarn/*,
$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*

So, it seems like it should pick up the JARs, if they're in the NM's
directory.
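
A minimal sketch of that check in Python 3, meant to be run on every NodeManager host rather than only the RM: it expands the default classpath entries quoted above (the yarn.application.classpath defaults for Linux) with the local environment and lists the JARs in each `<dir>/*` entry whose names start with `samza-core`, `samza-yarn` or `scala-`. Those prefixes and the function names are illustrative assumptions, not anything Samza or YARN defines. An entry that still shows a literal `$VAR` means that variable is unset in the shell running the script, which may differ from the NM's own environment.

```python
import glob
import os
import string

# Default yarn.application.classpath entries for Linux, as quoted from yarn-default.xml.
DEFAULT_ENTRIES = [
    "$HADOOP_CONF_DIR",
    "$HADOOP_COMMON_HOME/share/hadoop/common/*",
    "$HADOOP_COMMON_HOME/share/hadoop/common/lib/*",
    "$HADOOP_HDFS_HOME/share/hadoop/hdfs/*",
    "$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*",
    "$HADOOP_YARN_HOME/share/hadoop/yarn/*",
    "$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*",
]

# Hypothetical JAR-name prefixes to look for; adjust to your Samza/Scala versions.
WANTED_PREFIXES = ("samza-core", "samza-yarn", "scala-")


def expand_entry(entry, env):
    """Substitute $VARS from env; unset variables are left as-is so they stand out."""
    return string.Template(entry).safe_substitute(env)


def matching_jars(expanded_entry, prefixes=WANTED_PREFIXES):
    """Return JAR basenames under a '<dir>/*' entry whose names start with a prefix."""
    if not expanded_entry.endswith("/*"):
        return []
    # A Java classpath wildcard 'dir/*' picks up the .jar files directly in dir.
    jar_paths = glob.glob(os.path.join(expanded_entry[:-2], "*.jar"))
    return sorted(
        os.path.basename(p)
        for p in jar_paths
        if os.path.basename(p).startswith(prefixes)
    )


def report(env=None):
    """Map each expanded default entry to the matching JARs found on this host."""
    env = dict(os.environ) if env is None else env
    result = {}
    for entry in DEFAULT_ENTRIES:
        expanded = expand_entry(entry, env)
        result[expanded] = matching_jars(expanded)
    return result


if __name__ == "__main__":
    for path, jars in report().items():
        print(path, "->", ", ".join(jars) if jars else "(no Samza/Scala JARs)")
```

If the hdfs/lib entry lists the Samza JARs on one host but not another, that host is the one that will fail to localize the job package.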

The exception that you're now seeing seems to suggest that one of the Samza
containers is failing:

Container for appattempt_1426204312971_0001_000002 exited with exitCode: 1

The _000002 suffix indicates a non-AM failure (i.e. the Samza container
failed, not the Samza AM). Can you check the AM logs, and find the http://...
link to the container logs? It should give a hint about why the container
failed.
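
When the AM's web UI is hard to reach, the container logs can often be read straight from the NM host's disk. A minimal Python 3 sketch, assuming the `userlogs/<application id>/<container id>/` layout seen in the `/home/hadoop/hadoop-2.5.2/logs/userlogs/...` path in the quoted NM log below, and assuming log aggregation has not already removed the local copies: it returns the last few lines of every file for one application. The function name and the `tail_lines` default are hypothetical; if `yarn.nodemanager.log-dirs` is set, point the script at that directory instead.

```python
import os
import sys
from collections import deque


def container_log_tails(userlogs_dir, app_id, tail_lines=20):
    """Map 'container/file' -> last tail_lines lines, for every log file of app_id."""
    app_dir = os.path.join(userlogs_dir, app_id)
    tails = {}
    if not os.path.isdir(app_dir):
        return tails
    for container in sorted(os.listdir(app_dir)):
        container_dir = os.path.join(app_dir, container)
        if not os.path.isdir(container_dir):
            continue
        for name in sorted(os.listdir(container_dir)):
            path = os.path.join(container_dir, name)
            if not os.path.isfile(path):
                continue
            with open(path, errors="replace") as f:
                # A deque with maxlen keeps only the last tail_lines lines.
                tails[f"{container}/{name}"] = list(deque(f, maxlen=tail_lines))
    return tails


if __name__ == "__main__":
    # Example: python tails.py /home/hadoop/hadoop-2.5.2/logs/userlogs application_1426204312971_0001
    userlogs, app = sys.argv[1], sys.argv[2]
    for key, lines in container_log_tails(userlogs, app).items():
        print(f"==> {key} <==")
        sys.stdout.writelines(lines)
```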

Cheers,
Chris
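
Chris's first point, that the pastebin classpath dump lacks the samza-core, samza-yarn and Scala JARs, can be checked mechanically against the `STARTUP_MSG:   classpath` line requested in the quoted thread. A minimal Python 3 sketch, assuming that line has the form `STARTUP_MSG:   classpath = <entries separated by ':'>` (the `=` separator is an assumption about the Hadoop startup banner) and using hypothetical JAR-name prefixes. Wildcard entries such as `dir/*` are not expanded by this sketch.

```python
import os
import re
import sys

# Hypothetical name prefixes for the JARs discussed in this thread.
REQUIRED_PREFIXES = ("samza-core", "samza-yarn", "scala-library")

# Assumed banner format: "STARTUP_MSG:   classpath = a.jar:b.jar:..."
CLASSPATH_RE = re.compile(r"STARTUP_MSG:[ \t]+classpath[ \t]*=[ \t]*([^\r\n]+)")


def classpath_from_log(text):
    """Return the entries of the first STARTUP_MSG classpath line, or [] if absent."""
    match = CLASSPATH_RE.search(text)
    if not match:
        return []
    return [e for e in match.group(1).strip().split(":") if e]


def missing_jars(entries, prefixes=REQUIRED_PREFIXES):
    """Return the prefixes that match no JAR basename among the entries."""
    names = [os.path.basename(e) for e in entries]
    return [
        p for p in prefixes
        if not any(n.startswith(p) and n.endswith(".jar") for n in names)
    ]


if __name__ == "__main__":
    with open(sys.argv[1], errors="replace") as f:
        entries = classpath_from_log(f.read())
    if not entries:
        print("no STARTUP_MSG classpath line found")
    else:
        print("missing:", missing_jars(entries) or "nothing")
```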

On Thu, Mar 12, 2015 at 4:58 PM, Shekar Tippur <ctippur@gmail.com> wrote:

> Chris,
>
> Made some progress.
>
> By adding yarn.application.classpath to yarn-site.xml, I am no longer
> getting the class-not-found error. However, I am getting a different error:
>
> Application application_1426204312971_0001 failed 2 times due to AM
> Container for appattempt_1426204312971_0001_000002 exited with exitCode: 1
> due to: Exception from container-launch: ExitCodeException exitCode=1:
> ExitCodeException exitCode=1:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
> at org.apache.hadoop.util.Shell.run(Shell.java:455)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
> at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Container exited with a non-zero exit code 1
> .Failing this attempt.. Failing the application.
>
> Looks like a common issue with yarn but not sure how to resolve as yet.
>
>
> - Shekar
>
> On Thu, Mar 12, 2015 at 1:44 PM, Shekar Tippur <ctippur@gmail.com> wrote:
>
> > Chris - Here it is.
> >
> > http://pastebin.com/c3e21Hzf
> >
> > - Shekar
> >
> > On Thu, Mar 12, 2015 at 10:58 AM, Chris Riccomini <criccomini@apache.org> wrote:
> >
> >> This is the line that I'm interested in:
> >>
> >> STARTUP_MSG:   classpath ....
> >>
> >> On Thu, Mar 12, 2015 at 10:55 AM, Chris Riccomini <criccomini@apache.org> wrote:
> >>
> >> > Hey Shekar,
> >> >
> >> > Could you paste the full log on pastebin? It really seems like something's
> >> > missing from the classpath. If samza-yarn is there, it should be able to
> >> > see that file. I think the full log has a dump of the classpath. If it
> >> > doesn't, could you paste the line where the YARN NM is starting up and
> >> > dumps the full classpath?
> >> >
> >> > Cheers,
> >> > Chris
> >> >
> >> > On Thu, Mar 12, 2015 at 10:17 AM, Shekar Tippur <ctippur@gmail.com> wrote:
> >> >
> >> >> I think all these JARs are in place (under
> >> >> $HADOOP_YARN_HOME/share/hadoop/hdfs/lib).
> >> >>
> >> >> - Shekar
> >> >>
> >> >> On Thu, Mar 12, 2015 at 9:36 AM, Chris Riccomini <criccomini@apache.org> wrote:
> >> >>
> >> >> > Hey Shekar,
> >> >> >
> >> >> > You need that samza-yarn file on your RM/NM's classpath, along with
> >> >> > scala. We missed this in the docs, and are tracking the issue here:
> >> >> >
> >> >> >   https://issues.apache.org/jira/browse/SAMZA-456
> >> >> >
> >> >> > You'll also need samza-core in the classpath, based on the discussion
> >> >> > on SAMZA-456. Sorry about that. If you want to update the tutorial when
> >> >> > you get your cluster working, and submit a patch, that'd be great! :)
> >> >> >
> >> >> > Cheers,
> >> >> > Chris
> >> >> >
> >> >> > On Wed, Mar 11, 2015 at 9:43 PM, Shekar Tippur <ctippur@gmail.com> wrote:
> >> >> >
> >> >> > > Here is the corresponding log:
> >> >> > >
> >> >> > > 2015-03-11 20:43:09,665 INFO  [AsyncDispatcher event handler] localizer.LocalizedResource (LocalizedResource.java:handle(203)) - Resource http://sprfargas102:8000/hello-samza-0.8.0-dist.tar.gz transitioned from INIT to DOWNLOADING
> >> >> > >
> >> >> > > 2015-03-11 20:43:09,665 INFO  [AsyncDispatcher event handler] localizer.ResourceLocalizationService (ResourceLocalizationService.java:handle(679)) - Created localizer for container_1426121400423_2587_01_000001
> >> >> > >
> >> >> > > 2015-03-11 20:43:09,669 INFO  [LocalizerRunner for container_1426121400423_2587_01_000001] localizer.ResourceLocalizationService (ResourceLocalizationService.java:writeCredentials(1107)) - Writing credentials to the nmPrivate file /tmp/hadoop-hadoop/nm-local-dir/nmPrivate/container_1426121400423_2587_01_000001.tokens. Credentials list:
> >> >> > >
> >> >> > > 2015-03-11 20:43:09,675 INFO  [DeletionService #0] nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:deleteAsUser(378)) - Deleting path : /home/hadoop/hadoop-2.5.2/logs/userlogs/application_1426120927668_0010
> >> >> > >
> >> >> > > 2015-03-11 20:43:09,676 INFO  [LocalizerRunner for container_1426121400423_2587_01_000001] nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:createUserCacheDirs(469)) - Initializing user root
> >> >> > >
> >> >> > > 2015-03-11 20:43:09,685 INFO  [LocalizerRunner for container_1426121400423_2587_01_000001] nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:startLocalizer(103)) - Copying from /tmp/hadoop-hadoop/nm-local-dir/nmPrivate/container_1426121400423_2587_01_000001.tokens to /tmp/hadoop-hadoop/nm-local-dir/usercache/root/appcache/application_1426121400423_2587/container_1426121400423_2587_01_000001.tokens
> >> >> > >
> >> >> > > *2015-03-11 20:43:09,685 INFO  [LocalizerRunner for container_1426121400423_2587_01_000001] nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:startLocalizer(105)) - CWD set to /tmp/hadoop-hadoop/nm-local-dir/usercache/root/appcache/application_1426121400423_2587 = file:/tmp/hadoop-hadoop/nm-local-dir/usercache/root/appcache/application_1426121400423_2587*
> >> >> > >
> >> >> > > *2015-03-11 20:43:09,716 INFO  [IPC Server handler 2 on 8040] localizer.ResourceLocalizationService (ResourceLocalizationService.java:update(1007)) - DEBUG: FAILED { http://sprfargas102:8000/hello-samza-0.8.0-dist.tar.gz, 0, ARCHIVE, null }, java.lang.ClassNotFoundException: Class org.apache.samza.util.hadoop.HttpFileSystem not found*
> >> >> > >
> >> >> > > *2015-03-11 20:43:09,716 INFO  [IPC Server handler 2 on 8040] localizer.LocalizedResource (LocalizedResource.java:handle(203)) - Resource http://sprfargas102:8000/hello-samza-0.8.0-dist.tar.gz(->/tmp/hadoop-hadoop/nm-local-dir/usercache/root/appcache/application_1426121400423_2587/filecache/10/hello-samza-0.8.0-dist.tar.gz) transitioned from DOWNLOADING to FAILED*
> >> >> > >
> >> >> > > 2015-03-11 20:43:09,717 INFO  [AsyncDispatcher event handler] container.Container (ContainerImpl.java:handle(918)) - Container container_1426121400423_2587_01_000001 transitioned from LOCALIZING to LOCALIZATION_FAILED
> >> >> > >
> >> >> > > 2015-03-11 20:43:09,717 INFO  [AsyncDispatcher event handler] localizer.LocalResourcesTrackerImpl (LocalResourcesTrackerImpl.java:handle(151)) - Container container_1426121400423_2587_01_000001 sent RELEASE event on a resource request { http://sprfargas102:8000/hello-samza-0.8.0-dist.tar.gz, 0, ARCHIVE, null } not present in cache.
> >> >> > >
> >> >> > > 2015-03-11 20:43:09,717 WARN  [AsyncDispatcher event handler] nodemanager.NMAuditLogger (NMAuditLogger.java:logFailure(150)) - USER=root OPERATION=Container Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container failed with state: LOCALIZATION_FAILED APPID=application_1426121400423_2587 CONTAINERID=container_1426121400423_2587_01_000001
> >> >> > >
> >> >> > > 2015-03-11 20:43:09,717 INFO  [AsyncDispatcher event handler] container.Container (ContainerImpl.java:handle(918)) - Container container_1426121400423_2587_01_000001 transitioned from LOCALIZATION_FAILED to DONE
> >> >> > >
> >> >> > > 2015-03-11 20:43:09,717 INFO  [AsyncDispatcher event handler] application.Application (ApplicationImpl.java:transition(340)) - Removing container_1426121400423_2587_01_000001 from application application_1426121400423_2587
> >> >> > >
> >> >> > > 2015-03-11 20:43:09,717 INFO  [AsyncDispatcher event handler] containermanager.AuxServices (AuxServices.java:handle(196)) - Got event CONTAINER_STOP for appId application_1426121400423_2587
> >> >> > >
> >> >> > > 2015-03-11 20:43:09,717 INFO  [DeletionService #2] nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:deleteAsUser(369)) - Deleting absolute path : /tmp/hadoop-hadoop/nm-local-dir/usercache/root/appcache/application_1426121400423_2587/container_1426121400423_2587_01_000001
> >> >> > >
> >> >> > > 2015-03-11 20:43:09,717 WARN  [DeletionService #2] nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:deleteAsUser(372)) - delete returned false for path: [/tmp/hadoop-hadoop/nm-local-dir/usercache/root/appcache/application_1426121400423_2587/container_1426121400423_2587_01_000001]
> >> >> > >
> >> >> > > 2015-03-11 20:43:09,718 WARN  [LocalizerRunner for container_1426121400423_2587_01_000001] ipc.Client (Client.java:call(1389)) - interrupted waiting to send rpc request to server
> >> >> > >
> >> >> > > java.lang.InterruptedException
> >> >> > > at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:400)
> >> >> > > at java.util.concurrent.FutureTask.get(FutureTask.java:187)
> >> >> > > at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1030)
> >> >> > > at org.apache.hadoop.ipc.Client.call(Client.java:1384)
> >> >> > > at org.apache.hadoop.ipc.Client.call(Client.java:1364)
> >> >> > > at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> >> >> > > at com.sun.proxy.$Proxy29.heartbeat(Unknown Source)
> >> >> > > at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62)
> >> >> > > at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:255)
> >> >> > > at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
> >> >> > > at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:107)
> >> >> > > at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1073)
> >> >> > >
> >> >> > > java.io.IOException: java.lang.InterruptedException
> >> >> > > at org.apache.hadoop.ipc.Client.call(Client.java:1390)
> >> >> > > at org.apache.hadoop.ipc.Client.call(Client.java:1364)
> >> >> > > at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> >> >> > > at com.sun.proxy.$Proxy29.heartbeat(Unknown Source)
> >> >> > > at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62)
> >> >> > > at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:255)
> >> >> > > at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
> >> >> > > at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:107)
> >> >> > > at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1073)
> >> >> > > Caused by: java.lang.InterruptedException
> >> >> > > at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:400)
> >> >> > > at java.util.concurrent.FutureTask.get(FutureTask.java:187)
> >> >> > > at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1030)
> >> >> > > at org.apache.hadoop.ipc.Client.call(Client.java:1384)
> >> >> > > ... 8 more
> >> >> > >
> >> >> > > On Wed, Mar 11, 2015 at 4:56 PM, Shekar Tippur <ctippur@gmail.com> wrote:
> >> >> > >
> >> >> > > > Hello,
> >> >> > > >
> >> >> > > > Sorry to reopen this topic. I had set up YARN a couple of months ago
> >> >> > > > and can't seem to replicate this now.
> >> >> > > >
> >> >> > > > I see that I have done everything listed here:
> >> >> > > >
> >> >> > > > http://samza.apache.org/learn/tutorials/0.7.0/run-in-multi-node-yarn.html
> >> >> > > >
> >> >> > > > I see this error on the application side
> >> >> > > >
> >> >> > > > Application application_1426115467623_0492 failed 2 times due to AM
> >> >> > > > Container for appattempt_1426115467623_0492_000002 exited with exitCode:
> >> >> > > > -1000 due to: java.lang.ClassNotFoundException: Class
> >> >> > > > org.apache.samza.util.hadoop.HttpFileSystem not found
> >> >> > > > .Failing this attempt.. Failing the application.
> >> >> > > >
> >> >> > > > I see that
> >> >> > > > /home/hadoop/hadoop-2.5.2/share/hadoop/hdfs/lib/samza-yarn_2.10-0.8.0.jar
> >> >> > > > has that particular class:
> >> >> > > >
> >> >> > > >   1739 Tue Nov 25 10:51:40 PST 2014 org/apache/samza/util/hadoop/HttpFileSystem$$anonfun$getFileStatus$1.class
> >> >> > > >   1570 Tue Nov 25 10:51:40 PST 2014 org/apache/samza/util/hadoop/HttpFileSystem$$anonfun$initialize$1.class
> >> >> > > >   1597 Tue Nov 25 10:51:40 PST 2014 org/apache/samza/util/hadoop/HttpFileSystem$$anonfun$open$1.class
> >> >> > > >   1797 Tue Nov 25 10:51:40 PST 2014 org/apache/samza/util/hadoop/HttpFileSystem$$anonfun$open$2.class
> >> >> > > >   9549 Tue Nov 25 10:51:40 PST 2014 org/apache/samza/util/hadoop/HttpFileSystem.class
> >> >> > > >
> >> >> > > >
> >> >> > > > I see that env is set right:
> >> >> > > >
> >> >> > > >
> >> >> > > > HADOOP_YARN_HOME=/home/hadoop/hadoop-2.5.2
> >> >> > > >
> >> >> > > > HADOOP_CONF_DIR=/home/hadoop/hadoop-2.5.2/conf
> >> >> > > >
> >> >> > > > Wondering if I am missing anything...
> >> >> > > > - Shekar
> >> >> > > >
> >> >> > >
> >> >> >
> >> >>
> >> >
> >> >
> >>
> >
> >
>
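
The JAR listing quoted above can be repeated on each host without the `jar` tool, since a JAR is a ZIP archive that Python's standard `zipfile` module can read. A minimal sketch; the function names are hypothetical, while the path in the usage comment and the class name come from this thread. Finding the class inside the JAR only shows the JAR is intact, not that the NodeManager on every machine has that JAR on its classpath.

```python
import sys
import zipfile


def class_entry(class_name):
    """Convert a binary class name to its JAR entry path, e.g. a.b.C -> a/b/C.class."""
    return class_name.replace(".", "/") + ".class"


def jar_contains_class(jar_path, class_name):
    """True if the JAR (a ZIP archive) has an entry for the given class."""
    with zipfile.ZipFile(jar_path) as jar:
        return class_entry(class_name) in jar.namelist()


if __name__ == "__main__":
    # Usage: python check_jar.py /home/hadoop/hadoop-2.5.2/share/hadoop/hdfs/lib/samza-yarn_2.10-0.8.0.jar
    target = "org.apache.samza.util.hadoop.HttpFileSystem"
    print(target, "found" if jar_contains_class(sys.argv[1], target) else "NOT found")
```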
