tez-dev mailing list archives

From Bikas Saha <bi...@hortonworks.com>
Subject RE: ClassNotFoundException with custom InputFormat.
Date Wed, 17 Jun 2015 23:58:58 GMT
If I understand this right, there is a jar with user code in it. The jar needs to be available
during split creation but it is not available.

Is split creation happening on the client or on the AM? If it's happening on the AM, and the
AM is not getting the jars, then how are you specifying the jars to be sent to the AM? There
are different ways to do it:

1) Set tez.aux.uris in tez-site.xml to an HDFS location and copy the user jars there.

2) Upload the user jar to HDFS and create a YARN local resource for it. Then use either
of the following to add the local resource to the AM/DAG that needs it:

   a. TezClient#addAppMasterLocalFiles(…)

   b. DAG#addTaskLocalFiles(…)
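
As a rough sketch of option 1, the tez-site.xml entry might look like the fragment below. The HDFS path is a placeholder chosen for illustration, not a Tez default; substitute the location the user jars were actually copied to.

```xml
<!-- tez-site.xml: the value is a hypothetical HDFS directory holding the user jars -->
<property>
  <name>tez.aux.uris</name>
  <value>hdfs:///apps/tez/aux-jars</value>
</property>
```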

I'm not sure what is meant by classic Hadoop style jars.


From: Chris K Wensel [mailto:chris@wensel.net]
Sent: Wednesday, June 17, 2015 4:41 PM
To: dev@tez.apache.org
Cc: user@tez.apache.org
Subject: Re: ClassNotFoundException with custom InputFormat.

cross posting down to dev… should continue the discussion there I believe.

as I understand it, all Cascading users who are used to packaging a Hadoop job jar with a lib
folder, in which the custom InputFormat (pulled from Maven etc.) is placed, will
have this issue.

this also extends to projects built on top of Cascading, including Scalding and Cascalog.

oddly, org.apache.tez.client.AMConfiguration has a field

private Map<String, String> env;

but it is unused.

On Jun 17, 2015, at 4:32 PM, Andre Kelpe <akelpe@concurrentinc.com> wrote:

we are currently running into a problem when a user of Cascading uses a custom InputFormat
with Tez. The ApplicationMaster runs into a ClassNotFoundException when calculating
the splits, since we are unable to control the environment/classpath visible to the ApplicationMaster.
We have a work-around, where users have to supply a fat jar to make it work, but we need
to be able to support other ways as well.

When interacting with the DAG, we are able to pass along a custom environment/classpath, but
that API is missing on the TezClient, causing the AppMaster to fail when the user is using
classic Hadoop style jars (embedded lib directory).
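
To make "classic Hadoop style jar" concrete: it is a job jar whose dependencies are packaged as nested jars under an embedded lib directory. The sketch below, using only java.util.jar, lists those nested jars, i.e. the ones that would have to be made visible to the AppMaster some other way. The class and method names are hypothetical and are not part of Tez or Cascading.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper, for illustration only.
public class NestedLibJars {

    /** Returns the names of the jars embedded under lib/ in the given job jar, sorted. */
    public static List<String> list(String jobJarPath) throws IOException {
        List<String> nested = new ArrayList<>();
        try (JarFile jar = new JarFile(jobJarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                String name = entry.getName();
                // Keep only regular entries that are jars inside the embedded lib directory.
                if (!entry.isDirectory() && name.startsWith("lib/") && name.endsWith(".jar")) {
                    nested.add(name);
                }
            }
        }
        Collections.sort(nested);
        return nested;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java NestedLibJars path/to/job.jar
        for (String name : list(args[0])) {
            System.out.println(name);
        }
    }
}
```

A plain classloader pointed at the outer job jar does not see classes inside these nested jars, which is consistent with the ClassNotFoundException described above.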

In order to get Lingual, our SQL layer on top of Cascading, to work correctly, we need a way
to supply the environment in a more dynamic way than one fat jar, so it would be great if the
API could be extended to do that.
I have opened https://issues.apache.org/jira/browse/TEZ-2563

- André

André Kelpe

Chris K Wensel
