tez-dev mailing list archives

From Andre Kelpe <ake...@concurrentinc.com>
Subject Re: ClassNotFoundException with custom InputFormat.
Date Thu, 18 Jun 2015 16:40:02 GMT
On Wed, Jun 17, 2015 at 4:58 PM, Bikas Saha <bikas@hortonworks.com> wrote:

>  If I understand this right, there is a jar with user code in it. The jar
> needs to be available during split creation but it is not available.
>
>
>
> Is split creation happening on the client or on the AM? If it's happening
> on the AM, and the AM is not getting the jars, then how are you specifying
> the jars to be sent to the AM? There are different ways to do it.
>

In our case the AM is doing the split calculation. We are sending the jar
over as LocalResources passed to the TezClient#create method.


> 1) Set tez.aux.uris in tez-site.xml to an HDFS location and copy user
>    jars there.
>
> 2) Upload the user jar to HDFS and create a YARN local resource for it.
>    Then use either of the following to add the local resource to the
>    AM/DAG that needs it:
>
>    a. TezClient#addAppMasterLocalFiles(…)
>
>    b. DAG#addTaskLocalFiles(…)
>
>
>
> Not sure what is meant by classic Hadoop style jars?
>

Hadoop-style jars are jar files that contain the user code plus all
required libs in a sub-directory within the jar. This is the layout that
RunJar has always understood.

The problem is that we can't find a way to put the jars from the lib folder
of the job jar on the classpath of the AM.
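
One possible client-side workaround, sketched here under assumptions and not
taken from Tez itself, is to unpack the jars under lib/ so that each one can
be uploaded and registered as its own local resource (for example through the
TezClient#addAppMasterLocalFiles call mentioned above). The class name
JobJarLibs and the method extractLibJars are hypothetical; the upload to HDFS
and the registration with Tez are deliberately left out. Only JDK classes are
used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper for illustration; not part of Tez, Hadoop or Cascading. */
final class JobJarLibs {

    private JobJarLibs() {}

    /**
     * Copies every *.jar entry found under lib/ in a Hadoop-style job jar
     * into targetDir and returns the extracted paths in jar-entry order.
     */
    static List<Path> extractLibJars(Path jobJar, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        List<Path> extracted = new ArrayList<>();
        try (JarFile jar = new JarFile(jobJar.toFile())) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                String name = entry.getName();
                // Skip directories and anything that is not a jar under lib/.
                if (entry.isDirectory() || !name.startsWith("lib/") || !name.endsWith(".jar")) {
                    continue;
                }
                // Use only the file name so an entry path cannot escape targetDir.
                String fileName = name.substring(name.lastIndexOf('/') + 1);
                Path out = targetDir.resolve(fileName);
                try (InputStream in = jar.getInputStream(entry)) {
                    Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
                }
                extracted.add(out);
            }
        }
        return extracted;
    }
}
```

The returned paths are local files; whether they end up on the AM classpath
still depends on how they are handed to Tez afterwards.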

- André



>
>
> Bikas
>
>
>
> *From:* Chris K Wensel [mailto:chris@wensel.net]
> *Sent:* Wednesday, June 17, 2015 4:41 PM
> *To:* dev@tez.apache.org
> *Cc:* user@tez.apache.org
> *Subject:* Re: ClassNotFoundException with custom InputFormat.
>
>
>
> cross posting down to dev… should continue the discussion there I believe.
>
>
>
> as I understand it, all Cascading users who package a Hadoop job jar
> with a lib folder containing the custom InputFormat (pulled from maven
> etc.) will have this issue.
>
>
>
> this also expands to projects on top of Cascading including Scalding and
> Cascalog.
>
>
>
> oddly the org.apache.tez.client.AMConfiguration has a
>
>
>
> private Map<String, String> env;
>
>
>
> but it is unused.
>
>
>
>  On Jun 17, 2015, at 4:32 PM, Andre Kelpe <akelpe@concurrentinc.com>
> wrote:
>
>
>
> Hi,
>
> we are currently running into a problem when a user of Cascading uses a
> custom InputFormat with Tez. The ApplicationMaster is running into a
> ClassNotFoundException when calculating the splits, since we are unable to
> control the environment/classpath visible to the ApplicationMaster. We
> have a work-around, where the users have to supply a fat-jar to make it
> work, but we need to be able to support other ways as well.
>
> When interacting with the DAG, we are able to pass along a custom
> environment/classpath, but that API is missing on the TezClient, causing
> the AppMaster to fail, when the user is using classic hadoop style jars
> (embedded lib directory).
>
> In order to get lingual, our SQL layer on top of Cascading, to work
> correctly, we need a way to supply the environment in a more dynamic way
> than one fatjar, so it would be great if the API could be extended to do
> that.
>
> I have opened https://issues.apache.org/jira/browse/TEZ-2563
>
> Thanks!
>
>
>
> - André
>
>
> --
>
> André Kelpe
> andre@concurrentinc.com
> http://concurrentinc.com
>
>
>
> —
>
> Chris K Wensel
>
> chris@wensel.net
>
>
>
>
>
>
>



-- 
André Kelpe
andre@concurrentinc.com
http://concurrentinc.com
