tez-dev mailing list archives

From Andre Kelpe <ake...@concurrentinc.com>
Subject Re: ClassNotFoundException with custom InputFormat.
Date Thu, 18 Jun 2015 20:06:35 GMT
Hi,

I have tried ARCHIVE and added it to
TEZ_CLUSTER_ADDITIONAL_CLASSPATH_PREFIX as you suggested. That seems to get
me further. The problem now is that the same jar should also be used in the
containers for the DAGs, but that seems to work in a completely different
way.
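
For reference, the classpath prefix pattern from Hitesh's example quoted further down can be derived mechanically from the local resource keys. The following is only a minimal sketch in plain Java, with no Tez or YARN dependency; the class name `ClasspathPrefix` and the method `build` are hypothetical, and it assumes each ARCHIVE resource is unpacked into a directory named after its key under `$PWD`, with dependent jars in a `lib/` sub-directory.

```java
import java.util.List;

public class ClasspathPrefix {
    // Hypothetical helper: for each archive key, emit "$PWD/<key>/*:" and
    // "$PWD/<key>/lib/*:" in order, mirroring the example in this thread.
    public static String build(List<String> archiveKeys) {
        StringBuilder sb = new StringBuilder();
        for (String key : archiveKeys) {
            sb.append("$PWD/").append(key).append("/*:");
            sb.append("$PWD/").append(key).append("/lib/*:");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Keys "foo" and "bar" are the ones used in the quoted example.
        System.out.println(build(List.of("foo", "bar")));
    }
}
```

The resulting string would then be supplied as the value of TEZ_CLUSTER_ADDITIONAL_CLASSPATH_PREFIX; how the variables are expanded at container launch depends on the cluster's launch_container.sh.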

Previously we were using PATTERN for those, plus a custom environment:
https://github.com/Cascading/cascading/blob/3.0/cascading-hadoop2-tez/src/main/java/cascading/flow/tez/util/TezUtil.java#L276-L311
This works; however, I don't want to add the same jar twice, once as an
archive and once as a PATTERN.
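
To make the difference concrete: a PATTERN resource keeps only those entries of the jar whose names match a regular expression, whereas an ARCHIVE is unpacked whole. The sketch below is plain Java that filters a list of entry names from a Hadoop-style job jar; the class `JobJarEntries`, the regex, and the entry names are illustrative assumptions and are not taken from TezUtil. It also assumes full-match semantics for the pattern.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class JobJarEntries {
    // Hypothetical pattern: jars sitting directly inside the lib/ directory.
    public static final Pattern LIB_JARS = Pattern.compile("lib/[^/]+\\.jar");

    // Returns the entry names that the pattern matches in full.
    public static List<String> select(List<String> entryNames, Pattern pattern) {
        return entryNames.stream()
                .filter(name -> pattern.matcher(name).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Illustrative layout: user classes at the top level, libs under lib/.
        List<String> entries = List.of(
                "META-INF/MANIFEST.MF",
                "com/example/MyFlow.class",
                "lib/custom-input-format.jar",
                "lib/guava.jar");
        System.out.println(select(entries, LIB_JARS));
    }
}
```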

I am a bit lost as to why there are two different ways of doing this for the
various JVMs at various stages.

- André


On Thu, Jun 18, 2015 at 9:57 AM, Hitesh Shah <hitesh@apache.org> wrote:

> Hi Andre
>
> Are you using Local Resource type ARCHIVE? Using FILE may not help in your
> scenario.
>
> If you are using ARCHIVE, you can then use the classpath config (
> TEZ_CLUSTER_ADDITIONAL_CLASSPATH_PREFIX ) to modify the classpath.
>
>  For example, assume foo.jar and bar.jar ( in the structure that you
> called out ) are added to the map of local resources using keys foo and bar:
>       - classpath prefix would be
> "$PWD/foo/*:$PWD/foo/lib/*:$PWD/bar/*:$PWD/bar/lib/*:"
>
> As mentioned on the jira, the launch_container.sh from your cluster would
> help. Also, if you upload an example jar to the jira, I can help provide a
> working example.
>
> thanks
> — Hitesh
>
>
> On Jun 18, 2015, at 9:40 AM, Andre Kelpe <akelpe@concurrentinc.com> wrote:
>
> > On Wed, Jun 17, 2015 at 4:58 PM, Bikas Saha <bikas@hortonworks.com> wrote:
> >
> >> If I understand this right, there is a jar with user code in it. The jar
> >> needs to be available during split creation but it is not available.
> >>
> >>
> >>
> >> Is split creation happening on the client or on the AM? If it's happening
> >> on the AM, and the AM is not getting the jars, then how are you specifying
> >> the jars to be sent to the AM? There are different ways to do it.
> >>
> >
> > In our case the AM is doing the split calculation. We are sending the jar
> > over as LocalResources given in the TezClient#create method.
> >
> >
> >> 1)      Set tez.aux.uris in tez-site.xml to an HDFS location and copy
> >> user jars there
> >>
> >> 2)      Upload the user jar to HDFS and create a YARN local resource for
> >> it. Then use either of the following to add the local resource to the
> >> AM/DAG that needs it.
> >>
> >> a.       TezClient#addAppMasterLocalFiles(…)
> >>
> >> b.      DAG#addTaskLocalFiles(…)
> >>
> >>
> >>
> >> Not sure what is meant by classic Hadoop style jars?
> >>
> >
> > Hadoop style jars are jar files where you have the user code + all
> > required libs in a sub-directory within the jar. This is the layout that
> > RunJar has always understood.
> >
> > The thing is that we can't find a way to put the jars in the lib folder in
> > the job-jar on the classpath of the AM.
> >
> > - André
> >
> >
> >
> >>
> >>
> >> Bikas
> >>
> >>
> >>
> >> *From:* Chris K Wensel [mailto:chris@wensel.net]
> >> *Sent:* Wednesday, June 17, 2015 4:41 PM
> >> *To:* dev@tez.apache.org
> >> *Cc:* user@tez.apache.org
> >> *Subject:* Re: ClassNotFoundException with custom InputFormat.
> >>
> >>
> >>
> >> cross posting down to dev… should continue the discussion there I believe.
> >>
> >>
> >>
> >> as I understand it, all Cascading users familiar with packaging a Hadoop
> >> job jar with a lib folder, in which the packaged custom InputFormat is
> >> placed — pulled from maven etc, will have this issue.
> >>
> >>
> >>
> >> this also expands to projects on top of Cascading including Scalding and
> >> Cascalog.
> >>
> >>
> >>
> >> oddly the org.apache.tez.client.AMConfiguration has a
> >>
> >>
> >>
> >> private Map<String, String> env;
> >>
> >>
> >>
> >> but is unused.
> >>
> >>
> >>
> >> On Jun 17, 2015, at 4:32 PM, Andre Kelpe <akelpe@concurrentinc.com>
> >> wrote:
> >>
> >>
> >>
> >> Hi,
> >>
> >> we are currently running into a problem when a user of Cascading uses a
> >> custom InputFormat with Tez. The ApplicationMaster is running into a
> >> ClassNotFoundException when calculating the splits, since we are unable to
> >> control the environment/classpath visible to the ApplicationMaster. We
> >> have a work-around, where the users have to supply a fat-jar to make it
> >> work, but we need to be able to support other ways as well.
> >>
> >> When interacting with the DAG, we are able to pass along a custom
> >> environment/classpath, but that API is missing on the TezClient, causing
> >> the AppMaster to fail, when the user is using classic hadoop style jars
> >> (embedded lib directory).
> >>
> >> In order to get Lingual, our SQL layer on top of Cascading, to work
> >> correctly, we need a way to supply the environment in a more dynamic way
> >> than one fat jar, so it would be great if the API could be extended to do
> >> that.
> >>
> >> I have opened https://issues.apache.org/jira/browse/TEZ-2563
> >>
> >> Thanks!
> >>
> >>
> >>
> >> - André
> >>
> >>
> >> --
> >>
> >> André Kelpe
> >> andre@concurrentinc.com
> >> http://concurrentinc.com
> >>
> >>
> >>
> >> —
> >>
> >> Chris K Wensel
> >>
> >> chris@wensel.net
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >
> >
> >
> > --
> > André Kelpe
> > andre@concurrentinc.com
> > http://concurrentinc.com
>
>


-- 
André Kelpe
andre@concurrentinc.com
http://concurrentinc.com
