tez-dev mailing list archives

From Siddharth Seth <ss...@apache.org>
Subject Re: ClassNotFoundException with custom InputFormat.
Date Thu, 18 Jun 2015 20:29:32 GMT
Tasks can set up local resources and change the environment (specifically
the classpath, in this case). That's missing for AMs, where only
LocalResources can be specified.
An API to add a file to the classpath (including localization) that
works for both the AM and tasks would be useful. There's a jira for this,
but it hasn't been worked on yet.

On Thu, Jun 18, 2015 at 1:06 PM, Andre Kelpe <akelpe@concurrentinc.com>
wrote:

> Hi,
>
> so I have tried ARCHIVE and added it to
> TEZ_CLUSTER_ADDITIONAL_CLASSPATH_PREFIX as you suggested. That seems to get
> me further. The problem now is that the same jar should be used in the
> containers for the DAGs, but that seems to work in a completely different
> way.
>
> Previously we used PATTERN for those, plus a custom environment:
>
> https://github.com/Cascading/cascading/blob/3.0/cascading-hadoop2-tez/src/main/java/cascading/flow/tez/util/TezUtil.java#L276-L311
> This works, however I don't want to add the same jar twice, once as an
> archive and once as a PATTERN.
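As we understand YARN's PATTERN local resource type, it is a hybrid of FILE and ARCHIVE: only the archive entries whose names match a regular expression are extracted. A minimal sketch of that selection step, assuming the regex `(?:classes/|lib/).*`, which is Hadoop MapReduce's default job-jar unpack pattern; we have not checked which regex Cascading's TezUtil actually passes, and `UnpackPattern` / `entriesToUnpack` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: which job-jar entries a PATTERN-style unpack would extract. */
final class UnpackPattern {
    private UnpackPattern() {}

    // Assumption: Hadoop MapReduce's default job-jar unpack regex.
    // Not verified to be the regex Cascading's TezUtil uses.
    static final Pattern CLASSES_AND_LIB = Pattern.compile("(?:classes/|lib/).*");

    /** Keeps the entry names that fully match the pattern, preserving their order. */
    static List<String> entriesToUnpack(List<String> entryNames, Pattern pattern) {
        List<String> selected = new ArrayList<>();
        for (String name : entryNames) {
            // matches() requires the whole name to match, not just a prefix.
            if (pattern.matcher(name).matches()) {
                selected.add(name);
            }
        }
        return selected;
    }
}
```

With this regex, `lib/dep-a.jar` and `classes/com/example/Util.class` would be extracted, while root-level classes and `META-INF/` entries would not.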
>
> I am a bit lost why there are two different ways of doing this for the
> various JVMs at various stages.
>
> - André
>
>
> On Thu, Jun 18, 2015 at 9:57 AM, Hitesh Shah <hitesh@apache.org> wrote:
>
> > Hi Andre
> >
> > Are you using Local Resource type ARCHIVE? Using FILE may not help in
> > your scenario.
> >
> > If you are using ARCHIVE, you can then use the classpath config
> > (TEZ_CLUSTER_ADDITIONAL_CLASSPATH_PREFIX) to modify the classpath.
> >
> > For example, assume foo.jar and bar.jar (in the structure that you
> > called out) are added to the map of local resources using keys foo and
> > bar:
> >   - the classpath prefix would be
> >     "$PWD/foo/*:$PWD/foo/lib/*:$PWD/bar/*:$PWD/bar/lib/*:"
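Hitesh's prefix follows a fixed rule: each ARCHIVE resource is localized as an unpacked directory linked under its map key, and the prefix lists that directory's jars followed by its lib/ jars. A minimal sketch of generating the string from the keys, assuming Linux-style ':' separators as in his example; `ClasspathPrefix` and `forArchiveKeys` are hypothetical names, not part of the Tez API. The result would be the value supplied through TEZ_CLUSTER_ADDITIONAL_CLASSPATH_PREFIX.

```java
import java.util.List;

/** Hypothetical helper, not part of Tez: builds a classpath prefix for ARCHIVE resources. */
final class ClasspathPrefix {
    private ClasspathPrefix() {}

    /**
     * For each local-resource key, appends the unpacked archive's top-level jars
     * ($PWD/key/*) and its nested lib/ jars ($PWD/key/lib/*), each followed by ':'.
     * $PWD is kept literal; it is expected to be expanded by the shell that runs
     * the container's launch script.
     */
    static String forArchiveKeys(List<String> keys) {
        StringBuilder prefix = new StringBuilder();
        for (String key : keys) {
            prefix.append("$PWD/").append(key).append("/*:");
            prefix.append("$PWD/").append(key).append("/lib/*:");
        }
        return prefix.toString();
    }
}
```

Called with the keys foo and bar, in that order, it returns exactly the prefix Hitesh gives; the trailing ':' lets the prefix join the rest of the classpath.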
> >
> > As mentioned on the jira, the launch_container.sh from your cluster would
> > help. Also, if you upload an example jar to the jira, I can help provide a
> > working example.
> >
> > thanks
> > — Hitesh
> >
> >
> > On Jun 18, 2015, at 9:40 AM, Andre Kelpe <akelpe@concurrentinc.com>
> > wrote:
> >
> > > On Wed, Jun 17, 2015 at 4:58 PM, Bikas Saha <bikas@hortonworks.com>
> > > wrote:
> > >
> > >> If I understand this right, there is a jar with user code in it. The
> > >> jar needs to be available during split creation, but it is not
> > >> available.
> > >>
> > >>
> > >>
> > >> Is split creation happening on the client or on the AM? If it's
> > >> happening on the AM, and the AM is not getting the jars, then how are
> > >> you specifying the jars to be sent to the AM? There are different ways
> > >> to do it.
> > >>
> > >
> > > In our case the AM is doing the split calculation. We are sending the
> > > jar over as LocalResources given in the TezClient#create method.
> > >
> > >
> > >> 1) Set tez.aux.uris in tez-site.xml to an HDFS location and copy
> > >> user jars there.
> > >>
> > >> 2) Upload the user jar to HDFS and create a YARN local resource for
> > >> it. Then use either of the following to add the local resource to the
> > >> AM/DAG that needs it:
> > >>
> > >>    a. TezClient#addAppMasterLocalFiles(…)
> > >>
> > >>    b. DAG#addTaskLocalFiles(…)
> > >>
> > >>
> > >>
> > >> I'm not sure what is meant by classic Hadoop-style jars.
> > >>
> > >
> > > Hadoop-style jars are jar files where you have the user code plus all
> > > required libs in a sub-directory (lib/) within the jar: the layout that
> > > RunJar has understood forever.
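To make the layout concrete: a Hadoop-style job jar keeps the user classes at its root and dependency jars under lib/. The JDK's standard URLClassLoader does not load classes from jars nested inside another jar, which, as we understand it, is why RunJar unpacks the job jar before building its classpath, and why shipping the jar as a plain FILE resource leaves the lib/ classes unreachable. The sketch below builds a toy jar in memory and lists its nested lib/ jars using only java.util.zip; the entry names and the `JobJarLayout` class are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper illustrating the Hadoop-style job jar layout. */
final class JobJarLayout {
    private JobJarLayout() {}

    /** Writes a toy job jar: a user class at the root, dependency jars under lib/. */
    static byte[] buildSampleJobJar() {
        String[] entryNames = {
            "com/example/MyInputFormat.class", // hypothetical user class
            "lib/dep-a.jar",                   // hypothetical bundled dependency
            "lib/dep-b.jar"                    // hypothetical bundled dependency
        };
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : entryNames) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write(0); // one placeholder byte; real entries would hold class or jar data
                zip.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    /** Returns the names of jars nested under lib/, in the order they appear. */
    static List<String> nestedLibJars(byte[] jarBytes) {
        List<String> result = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(jarBytes))) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                String name = entry.getName();
                if (!entry.isDirectory() && name.startsWith("lib/") && name.endsWith(".jar")) {
                    result.add(name);
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return result;
    }
}
```

These nested lib/ jars are exactly the entries that a classpath entry such as $PWD/key/lib/* can only reach once the archive has been unpacked.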
> > >
> > > The thing is that we can't find a way to put the jars in the lib folder
> > > of the job jar on the classpath of the AM.
> > >
> > > - André
> > >
> > >
> > >
> > >>
> > >>
> > >> Bikas
> > >>
> > >>
> > >>
> > >> *From:* Chris K Wensel [mailto:chris@wensel.net]
> > >> *Sent:* Wednesday, June 17, 2015 4:41 PM
> > >> *To:* dev@tez.apache.org
> > >> *Cc:* user@tez.apache.org
> > >> *Subject:* Re: ClassNotFoundException with custom InputFormat.
> > >>
> > >>
> > >>
> > >> cross posting down to dev… should continue the discussion there I
> > >> believe.
> > >>
> > >>
> > >>
> > >> as I understand it, all Cascading users who package a Hadoop job jar
> > >> with a lib folder containing the custom InputFormat (pulled from Maven,
> > >> etc.) will have this issue.
> > >>
> > >>
> > >>
> > >> this also extends to projects on top of Cascading, including Scalding
> > >> and Cascalog.
> > >>
> > >>
> > >>
> > >> oddly the org.apache.tez.client.AMConfiguration has a
> > >>
> > >>
> > >>
> > >> private Map<String, String> env;
> > >>
> > >>
> > >>
> > >> but is unused.
> > >>
> > >>
> > >>
> > >> On Jun 17, 2015, at 4:32 PM, Andre Kelpe <akelpe@concurrentinc.com>
> > >> wrote:
> > >>
> > >>
> > >>
> > >> Hi,
> > >>
> > >> we are currently running into a problem when a user of Cascading uses a
> > >> custom InputFormat with Tez. The ApplicationMaster is running into a
> > >> ClassNotFoundException when calculating the splits, since we are unable
> > >> to control the environment/classpath visible to the ApplicationMaster.
> > >> We have a work-around where users have to supply a fat jar to make it
> > >> work, but we need to be able to support other ways as well.
> > >>
> > >> When interacting with the DAG, we are able to pass along a custom
> > >> environment/classpath, but that API is missing on the TezClient,
> > >> causing the AppMaster to fail when the user is using classic
> > >> Hadoop-style jars (embedded lib directory).
> > >>
> > >> In order to get Lingual, our SQL layer on top of Cascading, to work
> > >> correctly, we need a way to supply the environment in a more dynamic
> > >> way than one fat jar, so it would be great if the API could be extended
> > >> to do that.
> > >>
> > >> I have opened https://issues.apache.org/jira/browse/TEZ-2563
> > >>
> > >> Thanks!
> > >>
> > >>
> > >>
> > >> - André
> > >>
> > >>
> > >> --
> > >>
> > >> André Kelpe
> > >> andre@concurrentinc.com
> > >> http://concurrentinc.com
> > >>
> > >>
> > >>
> > >> —
> > >>
> > >> Chris K Wensel
> > >>
> > >> chris@wensel.net
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >
> > >
> > >
> > > --
> > > André Kelpe
> > > andre@concurrentinc.com
> > > http://concurrentinc.com
> >
> >
>
>
> --
> André Kelpe
> andre@concurrentinc.com
> http://concurrentinc.com
>
