tez-dev mailing list archives

From Bikas Saha <bi...@hortonworks.com>
Subject RE: ClassNotFoundException with custom InputFormat.
Date Thu, 18 Jun 2015 17:17:26 GMT
+1.

Sending the jar as an ARCHIVE will cause it to be unjarred, and you can then modify the
classpath by referring to the unjarred files.

At this point, perhaps in Tez, we should consider creating 2 dirs, tez and user, and localizing
files in them appropriately. This would separate the jars and help with debugging cases where
jars are duplicated in both, because they won't overwrite each other.

-----Original Message-----
From: Hitesh Shah [mailto:hitesh@apache.org] 
Sent: Thursday, June 18, 2015 9:57 AM
To: dev@tez.apache.org
Subject: Re: ClassNotFoundException with custom InputFormat.

Hi Andre 

Are you using Local Resource type ARCHIVE? Using FILE may not help in your scenario.

If you are using ARCHIVE, you can then use the classpath config
(TEZ_CLUSTER_ADDITIONAL_CLASSPATH_PREFIX) to modify the classpath.
      
For example, assume foo.jar and bar.jar (in the structure that you called out) are added
to the map of local resources using keys foo and bar:
      - classpath prefix would be "$PWD/foo/*:$PWD/foo/lib/*:$PWD/bar/*:$PWD/bar/lib/*:"
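
A minimal sketch of how such a prefix could be assembled from the local resource keys. It assumes plain Java only; the class ClasspathPrefix and its build method are hypothetical helpers for illustration, not part of the Tez API, and the "lib" sub-directory name follows the job-jar layout described in this thread.

```java
import java.util.List;

public class ClasspathPrefix {

    // Builds a classpath prefix for archives localized under the given keys.
    // Each key contributes the unjarred root and its lib/ sub-directory.
    // Hypothetical helper, not a Tez API.
    static String build(List<String> resourceKeys) {
        StringBuilder prefix = new StringBuilder();
        for (String key : resourceKeys) {
            prefix.append("$PWD/").append(key).append("/*:");
            prefix.append("$PWD/").append(key).append("/lib/*:");
        }
        return prefix.toString();
    }

    public static void main(String[] args) {
        // Keys under which foo.jar and bar.jar were added as ARCHIVE resources.
        System.out.println(build(List.of("foo", "bar")));
    }
}
```

The resulting string would then be supplied as the value of the classpath prefix config named above; $PWD is left unexpanded on purpose so that it is resolved inside the container.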

As mentioned on the jira, the launch_container.sh from your cluster would help. Also, if you
upload an example jar to the jira, I can help provide a working example. 

thanks
- Hitesh


On Jun 18, 2015, at 9:40 AM, Andre Kelpe <akelpe@concurrentinc.com> wrote:

> On Wed, Jun 17, 2015 at 4:58 PM, Bikas Saha <bikas@hortonworks.com> wrote:
> 
>> If I understand this right, there is a jar with user code in it. The 
>> jar needs to be available during split creation but it is not available.
>> 
>> 
>> 
>> Is split creation happening on the client or on the AM? If it's 
>> happening on the AM, and the AM is not getting the jars, then how are 
>> you specifying the jars to be sent to the AM? There are different ways to do it.
>> 
> 
> In our case the AM is doing the split calculation. We are sending the 
> jar over as LocalResources given in the TezClient#create method
> 
> 
>> 1) Set tez.aux.uris in tez-site.xml to an HDFS location and copy
>> user jars there
>> 
>> 2) Upload the user jar to HDFS and create a YARN local resource for
>> it. Then use either of the following to add the local resource to the 
>> AM/DAG that needs it.
>> 
>> a. TezClient#addAppMasterLocalFiles(.)
>> 
>> b. DAG#addTaskLocalFiles(.)
>> 
>> 
>> 
>> Not sure what is meant by classic Hadoop style jars?
>> 
> 
> Hadoop style jars are jar files where you have the user code + all 
> required libs in a sub-directory within the jar. This is the layout that 
> RunJar has understood since forever.
> 
> The thing is that we can't find a way to put the jars in the lib 
> folder in the job-jar on the classpath of the AM.
> 
> - André
> 
> 
> 
>> 
>> 
>> Bikas
>> 
>> 
>> 
>> *From:* Chris K Wensel [mailto:chris@wensel.net]
>> *Sent:* Wednesday, June 17, 2015 4:41 PM
>> *To:* dev@tez.apache.org
>> *Cc:* user@tez.apache.org
>> *Subject:* Re: ClassNotFoundException with custom InputFormat.
>> 
>> 
>> 
>> cross posting down to dev. should continue the discussion there I believe.
>> 
>> 
>> 
>> as I understand it, all Cascading users familiar with packaging a 
>> Hadoop job jar with a lib folder, in which the packaged custom 
>> InputFormat is placed - pulled from maven etc, will have this issue.
>> 
>> 
>> 
>> this also expands to projects on top of Cascading including Scalding 
>> and Cascalog.
>> 
>> 
>> 
>> oddly the org.apache.tez.client.AMConfiguration has a
>> 
>> 
>> 
>> private Map<String, String> env;
>> 
>> 
>> 
>> but it is unused.
>> 
>> 
>> 
>> On Jun 17, 2015, at 4:32 PM, Andre Kelpe <akelpe@concurrentinc.com>
>> wrote:
>> 
>> 
>> 
>> Hi,
>> 
>> we are currently running into a problem when a user of Cascading uses 
>> a custom InputFormat with Tez. The ApplicationMaster runs into 
>> a ClassNotFoundException when calculating the splits, since we are 
>> unable to control the environment/classpath visible to the 
>> ApplicationMaster. We have a workaround, where the users have to 
>> supply a fat jar to make it work, but we need to be able to support
>> other ways as well.
>> 
>> When interacting with the DAG, we are able to pass along a custom 
>> environment/classpath, but that API is missing on the TezClient, 
>> causing the AppMaster to fail, when the user is using classic hadoop 
>> style jars (embedded lib directory).
>> 
>> In order to get Lingual, our SQL layer on top of Cascading, to work 
>> correctly, we need a way to supply the environment in a more dynamic 
>> way than one fat jar, so it would be great if the API could be 
>> extended to do that.
>> 
>> I have opened https://issues.apache.org/jira/browse/TEZ-2563
>> 
>> Thanks!
>> 
>> 
>> 
>> - André
>> 
>> 
>> --
>> 
>> André Kelpe
>> andre@concurrentinc.com
>> http://concurrentinc.com
>> 
>> 
>> 
>> -
>> 
>> Chris K Wensel
>> 
>> chris@wensel.net
>> 
>> 
>> 
>> 
>> 
>> 
>> 
> 
> 
> 
> --
> André Kelpe
> andre@concurrentinc.com
> http://concurrentinc.com


