hive-issues mailing list archives

From "Ratandeep Ratti (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
Date Sat, 19 Sep 2015 03:13:04 GMT

    [ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14876858#comment-14876858 ]

Ratandeep Ratti commented on HIVE-11878:
----------------------------------------

bq. If we had previously loaded a class with the previous classloader, and now load the class again with the current classloader, would there be any potential effects here?

The two Class objects will definitely be different. I'll check whether the code compares Class objects anywhere. Some effects that come to mind:
1. {{o instanceof c}} evaluates to false if {{c}} was loaded by classloader *u1* while {{o}} is an instance of a same-named class that was loaded by another classloader *u2*.
2. Casting may fail with a {{ClassCastException}}, for the same reason.
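To make the class-identity point concrete, here is a minimal Java sketch (the class name ClassIdentityDemo is hypothetical, not Hive code) that loads its own class a second time through an isolated URLClassLoader whose parent is null, so the new loader must define its own copy. It assumes the class is run from compiled classes on the classpath (a directory or a jar), because it uses its own code-source location as the loader's only URL; it will not work in single-file source-launcher mode.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Shows that one class file loaded by two unrelated classloaders yields two distinct Class objects. */
public class ClassIdentityDemo {

    /**
     * Loads ClassIdentityDemo again through an isolated URLClassLoader and reports
     * {sameName, sameClassObject, instanceofHolds, castSucceeds}.
     */
    static boolean[] compareWithIsolatedCopy() {
        URL location = ClassIdentityDemo.class.getProtectionDomain().getCodeSource().getLocation();
        // A null parent delegates only to the bootstrap loader, so this loader
        // has to define its own copy of ClassIdentityDemo from `location`.
        try (URLClassLoader isolated = new URLClassLoader(new URL[] {location}, null)) {
            Class<?> copy = isolated.loadClass(ClassIdentityDemo.class.getName());
            Object obj = copy.getDeclaredConstructor().newInstance();
            boolean castSucceeds;
            try {
                ClassIdentityDemo.class.cast(obj); // same name, different defining loader
                castSucceeds = true;
            } catch (ClassCastException e) {
                castSucceeds = false;
            }
            return new boolean[] {
                copy.getName().equals(ClassIdentityDemo.class.getName()),
                copy == ClassIdentityDemo.class,
                obj instanceof ClassIdentityDemo,
                castSucceeds
            };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = compareWithIsolatedCopy();
        // prints: sameName=true sameClass=false instanceof=false cast=false
        System.out.printf("sameName=%b sameClass=%b instanceof=%b cast=%b%n", r[0], r[1], r[2], r[3]);
    }
}
```

The names match, yet identity, {{instanceof}} and the cast all fail, which is the risk whenever an already-loaded class is loaded again by a different classloader.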

[~jdere], [~ashutoshc], I'd also like your opinion on approach 3, mentioned above: instead of creating a new classloader for every jar, we add jars to the same classloader using its {{addURL}} method. We would extend {{URLClassLoader}} and widen the visibility of {{addURL}} from protected to public. This sidesteps the potential problems we are discussing here. Removing jars in {{org.apache.hadoop.hive.ql.exec.Utilities#removeFromClassPath}} can work exactly as before, except that it creates an instance of this {{URLClassLoader}} subclass (with {{addURL}} made public) instead of a plain {{URLClassLoader}}, and sets it as both the current thread context classloader and the Hadoop Configuration classloader.
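A minimal sketch of approach 3, assuming a subclass along these lines. AddableURLClassLoader, registerJar and fileUrl are hypothetical names, not existing Hive code, and setting the Hadoop Configuration classloader is omitted to keep the example dependency-free. The only real API relied on is {{URLClassLoader#addURL}}, which is protected and may be overridden as public in a subclass.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;

/** Hypothetical name: a URLClassLoader whose addURL is public (sketch of approach 3). */
public class AddableURLClassLoader extends URLClassLoader {

    public AddableURLClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    /** Widens URLClassLoader#addURL from protected to public. */
    @Override
    public void addURL(URL url) {
        super.addURL(url);
    }

    /**
     * Hypothetical helper: appends a jar to the thread context loader if it is already an
     * AddableURLClassLoader; otherwise first installs one, parented on the current loader.
     * (Hive would also set the same loader on the Hadoop Configuration; omitted here.)
     */
    static AddableURLClassLoader registerJar(URL jar) {
        Thread thread = Thread.currentThread();
        ClassLoader current = thread.getContextClassLoader();
        AddableURLClassLoader loader;
        if (current instanceof AddableURLClassLoader) {
            loader = (AddableURLClassLoader) current;
        } else {
            loader = new AddableURLClassLoader(new URL[0], current);
            thread.setContextClassLoader(loader);
        }
        loader.addURL(jar);
        return loader;
    }

    /** Converts a filesystem path to a file: URL without a checked exception. */
    static URL fileUrl(String path) {
        try {
            return Paths.get(path).toUri().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(path, e);
        }
    }

    public static void main(String[] args) {
        AddableURLClassLoader first = registerJar(fileUrl("j1.jar"));
        AddableURLClassLoader second = registerJar(fileUrl("j2.jar"));
        // One loader holds both jars, so classes from j1 can resolve classes from j2.
        System.out.println(first == second);         // true
        System.out.println(second.getURLs().length); // 2
    }
}
```

Because every jar lands in the same loader, a class from one jar resolves its references against all registered jars. {{URLClassLoader}} has no way to remove a URL, so deleting a jar still means building a fresh loader, as {{removeFromClassPath}} does today.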

One way to think about approach 3 is that it is exactly what is done today, except that the result is as if all the jars had been registered at once, in a single classloader. I haven't implemented approach 3 yet; I wanted to get some opinions before proceeding further.

> ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
> ------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-11878
>                 URL: https://issues.apache.org/jira/browse/HIVE-11878
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.2.1
>            Reporter: Ratandeep Ratti
>            Assignee: Ratandeep Ratti
>              Labels: URLClassLoader
>         Attachments: HIVE-11878.patch, HIVE-11878_qtest.patch
>
>
> When we register a jar on the Hive console, Hive creates a fresh URLClassLoader that includes the path of the jar being registered plus all the jar paths of the parent classloader. The parent classloader is the current thread context classloader. Once the URLClassLoader is created, Hive sets it as the current thread context classloader.
> So if we register multiple jars in Hive, multiple URLClassLoaders are created, each including the jars of its parent plus the one new jar being registered. The last URLClassLoader created ends up as the current thread context classloader (for details, see org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath).
> Here is an example in which this strategy can lead to a ClassNotFoundException (CNF).
> We register two jars *j1* and *j2* in the Hive console. *j1* contains the UDF class *c1*, which internally relies on class *c2* in jar *j2*. We register *j1* first; the URLClassLoader *u1* is created and set as the thread context classloader. We register *j2* next; the new URLClassLoader *u2* is created with *u1* as its parent, and *u2* becomes the new thread context classloader. Note that *u2* includes paths to both jars *j1* and *j2*, whereas *u1* only has the path to *j1* (for details, see org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath).
> Now, when we register class *c1* as a temporary function in Hive, we load the class using {code} Class.forName("c1", true, Thread.currentThread().getContextClassLoader()) {code}. The thread context classloader is *u2*, and it has the path to class *c1*, but note that classloaders delegate to their parent classloader first. In this case class *c1* will be found and *defined* by classloader *u1*.
> Now *c1* from jar *j1* has *u1* as its classloader. If a method of *c1* (say, initialize) references class *c2*, *c2* will not be found: a class's references are resolved through the classloader that defined it, so the lookup for *c2* goes through *u1*, which has no path to *j2*.
> I've added a qtest that demonstrates the problem; please see the attached patch.
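The scenario in the description can be modeled in a small self-contained Java sketch. ChainedLoaderDemo, registerJar and helperLoader are hypothetical names, and registerJar is a simplified stand-in for the per-jar strategy, not the actual code of Utilities#addToClassPath. The demo's own compiled classes play the role of *j1* (with c1 being the demo class and the nested Helper standing in for *c2*), *j2* is a placeholder path, and the chain starts from a null (bootstrap) parent so the application classloader cannot define the class first. It assumes the classes are run from a directory or jar on the classpath.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified model of the per-jar classloader chain described in the issue (not Hive's code). */
public class ChainedLoaderDemo {

    /** Stand-in for c2: a class referenced from ChainedLoaderDemo's code. */
    public static class Helper { }

    /** Stand-in for c1's initialize(): resolves Helper and reports which loader defined it. */
    public static ClassLoader helperLoader() {
        return Helper.class.getClassLoader();
    }

    /** Per-jar registration: a fresh loader with the parent's URLs plus the new jar. */
    static URLClassLoader registerJar(ClassLoader parent, URL jar) {
        List<URL> urls = new ArrayList<>();
        if (parent instanceof URLClassLoader) {
            urls.addAll(Arrays.asList(((URLClassLoader) parent).getURLs()));
        }
        urls.add(jar);
        return new URLClassLoader(urls.toArray(new URL[0]), parent);
    }

    /** Returns {definer of c1, definer of Helper as resolved from c1, u1, u2}. */
    static ClassLoader[] run() {
        Thread thread = Thread.currentThread();
        ClassLoader original = thread.getContextClassLoader();
        try {
            // "j1": where this class was compiled to; "j2": a placeholder path.
            URL j1 = ChainedLoaderDemo.class.getProtectionDomain().getCodeSource().getLocation();
            URL j2 = Paths.get("j2.jar").toUri().toURL();
            // Null parent = bootstrap only, so the application loader cannot define c1 first.
            URLClassLoader u1 = registerJar(null, j1);
            thread.setContextClassLoader(u1);
            URLClassLoader u2 = registerJar(thread.getContextClassLoader(), j2);
            thread.setContextClassLoader(u2);

            // The lookup from the issue: Class.forName(name, true, contextLoader).
            Class<?> c1 = Class.forName(ChainedLoaderDemo.class.getName(), true,
                    thread.getContextClassLoader());
            Method init = c1.getMethod("helperLoader");
            ClassLoader helperDefiner = (ClassLoader) init.invoke(null);
            return new ClassLoader[] {c1.getClassLoader(), helperDefiner, u1, u2};
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            thread.setContextClassLoader(original);
        }
    }

    public static void main(String[] args) {
        ClassLoader[] r = run();
        System.out.println("c1 defined by u1: " + (r[0] == r[2]));       // true
        System.out.println("c1 defined by u2: " + (r[0] == r[3]));       // false
        System.out.println("Helper resolved via u1: " + (r[1] == r[2])); // true
    }
}
```

Here Helper happens to sit next to c1, so *u1* finds it. In the scenario from the issue, *c2* exists only in *j2*, which *u1* cannot see, so the same resolution through *u1* fails even though the context classloader *u2* could have found it.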



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
