hive-issues mailing list archives

From "Ratandeep Ratti (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
Date Thu, 01 Oct 2015 18:27:27 GMT

    [ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940183#comment-14940183
] 

Ratandeep Ratti commented on HIVE-11878:
----------------------------------------

Hi [~jdere]
    I got some time to look into this today. I incorporated your suggestion of creating a
fresh classloader when a new session is created. The freshly created session classloader uses
the thread context classloader as its parent (see RB: https://reviews.apache.org/r/38663/).
I have some doubts about using the thread context classloader as the parent. It does not
seem to provide clean isolation of jars/resources between different sessions. Case in point:
the thread context classloader could be a previous session's classloader. This can happen
when the same thread was used to work on a previous session and is now being used to work
on the newer, current session. The thread context classloader could then contain a different
implementation of a class that is also present in the session classloader. Do you see this
as a problem?
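
A minimal sketch of the concern above, not Hive code: newSessionLoader() is a hypothetical
stand-in for session creation that parents the new loader on the current thread context
classloader and then installs it on the thread. When the same thread creates a second session,
the second session's loader ends up with the first session's loader as its parent.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class StaleParentDemo {
    // Hypothetical stand-in for session creation (not a Hive API): the new
    // loader's parent is whatever the thread context classloader currently is.
    static URLClassLoader newSessionLoader() {
        ClassLoader parent = Thread.currentThread().getContextClassLoader();
        URLClassLoader loader = new URLClassLoader(new URL[0], parent);
        Thread.currentThread().setContextClassLoader(loader);
        return loader;
    }

    // Returns true when the second session's loader is parented on the first
    // session's loader, i.e. the sessions are not isolated from each other.
    static boolean secondSessionInheritsFirst() {
        ClassLoader original = Thread.currentThread().getContextClassLoader();
        try {
            URLClassLoader s1 = newSessionLoader();
            URLClassLoader s2 = newSessionLoader(); // same thread, new session
            return s2.getParent() == s1;
        } finally {
            // Restore the thread's original context classloader.
            Thread.currentThread().setContextClassLoader(original);
        }
    }

    public static void main(String[] args) {
        System.out.println("session 2 parented on session 1: " + secondSessionInheritsFirst());
    }
}
```

Passing an explicit, session-independent parent to the URLClassLoader constructor instead
would avoid this chaining, though whether that fits Hive's design is an open question here.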


Another potential problem I'm thinking about, which is also present in the proposed approach
(see RB): in HiveServer2 any worker thread can serve any request by mapping it to a
persistent session. Couldn't this lead to a situation where, for a specific session, the
session-specific classloader (conf.getClassLoader()) and the thread context classloader end
up being different? Say we have two worker threads, t1 and t2. The very first query is handled
by t1, where a fresh session s1 is created along with a fresh classloader c1, which is set as
both the session-specific classloader and the thread context classloader. The next query for
the same session is handled by t2. I guess that since it is the same session s1, we do not
create a fresh classloader. The session-specific classloader is c1, but since t2 is a different
thread and no classloader has been set on it, t2 will have the system classloader as its
context classloader. Couldn't this cause potential CNF exceptions? If I understood correctly,
this problem also exists in the current implementation, doesn't it?
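
The t1/t2 scenario can be sketched in plain Java, assuming two threads created up front as a
stand-in for HiveServer2 workers (the class and method names are made up for illustration).
A new thread inherits the context classloader of the thread that created it, so t2 sees that
inherited loader and never sees the c1 that t1 installed on itself.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.atomic.AtomicReference;

public class WorkerThreadLoaderDemo {
    // Returns true if the context classloader seen by t2 is not the loader c1
    // that t1 installed on itself.
    static boolean loadersDiffer() throws InterruptedException {
        ClassLoader base = Thread.currentThread().getContextClassLoader();
        // c1 plays the role of the session-specific classloader.
        URLClassLoader c1 = new URLClassLoader(new URL[0], base);
        AtomicReference<ClassLoader> seenByT2 = new AtomicReference<>();

        // t1 handles the first query and sets c1 as its context classloader.
        Thread t1 = new Thread(() -> Thread.currentThread().setContextClassLoader(c1));
        // t2 handles the next query of the same session without setting anything.
        Thread t2 = new Thread(() ->
                seenByT2.set(Thread.currentThread().getContextClassLoader()));

        t1.start();
        t1.join();
        t2.start();
        t2.join();
        return seenByT2.get() != c1;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("t2 context loader differs from c1: " + loadersDiffer());
    }
}
```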

> ClassNotFoundException can possibly occur if multiple jars are registered one at a time
in Hive
> ------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-11878
>                 URL: https://issues.apache.org/jira/browse/HIVE-11878
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.2.1
>            Reporter: Ratandeep Ratti
>            Assignee: Ratandeep Ratti
>              Labels: URLClassLoader
>         Attachments: HIVE-11878.patch, HIVE-11878_approach3.patch, HIVE-11878_qtest.patch
>
>
> When we register a jar on the Hive console, Hive creates a fresh URLClassLoader which
includes the path of the current jar to be registered and all the jar paths of the parent
classloader. The parent classloader is the current ThreadContextClassLoader. Once the URLClassLoader
is created, Hive sets it as the current ThreadContextClassLoader.
> So if we register multiple jars in Hive, there will be multiple URLClassLoaders created,
each classloader including the jars from its parent and the one extra jar to be registered.
The last URLClassLoader created will end up as the current ThreadContextClassLoader. (See
details: org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath)
> Now here's an example in which the above strategy can lead to a CNF exception.
> We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class *c1* and
internally relies on class *c2* in jar *j2*. We register *j1* first, the URLClassLoader *u1*
is created and also set as the ThreadContextClassLoader. We register *j2* next, the new URLClassLoader
created will be *u2* with *u1* as parent and *u2* becomes the new ThreadContextClassLoader.
Note *u2* includes paths to both jars *j1* and *j2* whereas *u1* only has paths to *j1* (For
details see: org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath).
> Now when we register class *c1* under a temporary function in Hive, we load the class
using {code} Class.forName("c1", true, Thread.currentThread().getContextClassLoader()) {code}.
The current thread context class-loader is *u2*, and it has the path to the class *c1*, but
note that class-loaders work by delegating to the parent class-loader first. In this case class
*c1* will be found and *defined* by class-loader *u1*.
> Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say initialize) is
called in *c1*, which references the class *c2*, *c2* will not be found since the class-loader
used to search for *c2* will be *u1* (since the caller's class-loader is used to load a class).
> I've added a qtest to explain the problem. Please see the attached patch
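
The parent-first delegation described in the issue can be illustrated without any jars. This
is only a sketch under stated assumptions: the two URLClassLoaders have empty classpaths and
the demo class itself stands in for *c1*. Loading a class through the child *u2* returns a
class whose defining loader is the ancestor that found it, not *u2*. In the issue's scenario
that ancestor is *u1*, which is why later lookups from *c1* never consult *j2*.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class DelegationDemo {
    // Loads this class by name through the given initiating loader and
    // returns the loader that actually defined it.
    static ClassLoader definingLoaderVia(ClassLoader initiating)
            throws ClassNotFoundException {
        return Class.forName(DelegationDemo.class.getName(), true, initiating)
                .getClassLoader();
    }

    // Returns true when a lookup started at u2 is satisfied by an ancestor,
    // so the defining loader is the ancestor rather than u2.
    static boolean definedByAncestor() throws Exception {
        ClassLoader app = DelegationDemo.class.getClassLoader();
        try (URLClassLoader u1 = new URLClassLoader(new URL[0], app);
             URLClassLoader u2 = new URLClassLoader(new URL[0], u1)) {
            ClassLoader defining = definingLoaderVia(u2);
            return defining == app && defining != u2;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("defined by ancestor, not u2: " + definedByAncestor());
    }
}
```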



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
