uima-dev mailing list archives

From "Marshall Schor (JIRA)" <...@uima.apache.org>
Subject [jira] [Commented] (UIMA-6057) Avoid falsely switching classloader
Date Mon, 08 Jul 2019 15:29:00 GMT

    [ https://issues.apache.org/jira/browse/UIMA-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880467#comment-16880467 ]

Marshall Schor commented on UIMA-6057:

Thanks, this helps a lot.  
Here's what I see, please tell me if this is what you're intending:
 # An Analysis Engine (AE1) is created from some descriptor, and initialized.
 # Its process method is called, using a CAS from that AE1.
 # An annotator in the AE1 pipeline creates another Analysis Engine (AE2), using some other
descriptor; it is initialized.
 # The annotator then calls AE2 process method, passing the CAS associated with AE1.

Assuming this is an accurate description of the intent, I think this use-case was never
contemplated by the UIMA designers.  There are multiple issues, only one of which is the one
reported in this Jira.

Having said that, this kind of use case (running a pipeline as a "subroutine" of another
pipeline) comes up repeatedly, and UIMA could be extended to support it better.

Here are some of the issues with the above method in the current framework; there may be more
(please add to the list if you know of others):
 # UIMA's APIs are split into 2 kinds: the ones Annotators call (mostly concerned with creating
/ fetching feature structure values from the CAS, running iterators over indexes), and the
Application APIs (concerned with creating pipelines from descriptors, and running pipelines). 
In between these APIs is much of the functionality of the UIMA framework, including things
like sequencing Annotators, integrating remote annotators, setting up shared external resources,
providing for configuration parameters, etc.
 ## When a CAS is not inside a pipeline, but is being referenced by the Application APIs,
the envisioned use is that the application API runs multiple "documents" through the pipeline,
and "resets" the CAS at the end of each run, and sets it up for the next document.  This
design is because the CAS is a rather heavyweight structure, taking time to create, but once
set up, can be "reset" quickly.
 ## When the CAS enters the pipeline, every time it temporarily exits the UIMA framework to
enter a user's annotator code, a bit is set to "lock" the CAS to block annotators from accidentally
calling "reset" on the CAS while it is in the pipeline.  When the annotator finishes and
returns to the framework, it is unlocked.
 ## This new use case results in a "locked" CAS being sent to AE2, and when that pipeline exits,
the CAS is returned to AE1 in an unlocked state.  This is probably a minor issue, of no import,
as long as the Annotator doesn't accidentally try to reset the CAS.
 # When the framework calls an Annotator's process method, it uses information from that Annotator's
metadata to set up the "result specification" - a set of what types and features ought to
be produced.  Since AE2's pipeline has a completely independent type system specification,
the result specification is in terms of that type system.  If that type system doesn't match
AE1's exactly, then the result specification won't match the type system of the CAS being sent
through the pipeline.  Impact of this: varies, because most annotators do not make any use
of the result specification.
 # The issue from this Jira: the framework assumes that when a pipeline
being run in a Pear context returns, the Pear context exits.
 # JMX counting / logging of time vs annotators: the inner pipeline's time is counted more
than once, because it is also counted against the outer AE's time.
 # UIMA-AS (which provides for flexible remoting and replication of annotators) might have
other issues - don't know yet.  Maybe [~cwiklik] can weigh in.
 # If the user's initialize methods in the AE2's annotators or shared external resources make
use of the type system, they would be set up in the context of AE2's produceAE, and subsequently
run using AE1's type system, which may cause other issues.
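
A minimal sketch of the CAS "lock" behavior described in the first item of the list above. It uses toy stand-in classes, not the real UIMA CAS or AnalysisEngine: `ToyCas`, `ToyEngine`, and `run` are hypothetical names introduced only for illustration, and the real framework's locking is more involved than a single boolean.

```java
import java.util.List;
import java.util.function.Consumer;

public class CasLockSketch {

    /** Toy stand-in for a CAS: models only the "locked" bit and reset(). */
    static final class ToyCas {
        private boolean locked;

        boolean isLocked() { return locked; }
        void lock() { locked = true; }
        void unlock() { locked = false; }

        void reset() {
            // Assumption for this sketch: a locked CAS refuses to be reset.
            if (locked) {
                throw new IllegalStateException("reset called while CAS is locked");
            }
        }
    }

    /** Toy stand-in for an engine: locks the CAS around every annotator call. */
    static final class ToyEngine {
        private final List<Consumer<ToyCas>> annotators;

        ToyEngine(List<Consumer<ToyCas>> annotators) {
            this.annotators = annotators;
        }

        void process(ToyCas cas) {
            for (Consumer<ToyCas> annotator : annotators) {
                cas.lock();               // leaving the "framework", entering user code
                try {
                    annotator.accept(cas);
                } finally {
                    cas.unlock();         // back in the "framework"
                }
            }
        }
    }

    /** Returns {lockedBeforeInner, lockedAfterInner} as seen by the outer annotator. */
    static boolean[] run() {
        boolean[] seen = new boolean[2];

        Consumer<ToyCas> innerAnnotator = cas -> { /* does nothing */ };
        ToyEngine ae2 = new ToyEngine(List.of(innerAnnotator));

        Consumer<ToyCas> outerAnnotator = cas -> {
            seen[0] = cas.isLocked();     // still inside AE1's annotator
            ae2.process(cas);             // nested pipeline on AE1's CAS
            seen[1] = cas.isLocked();     // AE2 unlocked it on the way out
        };
        ToyEngine ae1 = new ToyEngine(List.of(outerAnnotator));

        ae1.process(new ToyCas());
        return seen;
    }

    public static void main(String[] args) {
        boolean[] seen = run();
        System.out.println("locked before inner pipeline: " + seen[0]);
        System.out.println("locked after inner pipeline:  " + seen[1]);
    }
}
```

In this toy model the outer annotator sees the CAS locked before the nested call and unlocked after it, so a stray `reset()` at that point would no longer be blocked.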

In your use case, do AE2's type system and index specification always
match AE1's exactly?
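
One rough way to check the type system half of that question is to compare the two definitions type by type. The sketch below is only an illustration under stated assumptions: each type system is modeled as a plain map from type name to feature names, the class `TypeSystemDiffSketch`, the method `diff`, and the `example.*` type names are hypothetical, and it ignores index specifications, feature ranges, and supertypes entirely. It does not use the UIMA API.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class TypeSystemDiffSketch {

    /**
     * Compares two toy type systems (type name -> feature names).
     * Returns one entry per differing type; an empty map means an exact match
     * at the level of detail this sketch models.
     */
    static Map<String, String> diff(Map<String, Set<String>> ae1,
                                    Map<String, Set<String>> ae2) {
        Map<String, String> differences = new TreeMap<>();
        Set<String> allTypes = new TreeSet<>(ae1.keySet());
        allTypes.addAll(ae2.keySet());

        for (String type : allTypes) {
            if (!ae1.containsKey(type)) {
                differences.put(type, "only in AE2");
            } else if (!ae2.containsKey(type)) {
                differences.put(type, "only in AE1");
            } else if (!ae1.get(type).equals(ae2.get(type))) {
                differences.put(type, "features differ");
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        // Hypothetical type systems, for illustration only.
        Map<String, Set<String>> ae1 = Map.of(
                "example.Token", Set.of("begin", "end"),
                "example.Sentence", Set.of("begin", "end"));
        Map<String, Set<String>> ae2 = Map.of(
                "example.Token", Set.of("begin", "end", "pos"),
                "example.Entity", Set.of("begin", "end"));

        System.out.println(diff(ae1, ae2));
    }
}
```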

That's all for now.  I'm not sure what the best way forward for this is...  I'm thinking
more about it and other opinions are welcome.

> Avoid falsely switching classloader
> -----------------------------------
>                 Key: UIMA-6057
>                 URL: https://issues.apache.org/jira/browse/UIMA-6057
>             Project: UIMA
>          Issue Type: Bug
>          Components: Core Java Framework
>            Reporter: Matthias Koch
>            Priority: Major
>         Attachments: UIMA-6057.diff, classloadertest.zip
> In some cases the classloader is switched back, although it hasn't been switched before

This message was sent by Atlassian JIRA
