uima-dev mailing list archives

From "Richard Eckart de Castilho (Jira)" <...@uima.apache.org>
Subject [jira] [Created] (UIMA-6232) Reduce overhead of createTypeSystemDescription()
Date Fri, 08 May 2020 19:16:00 GMT
Richard Eckart de Castilho created UIMA-6232:

             Summary: Reduce overhead of createTypeSystemDescription()
                 Key: UIMA-6232
                 URL: https://issues.apache.org/jira/browse/UIMA-6232
             Project: UIMA
          Issue Type: Improvement
            Reporter: Richard Eckart de Castilho
            Assignee: Richard Eckart de Castilho
             Fix For: 2.6.0uimaFIT

uimaFIT offers a range of factory methods which use classpath scanning to locate type system
descriptions, type priority definitions and index definitions. 

The present implementation scans for each type of object once and then stores the locations
in which the descriptors were found in a global static variable. The user can call a method
to clear this variable and force a re-scan.

Whenever client code calls a method such as {{createTypeSystemDescription()}}, the descriptors
at the cached locations are read and parsed, and a corresponding Java descriptor object is
created and returned.

This issue is about two problems with this approach:

1) The lookup of descriptor locations only considers the ClassLoader situation at the time the
first scan takes place. If {{createTypeSystemDescription()}} is called at a later stage in the
context of a ClassLoader with access to a different set of descriptions, this is not taken
into account.
2) Parsing the XML files every time e.g. {{createTypeSystemDescription()}} is called slows
uimaFIT down overall. These methods are potentially called very often, in particular every
time that {{createEngineDescription()}} or similar methods are called. Depending on the context,
the parse overhead can have a significant impact on the overall execution time.

As a solution for 1), we could adopt an approach similar to the one used for JCas wrapper classes
in the JCasImpl: the locations are stored in a {{WeakHashMap}} mapping the current ClassLoader
to the discovered locations. The "current" ClassLoader is obtained via the Spring {{ClassUtils.getDefaultClassLoader()}},
which is also (indirectly) used in many other places in uimaFIT. In particular, this method
uses the thread context ClassLoader if one is available.
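
A minimal sketch of the per-ClassLoader location cache described above, using only the JDK. The names {{LocationCache}}, {{currentClassLoader()}} and the {{scanner}} function are hypothetical and not part of uimaFIT; {{currentClassLoader()}} is only a rough stand-in for the Spring {{ClassUtils.getDefaultClassLoader()}} lookup.

```java
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Hypothetical sketch: caches discovered descriptor locations per ClassLoader. */
class LocationCache {
    // Weak keys: an entry can be dropped once its ClassLoader is no longer
    // strongly referenced elsewhere. The cached values must not reference the
    // ClassLoader themselves, otherwise the entry is never released.
    private final Map<ClassLoader, List<String>> locationsByLoader = new WeakHashMap<>();

    // Assumption: the actual classpath scan is passed in as a function.
    private final Function<ClassLoader, List<String>> scanner;

    LocationCache(Function<ClassLoader, List<String>> scanner) {
        this.scanner = scanner;
    }

    /** Prefers the thread context ClassLoader and falls back to this class's loader. */
    static ClassLoader currentClassLoader() {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        return cl != null ? cl : LocationCache.class.getClassLoader();
    }

    /** Scans at most once per ClassLoader instead of once per JVM. */
    synchronized List<String> locations() {
        return locationsByLoader.computeIfAbsent(currentClassLoader(), scanner);
    }
}
```

With such a cache, a call made in the context of a second ClassLoader triggers its own scan, while repeated calls under the same ClassLoader reuse the earlier result.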

As a solution for 2), we keep a {{WeakHashMap}} cache not only for the locations, but also
for the parsed and aggregated XML files. When e.g. {{createTypeSystemDescription()}} is called
and the cache already contains a corresponding descriptor, a deep clone of it is returned.
A similar approach (cloning a descriptor) was also recently introduced into UIMA Core to avoid
repeatedly loading and parsing default flow controller definitions.
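
The proposal for 2) could look roughly like the following JDK-only sketch. {{Descriptor}}, {{deepCopy()}}, {{DescriptorCache}} and the {{parser}} function are hypothetical placeholders for the real descriptor type, its clone operation and the XML parsing step. The point is only that the parsed object is cached per ClassLoader and callers always receive a copy, so they cannot modify the cached instance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Hypothetical stand-in for a parsed, mutable descriptor object. */
class Descriptor {
    final List<String> typeNames;

    Descriptor(List<String> typeNames) {
        // Copy the list so that no two descriptors share mutable state.
        this.typeNames = new ArrayList<>(typeNames);
    }

    /** Strings are immutable, so copying the list amounts to a deep copy here. */
    Descriptor deepCopy() {
        return new Descriptor(typeNames);
    }
}

/** Hypothetical sketch: parse once per ClassLoader, hand out clones afterwards. */
class DescriptorCache {
    private final Map<ClassLoader, Descriptor> parsedByLoader = new WeakHashMap<>();

    // Assumption: locating, parsing and aggregating the XML files is passed in.
    private final Function<ClassLoader, Descriptor> parser;

    DescriptorCache(Function<ClassLoader, Descriptor> parser) {
        this.parser = parser;
    }

    synchronized Descriptor create() {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        // The expensive parse runs only if this ClassLoader has no cached entry yet.
        Descriptor cached = parsedByLoader.computeIfAbsent(cl, parser);
        // Never expose the cached instance itself.
        return cached.deepCopy();
    }
}
```

Returning a clone costs one in-memory copy per call, which is typically much cheaper than reading and parsing the XML files again.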
