uima-dev mailing list archives

From "Richard Eckart de Castilho (Jira)" <...@uima.apache.org>
Subject [jira] [Resolved] (UIMA-6232) Reduce overhead of createTypeSystemDescription() and friends
Date Fri, 15 May 2020 15:07:00 GMT

     [ https://issues.apache.org/jira/browse/UIMA-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Eckart de Castilho resolved UIMA-6232.
    Resolution: Fixed

> Reduce overhead of createTypeSystemDescription() and friends
> ------------------------------------------------------------
>                 Key: UIMA-6232
>                 URL: https://issues.apache.org/jira/browse/UIMA-6232
>             Project: UIMA
>          Issue Type: Improvement
>          Components: uimaFIT
>            Reporter: Richard Eckart de Castilho
>            Assignee: Richard Eckart de Castilho
>            Priority: Major
>             Fix For: 2.6.0uimaFIT, 3.2.0uimaFIT
> uimaFIT offers a range of factory methods which use classpath scanning to locate type
system descriptions, type priority definitions and index definitions. 
> The present implementation scans for each type of object once and then stores the locations
in which the descriptors were found in a global static variable. The user can call a method
to clear this variable and force a re-scan.
> Whenever client code calls a method such as {{createTypeSystemDescription()}}, the descriptors
at the cached locations are read and parsed, and a corresponding Java descriptor object is created and returned.
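The behaviour described so far can be sketched roughly as follows. This is an illustrative stand-in using only the Java standard library, not uimaFIT code: the class {{GlobalLocationCache}}, its {{scanner}} hook, and the {{parseCount}} counter are hypothetical names introduced for this example, and "parsing" is simulated by building a string.

```java
import java.util.function.Supplier;

/** Illustrative stand-in for the behaviour described above; not uimaFIT code. */
final class GlobalLocationCache {
    // Global static cache of descriptor locations, filled by the first scan.
    private static String[] locations;

    // Counts simulated parse operations (for illustration only).
    static int parseCount = 0;

    // Hypothetical scanner hook; the real code scans the classpath.
    static Supplier<String[]> scanner = () -> new String[] { "a.xml" };

    /** Scans once and then keeps returning the cached locations. */
    static synchronized String[] scanLocations() {
        if (locations == null) {
            locations = scanner.get();
        }
        return locations;
    }

    /** Clears the cache so that the next call scans again. */
    static synchronized void forceRescan() {
        locations = null;
    }

    /** Re-"parses" every cached location on every call. */
    static String createDescription() {
        StringBuilder sb = new StringBuilder();
        for (String location : scanLocations()) {
            parseCount++; // one simulated XML parse per location per call
            sb.append("parsed(").append(location).append(")");
        }
        return sb.toString();
    }
}
```

In this sketch, two calls to {{createDescription()}} trigger two parses of the same location, and a change in what the scanner would find stays invisible until {{forceRescan()}} is called.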
> This issue is about two problems with this approach:
> 1) The search for descriptor locations only considers the ClassLoader in effect the
first time the scanning takes place. If {{createTypeSystemDescription()}} is later
called in the context of a ClassLoader with access to a different set of descriptions,
this is not taken into account.
> 2) Parsing the XML files every time e.g. {{createTypeSystemDescription()}} is called
slows uimaFIT down overall. These methods are potentially called very often, in particular
every time {{createEngineDescription()}} or a similar method is called. Depending on
the context, the parse overhead can have a significant impact on the overall execution time.
> As a solution for 1), we could adopt an approach similar to the one used for JCas wrapper
classes in the JCasImpl: the locations are stored in a {{WeakHashMap}} mapping the current
ClassLoader to the discovered locations. The "current" ClassLoader is obtained via the Spring
{{ClassUtils.getDefaultClassLoader()}}, which is also (indirectly) used in many other places
in uimaFIT. In particular, this method uses the thread context ClassLoader if one is available.
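A minimal sketch of the per-ClassLoader location cache proposed for 1) might look like this. It is not the actual uimaFIT change: {{PerClassLoaderLocations}} and its {{scanner}} hook are hypothetical, and {{currentClassLoader()}} is a simplified standard-library approximation of the lookup order (thread context ClassLoader first, then fallbacks) rather than a call to the Spring method.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Hypothetical sketch of a location cache keyed by ClassLoader. */
final class PerClassLoaderLocations {
    // Weak keys: an entry can be dropped once its ClassLoader is no longer
    // referenced elsewhere. The cached values must not reference the key.
    private static final Map<ClassLoader, String[]> CACHE = new WeakHashMap<>();

    // Hypothetical scanner hook; real code would scan what the loader can see.
    static Function<ClassLoader, String[]> scanner = cl -> new String[0];

    /** Simplified approximation: prefer the thread context ClassLoader. */
    static ClassLoader currentClassLoader() {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = PerClassLoaderLocations.class.getClassLoader();
        }
        if (cl == null) {
            cl = ClassLoader.getSystemClassLoader();
        }
        return cl;
    }

    /** Scans at most once per ClassLoader and caches the result. */
    static String[] locations() {
        ClassLoader cl = currentClassLoader();
        synchronized (CACHE) { // WeakHashMap is not thread-safe on its own
            return CACHE.computeIfAbsent(cl, scanner);
        }
    }
}
```

With this layout, a call made under a different context ClassLoader triggers its own scan, while repeated calls under the same ClassLoader reuse the cached locations.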
> As a solution for 2), we keep a {{WeakHashMap}} cache not only for the locations,
but also for the parsed and aggregated XML files. When e.g. {{createTypeSystemDescription()}}
is called and the cache already contains the respective descriptor, a deep clone of it is returned.
A similar approach (cloning a descriptor) was recently also introduced into UIMA Core to avoid
repeatedly loading and parsing default flow controller definitions.
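The idea behind 2) can be illustrated with a small standard-library sketch. The {{Descriptor}} class, its {{deepCopy()}} method, and the {{parser}} hook are hypothetical stand-ins for the parsed UIMA descriptor objects; the point shown is only that the cached master copy is never handed out, so callers can modify what they receive without affecting later callers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Hypothetical stand-in for a parsed, mutable descriptor. */
final class Descriptor {
    final List<String> typeNames = new ArrayList<>();

    /** Copies all mutable state so the copy is independent of the original. */
    Descriptor deepCopy() {
        Descriptor copy = new Descriptor();
        copy.typeNames.addAll(typeNames);
        return copy;
    }
}

/** Hypothetical cache of parsed descriptors keyed by ClassLoader. */
final class DescriptorCache {
    private static final Map<ClassLoader, Descriptor> CACHE = new WeakHashMap<>();

    // Hypothetical parser hook; real code would read and merge the XML files.
    static Function<ClassLoader, Descriptor> parser = cl -> new Descriptor();

    /** Parses at most once per ClassLoader and returns a fresh copy each time. */
    static Descriptor create() {
        // May be null; WeakHashMap accepts a null key, so this sketch allows it.
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        Descriptor master;
        synchronized (CACHE) {
            master = CACHE.computeIfAbsent(cl, parser);
        }
        // The master is never exposed, so callers cannot corrupt the cache.
        return master.deepCopy();
    }
}
```

The trade-off in this sketch is one copy per call instead of one parse per call, which pays off when copying is cheaper than reading and parsing the descriptor files.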
> *Benchmarking UIMA-

This message was sent by Atlassian Jira
