uima-dev mailing list archives

From "Richard Eckart de Castilho (Jira)" <...@uima.apache.org>
Subject [jira] [Updated] (UIMA-6232) Reduce overhead of createTypeSystemDescription() and friends
Date Fri, 08 May 2020 23:44:00 GMT

     [ https://issues.apache.org/jira/browse/UIMA-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Eckart de Castilho updated UIMA-6232:
---------------------------------------------
    Description: 
uimaFIT offers a range of factory methods which use classpath scanning to locate type system
descriptions, type priority definitions and index definitions. 

The present implementation scans for each type of object once and then stores the locations
in which the descriptors were found in a global static variable. The user can call a method
to clear this variable and force a re-scan.

Whenever client code calls a method such as {{createTypeSystemDescription()}}, the descriptors
at the cached locations are read and parsed, and a corresponding Java descriptor object is
created and returned.
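
To make the behaviour described so far concrete, here is a minimal sketch in plain Java. It is not uimaFIT code: the class {{GlobalLocationCache}}, the methods {{scan()}}, {{parse()}}, {{createDescription()}} and {{forceRescan()}}, and the example locations are all hypothetical placeholders. The counters only exist to show that scanning happens once while parsing happens on every call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the behaviour described above; not uimaFIT code.
class GlobalLocationCache {
    // Discovered descriptor locations, shared by all callers and all ClassLoaders.
    private static List<String> locations;

    // Counters used only to make the cost visible.
    static final AtomicInteger scans = new AtomicInteger();
    static final AtomicInteger parses = new AtomicInteger();

    // Placeholder for classpath scanning; returns made-up locations.
    private static List<String> scan() {
        scans.incrementAndGet();
        return Arrays.asList("classpath:/types/A.xml", "classpath:/types/B.xml");
    }

    // Placeholder for XML parsing.
    private static String parse(String location) {
        parses.incrementAndGet();
        return "parsed(" + location + ")";
    }

    static synchronized List<String> createDescription() {
        if (locations == null) {
            locations = scan(); // only on the first call, or after forceRescan()
        }
        List<String> result = new ArrayList<>();
        for (String location : locations) {
            result.add(parse(location)); // repeated on every call
        }
        return result;
    }

    // Plays the role of the "clear this variable and force a re-scan" method.
    static synchronized void forceRescan() {
        locations = null;
    }
}
```

In this sketch, three calls to {{createDescription()}} trigger one scan but six parses, which is the overhead that problem 2) below is about.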

This issue is about two problems with this approach:

1) Locating the descriptors only considers the ClassLoader in effect the first time the scan
takes place. If {{createTypeSystemDescription()}} is later called in the context of a
ClassLoader with access to a different set of descriptors, this is not taken into account.
2) Parsing the XML files every time a method such as {{createTypeSystemDescription()}} is called
slows uimaFIT down overall. These methods may be called very often, in particular every time
{{createEngineDescription()}} or a similar method is called. Depending on the context, the
parsing overhead can have a significant impact on the overall execution time.

As a solution for 1), we could adopt an approach similar to the one used for JCas wrapper classes
in the JCasImpl: the locations are stored in a {{WeakHashMap}} mapping the current ClassLoader
to the discovered locations. The "current" ClassLoader is obtained via the Spring {{ClassUtils.getDefaultClassLoader()}}
method, which is also (indirectly) used in many other places in uimaFIT. In particular, this method
uses the thread context classloader if one is available.
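
The idea for 1) could look roughly like the sketch below. It is an illustration under assumptions, not the actual patch: {{PerClassLoaderLocations}}, {{currentClassLoader()}} and the injected {{scanner}} function are hypothetical names, and {{currentClassLoader()}} is a plain-JDK stand-in for the Spring {{ClassUtils.getDefaultClassLoader()}} call named above, so that the example has no dependencies.

```java
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Hypothetical sketch: descriptor locations cached per ClassLoader.
class PerClassLoaderLocations {
    // Weak keys: an entry can be dropped once its ClassLoader is no longer
    // strongly referenced elsewhere. The cached values must not hold a strong
    // reference back to the ClassLoader, or the entry would never be collected.
    private final Map<ClassLoader, List<String>> cache = new WeakHashMap<>();

    // Placeholder for the classpath scan; supplied by the caller.
    private final Function<ClassLoader, List<String>> scanner;

    PerClassLoaderLocations(Function<ClassLoader, List<String>> scanner) {
        this.scanner = scanner;
    }

    // Plain-JDK stand-in (an assumption) for Spring's
    // ClassUtils.getDefaultClassLoader(): prefer the thread context
    // classloader, then fall back to other loaders.
    static ClassLoader currentClassLoader() {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = PerClassLoaderLocations.class.getClassLoader();
        }
        if (cl == null) {
            cl = ClassLoader.getSystemClassLoader();
        }
        return cl;
    }

    // Scans at most once per ClassLoader instead of once per JVM.
    synchronized List<String> locations() {
        return cache.computeIfAbsent(currentClassLoader(), scanner);
    }
}
```

With this layout, a call made under a different thread context classloader triggers its own scan, while repeated calls under the same classloader reuse the cached locations.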

As a solution for 2), we keep a {{WeakHashMap}} cache not only for the locations, but also
for the parsed and aggregated XML files. When e.g. {{createTypeSystemDescription()}} is called
and the cache already contains the corresponding descriptor, a deep clone of it is returned.
A similar approach (cloning a descriptor) was recently introduced into UIMA Core to avoid
repeatedly loading and parsing default flow controller definitions.
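
A minimal sketch of the idea for 2), again in plain Java and under assumptions: {{Descriptor}}, {{deepCopy()}}, {{DescriptorCache}} and the injected {{loader}} are hypothetical stand-ins for the parsed UIMA descriptor, its clone operation, and the parse step. The point it illustrates is that the cached instance is never handed out directly, so callers can modify what they receive without affecting later callers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Supplier;

// Hypothetical stand-in for a parsed, mutable descriptor object.
class Descriptor {
    final List<String> typeNames;

    Descriptor(List<String> typeNames) {
        this.typeNames = new ArrayList<>(typeNames);
    }

    // Strings are immutable, so copying the list yields an independent copy.
    Descriptor deepCopy() {
        return new Descriptor(typeNames);
    }
}

// Hypothetical sketch: parse once per ClassLoader, return clones afterwards.
class DescriptorCache {
    private final Map<ClassLoader, Descriptor> cache = new WeakHashMap<>();

    // Placeholder for "read the cached locations and parse the XML files".
    private final Supplier<Descriptor> loader;

    DescriptorCache(Supplier<Descriptor> loader) {
        this.loader = loader;
    }

    synchronized Descriptor get() {
        // The thread context classloader is used here for brevity; this is
        // an assumption of the sketch, not the lookup used by uimaFIT.
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        Descriptor cached = cache.computeIfAbsent(cl, key -> loader.get());
        // Hand out a copy so the cached master instance stays unmodified.
        return cached.deepCopy();
    }
}
```

The trade-off in this sketch is one copy per call instead of one XML parse per call; whether that is a net gain for real descriptors depends on their size and is not something the example measures.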

*Benchmarking UIMA-
 



> Reduce overhead of createTypeSystemDescription() and friends
> ------------------------------------------------------------
>
>                 Key: UIMA-6232
>                 URL: https://issues.apache.org/jira/browse/UIMA-6232
>             Project: UIMA
>          Issue Type: Improvement
>          Components: uimaFIT
>            Reporter: Richard Eckart de Castilho
>            Assignee: Richard Eckart de Castilho
>            Priority: Major
>             Fix For: 2.6.0uimaFIT
>
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
