uima-dev mailing list archives

From Peter Klügl (JIRA) <...@uima.apache.org>
Subject [jira] [Commented] (UIMA-2332) Profile and optimize Ruta inference performance
Date Wed, 08 Jan 2014 13:49:50 GMT

    [ https://issues.apache.org/jira/browse/UIMA-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865437#comment-13865437 ]

Peter Klügl commented on UIMA-2332:

Some more information about the profiling:
- The test script consists essentially of the first three phases of the ANNIE NER together
with the gazetteers. The third phase contains about 54 rules. The Ruta script contains about
50 rules overall.
- Ignoring the initialization, 82% of the time is spent on inference (and dictionaries) and 18%
on initializing the RutaStream, that is, the seeding (2%) and the RutaBasics (16%).
- The main hotspot is TOP.getAddress() with 37%. 60% of that is caused by
FSIteratorWrapper.get(), 25% by FeatureStructureImpl.getType().

The next improvement step could be to reduce the use of convenient but costly
lists/sets/maps, e.g., to use arrays in RutaBasic.
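
To illustrate the idea of swapping collections for arrays, here is a minimal sketch. It is
not the actual RutaBasic code: the class ArrayBackedBasic, its methods, and the assumption
that every type can be mapped to a small non-negative int code are all hypothetical. The
point is only that a per-type counter stored in an int[] avoids the hashing and boxing a
Map<Type, Integer> would need on every lookup.

```java
import java.util.Arrays;

/** Hypothetical stand-in for per-token bookkeeping; NOT the real RutaBasic. */
final class ArrayBackedBasic {

    // partOf[typeCode] = number of annotations of that type covering this token.
    // Assumption: type codes are small, dense, non-negative ints.
    private int[] partOf;

    ArrayBackedBasic(int initialTypeCount) {
        partOf = new int[Math.max(1, initialTypeCount)];
    }

    /** Registers one more covering annotation of the given type. */
    void addPartOf(int typeCode) {
        if (typeCode < 0) {
            throw new IllegalArgumentException("negative type code: " + typeCode);
        }
        ensureCapacity(typeCode);
        partOf[typeCode]++;
    }

    /** Unregisters one covering annotation; unknown codes are ignored. */
    void removePartOf(int typeCode) {
        if (typeCode >= 0 && typeCode < partOf.length && partOf[typeCode] > 0) {
            partOf[typeCode]--;
        }
    }

    /** Plain array read: no hashing, no boxing, no iterator allocation. */
    boolean isPartOf(int typeCode) {
        return typeCode >= 0 && typeCode < partOf.length && partOf[typeCode] > 0;
    }

    // Grows the array when a type code beyond the current capacity shows up.
    private void ensureCapacity(int typeCode) {
        if (typeCode >= partOf.length) {
            partOf = Arrays.copyOf(partOf, Math.max(typeCode + 1, partOf.length * 2));
        }
    }
}
```

The trade-off is memory: each token carries an array sized by the largest type code it has
seen, so this only pays off if the codes are dense. Whether it helps here would have to be
confirmed by profiling again.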

> Profile and optimize Ruta inference performance
> -----------------------------------------------
>                 Key: UIMA-2332
>                 URL: https://issues.apache.org/jira/browse/UIMA-2332
>             Project: UIMA
>          Issue Type: Improvement
>          Components: ruta
>    Affects Versions: 2.0.0TextMarker
>            Reporter: Peter Klügl
>            Assignee: Peter Klügl
>            Priority: Minor
>             Fix For: 2.1.1ruta
> Increase the speed of the Ruta rule inference. A starting point is the slowdown of UIMA-2330,
> see RutaTypeMatcher.getMatchingAnnotations()
