uima-dev mailing list archives

From "Richard Eckart de Castilho (JIRA)" <...@uima.apache.org>
Subject [jira] [Commented] (UIMA-3470) JCasIterable doesn't work with CasMultipliers
Date Mon, 10 Mar 2014 11:01:43 GMT

    [ https://issues.apache.org/jira/browse/UIMA-3470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925650#comment-13925650
] 

Richard Eckart de Castilho commented on UIMA-3470:
--------------------------------------------------

Hm, this is not quite as straightforward as I had hoped.

The sophisticated logic used by UIMA core (ASB_impl, AggregateCasIterator) cannot easily be
accessed from uimaFIT, so we should try to use it indirectly.

I tried two approaches:
# extending the uimaFIT JCasIterator, wrapping all engines in a single aggregate with outputsNewCASes
set
# extending the uimaFIT JCasIterable, wrapping the reader and all engines in a single aggregate
with outputsNewCASes set

In both cases, I hit a wall due to the CASPool running out of CASes. So far, the uimaFIT JCasIterable
did not require users to call jcas.release() after a JCas had been used (e.g. at the end of
a loop). However, without this call, the process gets stuck because the CASPool runs out of
CASes. 
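
To illustrate the effect described above, here is a minimal, self-contained sketch of a fixed-size pool. It is a toy model, not UIMA code: ToyCasPool, ToyPoolDemo and their methods are hypothetical names, and where the real pool would block, the toy returns null so the example terminates.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy stand-in for a fixed-size CAS pool (hypothetical; not the UIMA class). */
final class ToyCasPool {
    private final Deque<StringBuilder> free = new ArrayDeque<>();

    ToyCasPool(int size) {
        for (int i = 0; i < size; i++) {
            free.push(new StringBuilder());
        }
    }

    /** Returns a free "CAS", or null where a real pool would block the caller. */
    StringBuilder tryAcquire() {
        return free.poll();
    }

    /** Resets the "CAS" and hands it back to the pool. */
    void release(StringBuilder cas) {
        cas.setLength(0);
        free.push(cas);
    }
}

final class ToyPoolDemo {
    /** Counts how many documents are handled before the pool is exhausted. */
    static int process(int documents, int poolSize, boolean releaseAfterUse) {
        ToyCasPool pool = new ToyCasPool(poolSize);
        int handled = 0;
        for (int i = 0; i < documents; i++) {
            StringBuilder cas = pool.tryAcquire();
            if (cas == null) {
                break; // a real pool would block here instead of returning
            }
            cas.append("doc ").append(i);
            handled++;
            if (releaseAfterUse) {
                pool.release(cas); // plays the role of jcas.release() at the end of the loop
            }
        }
        return handled;
    }
}
```

With a pool of size 1, ToyPoolDemo.process(5, 1, false) stops after the first document, while process(5, 1, true) handles all five.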

I tried adding an implicit call to release() to the next() function of the iterator/iterable,
but that only delayed the CASPool running out by one step.

I also tried calling release() explicitly at the end of a for-loop. That seemed to work, but
only for the second approach (probably I have implemented something wrong in the first approach).

It looks like the CASPool can run out just by calling hasNext() on a UIMA JCasIterator (the
one returned from processAndOutputNewCASes). This appears to be due to ASB_impl.AggregateCasIterator.hasNext()
prefetching the next CAS.
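
The interaction between prefetching and a small pool can be sketched with another toy model. This is not the ASB_impl code; PrefetchingToyIterator and all of its members are hypothetical, and the exception stands in for the point where the real call would block.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;

/** Toy iterator whose hasNext() grabs a pooled slot up front (hypothetical; not UIMA code). */
final class PrefetchingToyIterator {
    private final Deque<int[]> freeSlots = new ArrayDeque<>(); // stand-ins for pooled CASes
    private int remaining;    // outputs still to be produced
    private int[] prefetched; // slot already taken by hasNext(), if any

    PrefetchingToyIterator(int poolSize, int outputs) {
        for (int i = 0; i < poolSize; i++) {
            freeSlots.push(new int[1]);
        }
        this.remaining = outputs;
    }

    /** Has a side effect: takes a slot from the pool to prepare the next output. */
    boolean hasNext() {
        if (prefetched != null) {
            return true;
        }
        if (remaining == 0) {
            return false;
        }
        int[] slot = freeSlots.poll();
        if (slot == null) {
            // A real pool would block here; the toy throws so the example terminates.
            throw new IllegalStateException("pool exhausted");
        }
        slot[0] = remaining--;
        prefetched = slot;
        return true;
    }

    int[] next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        int[] result = prefetched;
        prefetched = null;
        return result;
    }

    /** Returns a slot to the pool, like releasing a CAS. */
    void release(int[] slot) {
        freeSlots.push(slot);
    }

    int available() {
        return freeSlots.size();
    }
}
```

In this model, with a pool of size 1, calling hasNext() while the caller still holds the slot returned by next() fails; once that slot is released, hasNext() succeeds and immediately takes the slot again.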

It appears the size of the CASPool cannot easily be configured from the outside. This seems
to be each CasMultiplier's own responsibility, by overriding getCasInstancesRequired().

So… currently I see these options:
* try to refactor ASB_impl.AggregateCasIterator.hasNext() to avoid prefetching
* expect that users change their code and call release()
* call release() in the hasNext() method, breaking the no-side-effects expectation about
hasNext() in the iterator interface

I'm not really happy with any of these options at the moment… the first might be the
best (if doable at all).

> JCasIterable doesn't work with CasMultipliers
> ---------------------------------------------
>
>                 Key: UIMA-3470
>                 URL: https://issues.apache.org/jira/browse/UIMA-3470
>             Project: UIMA
>          Issue Type: Bug
>          Components: uimaFIT
>    Affects Versions: 2.0.0uimaFIT
>            Reporter: Richard Eckart de Castilho
>             Fix For: 2.0.1uimaFIT
>
>
> I believe the JCasIterable is currently implemented as a loop which calls
> "process" on the analysis engines for every CAS produced by the reader
> and then returns the corresponding CAS. This wouldn't work with multipliers.



