uima-dev mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: [jira] [Commented] (UIMA-2373) Possible bug in FixedFlowController
Date Fri, 17 Feb 2012 21:34:08 GMT
Our docs say that AEs are run under a single-threaded model (see 
http://uima.apache.org/d/uimaj-2.4.0/tutorials_and_users_guides.html#ugr.tug.aae.contract_for_annotator_methods).
 
If multiple threads are wanted, the framework supports this by creating multiple 
instances of the AE's implementation class, one per thread.  This limits 
"thread-safety" issues to "static" (class-level) fields only.
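
A minimal sketch of that model, in plain Java rather than the UIMA API: the 
class CountingAnnotator and the helper PerThreadInstances are hypothetical 
stand-ins invented for illustration.  Each thread gets its own instance, so the 
instance field needs no locking, while the static field is shared and has to be 
thread-safe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an annotator implementation class; not a UIMA type.
class CountingAnnotator {
    // Class-level state is shared by every instance, so it must be thread-safe.
    static final AtomicInteger totalDocuments = new AtomicInteger();

    // Instance state is only touched by the thread that owns this instance.
    private int documentsSeen = 0;

    // Counts whitespace-separated tokens and updates both counters.
    int process(String text) {
        documentsSeen++;
        totalDocuments.incrementAndGet();
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    int documentsSeen() {
        return documentsSeen;
    }
}

class PerThreadInstances {
    // Starts `threads` workers; each creates and uses its own annotator instance.
    // Returns the per-instance document counts once all workers have finished.
    static int[] run(int threads, int docsPerThread) {
        int[] perInstance = new int[threads];
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            final int slot = i;
            Thread t = new Thread(() -> {
                CountingAnnotator annotator = new CountingAnnotator(); // one per thread
                for (int d = 0; d < docsPerThread; d++) {
                    annotator.process("some unstructured text");
                }
                perInstance[slot] = annotator.documentsSeen();
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join(); // join makes the worker's writes visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return perInstance;
    }

    public static void main(String[] args) {
        int[] counts = run(4, 1000);
        System.out.println("per-instance counts: " + Arrays.toString(counts));
        System.out.println("shared total: " + CountingAnnotator.totalDocuments.get());
    }
}
```

If totalDocuments were a plain static int instead of an AtomicInteger, the 
shared total could come out too low under load, which is the kind of 
class-level issue that remains even in this model.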

The reason for this was an observation that the people writing annotators, 
although skilled in their particular discipline and able to write code that 
extracts information from unstructured data, did not typically have the skills 
needed to write correct multi-threaded implementations in Java.  So the 
framework "helped" here, by ensuring that any parallelism it supported 
created a separate instance of the annotator class for each thread.

I believe, however, that it is currently possible to use the framework in ways 
in which the application writer creates multiple threads and calls the same 
annotator instance on multiple threads at the same time.  Perhaps a proper 
approach here would be to have the framework detect this, and signal some kind 
of error.
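
One way such detection could look, as a sketch only: this is plain Java, not 
existing UIMA code, and SingleThreadGuard and GuardDemo are hypothetical names.  
The guard flips an AtomicBoolean on entry and throws if a second call arrives 
while the first is still running.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical wrapper around one annotator instance; not part of UIMA.
class SingleThreadGuard {
    private final AtomicBoolean busy = new AtomicBoolean(false);

    // Runs `work`, failing fast if another call on this guard is in progress.
    void process(Runnable work) {
        if (!busy.compareAndSet(false, true)) {
            throw new IllegalStateException(
                "instance already in use; overlapping call from "
                    + Thread.currentThread().getName());
        }
        try {
            work.run();
        } finally {
            busy.set(false); // release even if the work throws
        }
    }
}

class GuardDemo {
    // Returns true if a second, overlapping call on the same guard is rejected.
    static boolean overlappingCallIsRejected() {
        SingleThreadGuard guard = new SingleThreadGuard();
        CountDownLatch inside = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // The first caller enters process() and stays there until released.
        Thread first = new Thread(() -> guard.process(() -> {
            inside.countDown();
            awaitQuietly(release);
        }));
        first.start();
        awaitQuietly(inside); // the first call is now in progress

        boolean rejected = false;
        try {
            guard.process(() -> { }); // overlapping call from this thread
        } catch (IllegalStateException expected) {
            rejected = true;
        } finally {
            release.countDown();
        }
        try {
            first.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return rejected;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("overlap rejected: " + overlappingCallIsRejected());
    }
}
```

Note that this simple guard also rejects a re-entrant call made from the same 
thread; tracking the owning thread instead of a boolean would allow that case.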

-Marshall

On 2/17/2012 4:09 AM, Tommaso Teofili (Commented) (JIRA) wrote:
>      [ https://issues.apache.org/jira/browse/UIMA-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210151#comment-13210151 ]
>
> Tommaso Teofili commented on UIMA-2373:
> ---------------------------------------
>
> bq. Possibly a concurrency issue?
>
> Yes, I think so.
> That came out when an AE is used from different clients which execute in parallel, so
> I wonder if it is the usage which is wrong, or whether we should allow that and thus make a fix for it.
>
>> Possible bug in FixedFlowController
>> -----------------------------------
>>
>>                  Key: UIMA-2373
>>                  URL: https://issues.apache.org/jira/browse/UIMA-2373
>>              Project: UIMA
>>           Issue Type: Bug
>>     Affects Versions: 2.4.0SDK
>>             Reporter: Tommaso Teofili
>>
>> I am developing a series of Lucene tokenizers which can use UIMA for creating tokens
>> via extracted annotations.
>> While doing a stress test with lots of different strings I experienced the following:
>> {noformat}
>> [junit] Testsuite: org.apache.lucene.analysis.uima.UIMATypeAwareAnalyzerTest
>>      [junit] Tests run: 2, Failures: 0, Errors: 1, Time elapsed: 92,061 sec
>>      [junit]
>>      [junit] ------------- Standard Error -----------------
>>      [junit] The following exceptions were thrown by threads:
>>      [junit] *** Thread: Thread-9 ***
>>      [junit] java.lang.RuntimeException: java.io.IOException: org.apache.uima.analysis_engine.AnalysisEngineProcessException
>>      [junit] 	at org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:289)
>>      [junit] Caused by: java.io.IOException: org.apache.uima.analysis_engine.AnalysisEngineProcessException
>>      [junit] 	at org.apache.lucene.analysis.uima.UIMATypeAwareAnnotationsTokenizer.incrementToken(UIMATypeAwareAnnotationsTokenizer.java:87)
>>      [junit] 	at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:121)
>>      [junit] 	at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:371)
>>      [junit] 	at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:295)
>>      [junit] 	at org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:287)
>>      [junit] Caused by: org.apache.uima.analysis_engine.AnalysisEngineProcessException
>>      [junit] 	at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:701)
>>      [junit] 	at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:409)
>>      [junit] 	at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342)
>>      [junit] 	at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
>>      [junit] 	at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
>>      [junit] 	at org.apache.lucene.analysis.uima.BaseUIMATokenizer.analyzeInput(BaseUIMATokenizer.java:57)
>>      [junit] 	at org.apache.lucene.analysis.uima.UIMATypeAwareAnnotationsTokenizer.analyzeText(UIMATypeAwareAnnotationsTokenizer.java:73)
>>      [junit] 	at org.apache.lucene.analysis.uima.UIMATypeAwareAnnotationsTokenizer.incrementToken(UIMATypeAwareAnnotationsTokenizer.java:85)
>>      [junit] 	... 4 more
>>      [junit] Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 2
>>      [junit] 	at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>>      [junit] 	at java.util.ArrayList.get(ArrayList.java:322)
>>      [junit] 	at org.apache.uima.flow.impl.FixedFlowController$FixedFlowObject.next(FixedFlowController.java:216)
>>      [junit] 	at org.apache.uima.analysis_engine.asb.impl.FlowContainer.next(FlowContainer.java:98)
>>      [junit] 	at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:667)
>>      [junit] 	... 11 more
>> {noformat}
>> I'm debugging it and see if I can come up with the exact bug (and fix) :)
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
>
>
