uima-dev mailing list archives

From: Thilo Goetz <twgo...@gmx.de>
Subject: Re: Delta CAS
Date: Wed, 09 Jul 2008 16:05:46 GMT

Marshall Schor wrote:
> Some intermediate approach might help here - such as an application or 
> annotator being able to provide performance tuning hints to the 
> framework.  For instance, a tokenizer might be able to guesstimate the 
> number of tokens, based on some average token size estimate divided into 
> the size of the document, and provide that as a hint.

Tell me about it.  We've built a whole framework to try
and figure out ahead of time how much memory processing
a certain document is going to take, so we know how many
threads we can run in parallel before crashing the JVM.
This turns out to be quite difficult if you don't know
what kinds of documents you'll be getting, and you work
with many different languages.
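
To make the arithmetic in this exchange concrete, here is a rough sketch of the kind of estimate being discussed: a token-count hint computed from document size and an assumed average token size, then a thread cap derived from a per-document memory guess. Nothing here is UIMA API. The function names, the per-token byte cost, the fixed overhead, and the heap budget are all hypothetical placeholders, and the average token size is exactly the number that is hard to pin down across languages and document types.

```python
import math


def estimate_token_count(doc_chars: int, avg_token_chars: float) -> int:
    """Token-count hint: document size divided by an assumed average token size."""
    if avg_token_chars <= 0:
        raise ValueError("avg_token_chars must be positive")
    return math.ceil(doc_chars / avg_token_chars)


def estimate_doc_memory(doc_chars: int, avg_token_chars: float,
                        bytes_per_token: int, fixed_overhead_bytes: int) -> int:
    """Very rough memory guess for one document (all cost figures are assumptions)."""
    tokens = estimate_token_count(doc_chars, avg_token_chars)
    return fixed_overhead_bytes + tokens * bytes_per_token


def max_parallel_threads(heap_budget_bytes: int, per_doc_bytes: int) -> int:
    """Number of documents that fit in the heap budget at once, never less than one."""
    if per_doc_bytes <= 0:
        raise ValueError("per_doc_bytes must be positive")
    return max(1, heap_budget_bytes // per_doc_bytes)


if __name__ == "__main__":
    # Hypothetical figures: a 6000-character document, 6 characters per token
    # (including whitespace), 64 bytes per token, 1 MB of fixed overhead.
    per_doc = estimate_doc_memory(6000, 6.0, 64, 1_000_000)
    print(per_doc, max_parallel_threads(10_000_000, per_doc))
```

The estimate is only as good as the average token size fed into it, so a real system would probably need per-language (or per-collection) averages and a safety margin on the heap budget.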

