uima-dev mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: Channel Usability Analytics using UIMA : Request for info on the Log-File Size..
Date Wed, 18 Feb 2009 15:26:17 GMT
UIMA uses its CAS (Common Analysis Structure) to pass information from
one annotator to the next.  If the annotators are co-located, you can
think of the CAS as a set of memory-resident data structures, passed by
reference.  If annotators are on different IP addresses in a network,
CASes are serialized/deserialized and sent over various network
transports.

The CAS can also be thought of as the "unit of work" for UIMA.  A very
large input (or indeed an "infinite" one, such as a real-time continuous
feed) can be broken up into units of work by a CAS Multiplier (or a
collection reader) component.  This is typically done for systems doing
things like
real-time speech or video analysis.  This component typically does the
initial basic analysis to decide on where logical units of work occur -
for instance, in the case of audio, it might break things on "silence"
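
For the log-file case, the same idea could be sketched in plain Java as
below.  This is only an illustration of cutting a large input into
bounded units of work; it does not use any UIMA classes.  The class name
LogChunker, the method forEachChunk, and the character budget are all
made up for the example, and in a real pipeline each chunk handed to the
sink would become the text of one CAS.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of UIMA: groups log lines into bounded chunks.
class LogChunker {

    // Streams lines from the reader and hands each chunk to the sink as soon
    // as it is complete, so only one chunk is held in memory at a time.
    // A chunk stays within maxChars unless a single line is longer than that,
    // in which case the line forms a chunk by itself. Returns the chunk count.
    static int forEachChunk(BufferedReader reader, int maxChars, Consumer<String> sink) {
        StringBuilder current = new StringBuilder();
        int count = 0;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                // Emit the current chunk if this line (plus newline) would not fit.
                if (current.length() > 0
                        && current.length() + line.length() + 1 > maxChars) {
                    sink.accept(current.toString());
                    count++;
                    current.setLength(0);
                }
                current.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Emit whatever is left over as the final chunk.
        if (current.length() > 0) {
            sink.accept(current.toString());
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String log = "GET /a 200\nGET /b 404\nGET /c 200\n";
        List<String> seen = new ArrayList<>();
        int n = forEachChunk(new BufferedReader(new StringReader(log)), 22, seen::add);
        System.out.println(n + " chunks");
    }
}
```

Splitting on line boundaries is the log-file analogue of splitting audio
on silence: each unit of work stays self-contained.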

The CAS also contains the "subject of analysis".  Often, this is a
string of characters representing a document to be analyzed, or a
string of bytes representing audio, etc.  However, instead of the
literal subject of analysis, a CAS can contain a reference to an
external source for it.
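
To make the literal-versus-reference distinction concrete, here is a
small plain-Java sketch.  It is not UIMA's API (the CAS has its own
calls for this); the class SubjectOfAnalysis and the file name
access.log are invented for the illustration.  The point is only that a
reference costs almost no memory until something actually resolves it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration only; it does not use or mirror UIMA classes.
class SubjectOfAnalysis {
    private final String literalText; // set when the text is carried inline
    private final Path externalRef;   // set when only a reference is carried

    private SubjectOfAnalysis(String literalText, Path externalRef) {
        this.literalText = literalText;
        this.externalRef = externalRef;
    }

    static SubjectOfAnalysis literal(String text) {
        return new SubjectOfAnalysis(text, null);
    }

    static SubjectOfAnalysis reference(Path file) {
        return new SubjectOfAnalysis(null, file);
    }

    boolean isReference() {
        return externalRef != null;
    }

    // Returns the inline text, or loads the referenced file only when asked.
    String resolve() {
        if (!isReference()) {
            return literalText;
        }
        try {
            return new String(Files.readAllBytes(externalRef), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SubjectOfAnalysis inline = literal("GET /index.html 200");
        // Hypothetical file name; nothing is read until resolve() is called.
        SubjectOfAnalysis byRef = reference(Paths.get("access.log"));
        System.out.println(inline.isReference() + " " + byRef.isReference());
    }
}
```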

The bottom-line answer to your question:  the CAS is a unit of work; it
is kept in memory, so available memory can be a limit.  Users of UIMA
often break very large pieces of work into multiple CASes to control
this; the actual subject of analysis can be literally in the CAS or
just a reference.

We have seen applications running on 64-bit operating systems with
64-bit JVMs that routinely support multi-gigabyte-sized CASes, so
CASes can be quite large, with today's

HTH.  -Marshall

Balkrishnan V wrote:
> Hi,
> I am working on a UIMA solution for web-server log-analysis, to identify the
> user-patterns.
> I am new to this UIMA framework. So, can you please give me some pointers as to
> at what point of my statistics-run would I encounter Operating-System
> issues/constraints ?
> For example, is there any limit to the size of the log-files (.txt) that I can
> feed to the UIMA ? If so, can you please give me some details on the same ?
> TIA.
> Regards,
> Balkrishnan.V
