uima-dev mailing list archives

From Jörn Kottmann <kottm...@gmail.com>
Subject Re: Flexibility of binary CAS serialization
Date Wed, 12 Dec 2012 17:28:37 GMT
On 12/12/2012 05:27 PM, Erik Fäßler wrote:
> I am currently looking for a good approach to store a lot of CAS data. What I want to
> do is to annotate a lot of text with basic annotations and save that. Then, I can read the
> CAS objects with these basic annotations and don't have to do them over and over because they
> are basically never changing. However, "basic" does not necessarily mean that the computation
> is fast - that's why I want the storage.

In my experience it's sometimes better to define a custom format and
store the data in a database rather than use CAS serialization.

CAS serialization has some disadvantages. To read any piece of the data in
a CAS you have to load the entire CAS, even though many operations, e.g.
text indexing or calculating statistics, only need part of it.
To add new annotations to an existing CAS you have to rewrite the
entire serialized CAS instead of just appending a few bytes.
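
As a rough illustration of the custom-format idea, here is a minimal sketch that keeps documents and annotations in separate database tables, so new annotations can be appended and a single annotation type can be read without loading everything else. It uses Python with SQLite purely as a stand-in for whatever database and language you actually use; the table names, column names, and helper functions are all hypothetical and not part of UIMA.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the hypothetical annotation store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS document ("
        "id INTEGER PRIMARY KEY, text TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS annotation ("
        "doc_id INTEGER NOT NULL REFERENCES document(id), "
        "type TEXT NOT NULL, "
        "begin_offset INTEGER NOT NULL, "
        "end_offset INTEGER NOT NULL)"
    )
    # Index so that one annotation type of one document can be fetched directly.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS annotation_by_doc_type "
        "ON annotation (doc_id, type)"
    )
    return conn


def add_document(conn, text):
    """Store the document text once and return its id."""
    cur = conn.execute("INSERT INTO document (text) VALUES (?)", (text,))
    conn.commit()
    return cur.lastrowid


def add_annotations(conn, doc_id, type_name, spans):
    """Append annotations; rows already stored are not rewritten."""
    conn.executemany(
        "INSERT INTO annotation VALUES (?, ?, ?, ?)",
        [(doc_id, type_name, begin, end) for begin, end in spans],
    )
    conn.commit()


def covered_text(conn, doc_id, type_name):
    """Return the covered text of one annotation type, in document order."""
    rows = conn.execute(
        # SQLite's substr() is 1-based, hence begin_offset + 1.
        "SELECT substr(d.text, a.begin_offset + 1, "
        "a.end_offset - a.begin_offset) "
        "FROM annotation a JOIN document d ON d.id = a.doc_id "
        "WHERE a.doc_id = ? AND a.type = ? ORDER BY a.begin_offset",
        (doc_id, type_name),
    ).fetchall()
    return [row[0] for row in rows]
```

With a layout like this, a later pipeline step can call `add_annotations` for a new type without touching the stored tokens, and an indexing job can call `covered_text` for just the type it needs. Feature values beyond begin/end offsets would need extra columns or tables, which is where the format becomes specific to your type system.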

Jörn
