uima-dev mailing list archives

From Erik Fäßler <erik.faess...@uni-jena.de>
Subject Re: Flexibility of binary CAS serialization
Date Thu, 13 Dec 2012 08:10:43 GMT
Thank you both for your hints,

Jörn, this exact topic came to my mind earlier. I want to keep different "annotation stages"
of the same artifacts, so some kind of delta storage would make a lot of sense. However, I don't
have time to write such a thing myself, and I currently don't see an easy way to do it: I want
to preserve the basic annotation storage so that I can experiment with the components that produce
the "higher" annotations. Is there anything usable out of the box for this?



On 12.12.2012 at 18:28, Jörn Kottmann <kottmann@gmail.com> wrote:

> On 12/12/2012 05:27 PM, Erik Fäßler wrote:
>> I am currently looking for a good approach to store a lot of CAS data. What I want
>> to do is to annotate a lot of text with basic annotations and save that. Then I can read
>> the CAS objects with these basic annotations and don't have to compute them over and over,
>> because they basically never change. However, "basic" does not necessarily mean that the
>> computation is fast - that's why I want the storage.
> In my experience it's sometimes better to define a custom format for storing the data in
> a database rather than use CAS serialization.
> CAS serialization has some disadvantages. To read a piece of the data in a CAS, it is
> necessary to load the entire CAS,
> but this might not be necessary for all operations that need to be performed, e.g. text
> indexing, calculating statistics, etc.
> To add new annotations to an existing CAS, you need to rewrite the entire CAS data instead
> of just adding a few bytes.
> Jörn
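
The custom-format idea above could be sketched roughly as follows. This is only an illustration under assumptions, not something that exists in UIMA: the class LayeredAnnotationStore, its Span type, and all method names are hypothetical, and the in-memory maps merely stand in for database tables. The point it shows is that each annotation stage is stored as a separate stand-off layer keyed by document, so a "higher" layer can be added or read without touching the "basic" one or the text.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-off annotation store; not part of UIMA. */
class LayeredAnnotationStore {

    /** One annotation: a type name plus begin/end character offsets into the document text. */
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    // Document text, stored once per document id (stands in for a "documents" table).
    private final Map<String, String> texts = new LinkedHashMap<>();
    // docId -> layer name -> annotations (stands in for an "annotations" table).
    private final Map<String, Map<String, List<Span>>> layers = new LinkedHashMap<>();

    void putText(String docId, String text) {
        texts.put(docId, text);
    }

    /** Adds one annotation stage; existing layers and the text are left untouched. */
    void addLayer(String docId, String layer, List<Span> spans) {
        if (!texts.containsKey(docId)) {
            throw new IllegalArgumentException("Unknown document: " + docId);
        }
        layers.computeIfAbsent(docId, k -> new LinkedHashMap<>())
              .put(layer, Collections.unmodifiableList(new ArrayList<>(spans)));
    }

    /** Reads a single layer without loading any other layer of the document. */
    List<Span> readLayer(String docId, String layer) {
        Map<String, List<Span>> docLayers = layers.get(docId);
        if (docLayers == null || !docLayers.containsKey(layer)) {
            return Collections.emptyList();
        }
        return docLayers.get(layer);
    }

    /** Resolves a span back to the text it covers. */
    String coveredText(String docId, Span span) {
        return texts.get(docId).substring(span.begin, span.end);
    }
}
```

With a layout like this, the expensive "basic" layer is written once, and experiments with "higher" components only write their own layer. Converting a layer back into CAS annotations for a pipeline run would still be needed and is not shown here.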
