uima-dev mailing list archives

From Richard Eckart de Castilho <eckar...@tk.informatik.tu-darmstadt.de>
Subject Re: CAS Id
Date Tue, 04 Oct 2011 21:36:19 GMT
On 04.10.2011 at 21:41, Eddie Epstein wrote:

> On Tue, Oct 4, 2011 at 5:34 AM, Jörn Kottmann <kottmann@gmail.com> wrote:
>> In the end I believe a simple CAS ID field could be quite useful, for
>> debugging/logging, as a document ID in simple UIMA pipelines and for
>> applications which deal with whole CASes (e.g. the Cas Editor based
>> annotation tooling, or an AE which extracts "problematic" CASes from an
>> analysis pipeline for inspection).
>> To implement this I suggest that we extend the CAS interface with
>> CAS.setId(String) and CAS.getId() methods.
> If one were to implement CAS.setID() the data should be stored in the
> CAS as a type/feature so that all of the different CAS serialization
> and transport mechanisms are unchanged. Probably as an additional
> feature in SofaFS would be best. Presumably this string would want to
> be immutable (as are other SofaFS features)?
> Still not clear to me that this feature adds value beyond application
> specific type system data.

I can well understand the use cases described by Jörn and Burn, in which they
need an identifier in the CAS that can be used to associate the CAS with a
context holding information not contained in the CAS itself: either a UIMA-AS
context or knowledge about database storage.

I have also at times wanted to associate arbitrary Java objects with a
particular CAS instance. I ended up creating a static HashMap using the CAS
instance itself as key, but I would have preferred some kind of "session"
information associated with the CAS itself (there is a session in the
UIMAContext of components, but not for the CAS). I consider that solution a
hack, though, because it only works in a non-distributed environment.
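For illustration, a minimal sketch of that static-map workaround, assuming all components run in the same JVM. To keep it self-contained the key is typed as Object; in a UIMA component it would be the org.apache.uima.cas.CAS instance. The class name CasSessionRegistry and its methods are hypothetical, not part of UIMA. The release() method is there because CAS instances may be reset and reused from a pool, so stale entries should be dropped explicitly.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper: associates arbitrary Java objects with a CAS instance
 * via a static map keyed by the CAS object itself. This only works when all
 * components share one JVM (not across remote UIMA-AS services).
 */
final class CasSessionRegistry {
    // Outer key: the CAS instance (Object here, to stay self-contained).
    // Inner map: "session" attributes stored for that CAS.
    private static final Map<Object, Map<String, Object>> SESSIONS = new HashMap<>();

    private CasSessionRegistry() {}

    /** Stores a value under the given name for this CAS. */
    public static synchronized void put(Object cas, String name, Object value) {
        SESSIONS.computeIfAbsent(cas, k -> new HashMap<>()).put(name, value);
    }

    /** Returns the value stored under the given name, or null if absent. */
    public static synchronized Object get(Object cas, String name) {
        Map<String, Object> session = SESSIONS.get(cas);
        return session == null ? null : session.get(name);
    }

    /** Drops all entries for this CAS, e.g. before it is reset and reused. */
    public static synchronized void release(Object cas) {
        SESSIONS.remove(cas);
    }

    public static void main(String[] args) {
        Object cas1 = new Object(); // stand-in for a CAS instance
        Object cas2 = new Object();
        put(cas1, "dbId", 42);
        System.out.println(get(cas1, "dbId")); // prints 42
        System.out.println(get(cas2, "dbId")); // prints null
        release(cas1);
        System.out.println(get(cas1, "dbId")); // prints null
    }
}
```

The limitation is visible in the design: the map lives in one JVM's static state, so a CAS serialized to a remote service arrives without its session entries.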

For a distributed environment like UIMA-AS, it would be convenient to associate
an ID with a CAS in a way that does not require a custom FeatureStructure.

I still wonder whether such an ID should be associated with each separate view
or with the whole CAS object.

I also wonder whether it would be better to have generic string key/value
properties associated with a CAS or view. These could replace the
SourceDocumentInformation, allow for arbitrary metadata such as that generated
by the TikaAnnotator, and be used to store Jörn's DB ID, Burn's UIMA-AS ID and
my URI/baseURI, even all of them at the same time. If there is only one ID
field, different applications might compete for it. The properties could be
stored in the SofaFS (mutable, please) and there could be convenience methods
CAS.getProperty(String) and CAS.setProperty(String,String).
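To make the proposal concrete, here is a rough sketch of what such a property API might look like. Everything in it is hypothetical: CasProperties, InMemoryCasProperties and the key names are illustrations, not existing UIMA API, and this in-memory version does not persist anything in the SofaFS, so unlike the proposal it would not survive CAS serialization.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical shape of the proposed per-CAS (or per-view) property API. */
interface CasProperties {
    /** Returns the value for the key, or null if it is not set. */
    String getProperty(String key);

    /** Sets or overwrites the value for the key; properties stay mutable. */
    void setProperty(String key, String value);

    /** Read-only snapshot of all properties, e.g. for logging. */
    Map<String, String> getProperties();
}

/** In-memory stand-in; a real version would store the entries in the CAS itself. */
final class InMemoryCasProperties implements CasProperties {
    // LinkedHashMap keeps insertion order, which makes logged output predictable.
    private final Map<String, String> props = new LinkedHashMap<>();

    @Override
    public String getProperty(String key) {
        return props.get(key);
    }

    @Override
    public void setProperty(String key, String value) {
        props.put(key, value);
    }

    @Override
    public Map<String, String> getProperties() {
        return Collections.unmodifiableMap(new LinkedHashMap<>(props));
    }
}

final class CasPropertiesDemo {
    public static void main(String[] args) {
        CasProperties props = new InMemoryCasProperties();
        // Each application uses its own (hypothetical) key, so none of them
        // has to compete for a single ID field.
        props.setProperty("db.id", "4711");
        props.setProperty("uima-as.id", "cas-0001");
        props.setProperty("document.uri", "file:/corpus/doc1.txt");
        props.setProperty("document.baseUri", "file:/corpus/");
        System.out.println(props.getProperty("db.id"));    // prints 4711
        System.out.println(props.getProperties().size());  // prints 4
    }
}
```

Keying by name rather than offering a single ID slot is what lets several applications attach their own identifiers to the same CAS at once.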



Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab 
FB 20 Computer Science Department      
Technische Universität Darmstadt 
Hochschulstr. 10, D-64289 Darmstadt, Germany 
phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
