uima-dev mailing list archives

From Thilo Goetz <twgo...@gmx.de>
Subject Re: Delta CAS
Date Tue, 08 Jul 2008 19:06:02 GMT
Bhavani Iyer wrote:
> Hi Thilo,
> There are two separate requirements being addressed here:
> 1) delta CAS for optimizing remote services.
>      Here it's agreed that there should be no measurable overhead when there
> is no remoting.
>      There will be a single test against the high water mark.  The high
> water mark defaults to 0.  Only when the high
>      water mark is set to a value greater than 0 is logging of  CAS
> operations on FSs below the high water mark enabled.
> 2)  Journaling for debugging  aggregate components.
>     This capability is for Core UIMA as well as for remote services. This
> will have some additional overhead and will have to be explicitly enabled
> by the aggregate controller for a component. Basically the aggregate
> controller enables journaling by setting the high water mark before the call
> to process.
> Regarding using the high water mark, this is already being used for merging
> CAS.

That's not a good thing, and certainly no justification for using
the same design here.  I thought you needed to keep a list of the
added FSs anyway.  Why do you need to rely on the max FS ID?
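
To make the alternative concrete, here is a minimal sketch of a journal that records added and modified FS ids explicitly instead of comparing against a max FS id. It follows the wrapper idea quoted further down (a CAS wrapper that delegates to the real CAS and does the bookkeeping). Everything in it is hypothetical: `ToyCas`, `JournalingCas`, and their methods are stand-ins invented for illustration and are not the UIMA API. The toy CAS hands out ids in a deliberately non-increasing order to show that the bookkeeping does not depend on how the heap assigns ids.

```python
class ToyCas:
    """Stand-in for a CAS (hypothetical, not the UIMA API)."""

    def __init__(self):
        self._features = {}
        # Ids are handed out in arbitrary order on purpose.
        self._free_ids = [7, 3, 11]

    def create_fs(self):
        fs_id = self._free_ids.pop(0)
        self._features[fs_id] = {}
        return fs_id

    def set_feature(self, fs_id, name, value):
        self._features[fs_id][name] = value

    def get_feature(self, fs_id, name):
        return self._features[fs_id].get(name)


class JournalingCas:
    """Hypothetical wrapper: delegates to the real CAS and records changes."""

    def __init__(self, delegate):
        self._delegate = delegate
        self.added = set()     # ids of FSs created through the wrapper
        self.modified = set()  # ids of pre-existing FSs that were changed

    def create_fs(self):
        fs_id = self._delegate.create_fs()
        self.added.add(fs_id)
        return fs_id

    def set_feature(self, fs_id, name, value):
        self._delegate.set_feature(fs_id, name, value)
        # Only pre-existing FSs count as "modified"; new ones are sent whole.
        if fs_id not in self.added:
            self.modified.add(fs_id)

    def get_feature(self, fs_id, name):
        return self._delegate.get_feature(fs_id, name)


cas = ToyCas()
old_fs = cas.create_fs()        # id 7, created before journaling starts
journal = JournalingCas(cas)    # journaling is opt-in: wrap only when needed
new_fs = journal.create_fs()    # id 3, lower than the pre-existing id 7
journal.set_feature(new_fs, "begin", 0)
journal.set_feature(old_fs, "end", 5)
```

With this shape, code that never wraps the CAS pays nothing, and a test of the form "id above the mark means new" is never needed, so the id allocation strategy stays free to change.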


> Bhavani
> On Tue, Jul 8, 2008 at 10:22 AM, Thilo Goetz <twgoetz@gmx.de> wrote:
>> My immediate reactions are:
>> - you should be forced to turn this on if you want to
>> use it.  By default it should be off.  There should be
>> no overhead for UIMA instances that don't do any remoting.
>> You could implement a CAS wrapper that delegates to the
>> real CAS and does the bookkeeping.  The newCAS() functions
>> can return the wrapper CASes when DeltaCAS is enabled.
>> - I've gotten over wanting to improve the CAS implementation,
>> but I think the watermark thing is not a good idea.  It
>> totally relies on the current CAS heap implementation and
>> will prevent any improvements in this area.  If there is
>> another way of doing it, that would be great.
>> --Thilo
>> Bhavani Iyer wrote:
>>> We're planning to start  the implementation to support Delta CAS as
>>> described here:
>>> http://cwiki.apache.org/UIMA/reducing-overhead-for-remote-service-calls.html
>>> The current thinking on the design is described below and we would like
>>> some feedback.
>>> *CAS Activity Journal*
>>> In order to be able to export as XMI only the updates to the CAS, we need
>>> to maintain a journal associated with the CAS to track these update
>>> activities.  The journal will contain the following information:
>>> 1) To identify new FSs, the max FS id at the start of processing is
>>>    saved. New FSs added would have ids above this high water mark.
>>> 2) To identify which pre-existing FSs were modified, a list of ids of
>>>    pre-existing FSs that have been changed.
>>> 3)  To track updates to the Views, a list of FS ids added, removed and
>>> reindexed in the index repository of pre-existing Views.
>>> The CAS APIs that set feature values, as well as the APIs that add to or
>>> remove from the index repository, will be modified to update the CAS
>>> activity journal.
>>> The overhead is expected to be minimal since it will simply add an FS id
>>> to the appropriate list as mentioned above.
>>> *Delta CAS XMI serialization*
>>> A more compact representation of the delta CAS data than the proposed
>>> XMI:Difference format
>>> would be preferred for transmitting CAS data when making calls to a remote
>>> service.
>>> Instead, we propose that CAS updates  be serialized in the same format as
>>> the XMI CAS with one modification.
>>> Additional attributes will be defined in the View element to contain the
>>> list of
>>> ids of FSs that were added, deleted and reindexed:
>>> Example of current cas:View
>>> <cas:View sofa="1" members="8 13 20 26 42"/>
>>> Example of proposed cas:View to support delta CAS
>>> <cas:View sofa="1" members_added="32"  members_deleted="13 20"
>>> members_reindexed="26 42" />
>>> XMI deserialization of the delta CAS will update the CAS as follows:
>>> 1) Create a new FS for each element whose xmi:id is above the high water
>>>    mark.
>>> 2) Update the feature values of the FS for each element whose xmi:id is
>>>    below the high water mark. Note that features missing in the XMI will
>>>    be set to null.
>>> 3) Process the view element to add, remove or reindex FSs in the
>>>    specified View index repository.
>>> Please note that this proposed XMI representation itself does not identify
>>> or mark the CAS as a delta CAS.
>>> In the context of UIMA AS services, additional properties in the request
>>> and reply messages will specify that the XMI contains a delta CAS.
>>> Applications should not use the API to export a delta CAS to a file for
>>> later processing without taking additional steps to retain the format
>>> information.
>>> *Delta CAS for debugging*
>>> To support debugging UIMA aggregates as described here
>>>  http://cwiki.apache.org/UIMA/improving-uima-debug-capabilities.html
>>> the delta CAS implementation will be extended as follows:
>>> 1) the CAS activity journal will be maintained for each component that is
>>> called during aggregate processing.
>>> 2) an API to enable/disable this extended journaling by component.
>>> 3) define the XMI representation of the CAS activity journal. Details on
>>> this will be posted shortly.
>>> We would appreciate comments and suggestions on the proposed changes.
>>> Thanks,
>>> Bhavani
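
As an illustration of the proposed cas:View attributes quoted above, here is a minimal sketch of how a receiver might apply a delta view to the member list it already holds. It uses the two example elements from the proposal. The helper names (`ids`, `apply_view_delta`), the wrapping `<root>` element, and the namespace URI bound to the `cas:` prefix are assumptions added only so the fragment parses and runs; this is not how UIMA's own XMI deserializer is implemented.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the cas: prefix; needed only so the XML parses.
CAS_NS = "http:///uima/cas.ecore"


def ids(attr_value):
    """Parse a space-separated id list; a missing attribute yields []."""
    return [int(token) for token in (attr_value or "").split()]


def apply_view_delta(members, delta_view):
    """Return (updated member ids, ids to reindex) for one view."""
    added = ids(delta_view.get("members_added"))
    deleted = ids(delta_view.get("members_deleted"))
    reindexed = ids(delta_view.get("members_reindexed"))
    updated = (set(members) - set(deleted)) | set(added)
    return sorted(updated), reindexed


# The two cas:View examples from the proposal, wrapped in a dummy root.
doc = f"""<root xmlns:cas="{CAS_NS}">
  <cas:View sofa="1" members="8 13 20 26 42"/>
  <cas:View sofa="1" members_added="32" members_deleted="13 20"
            members_reindexed="26 42"/>
</root>"""

root = ET.fromstring(doc)
full_view, delta_view = root.findall(f"{{{CAS_NS}}}View")
members, reindexed = apply_view_delta(ids(full_view.get("members")), delta_view)
print(members, reindexed)  # [8, 26, 32, 42] [26, 42]
```

Note that nothing in the delta element itself says it is a delta, which matches the caveat in the proposal: the receiver has to learn that from the surrounding message properties.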
