uima-dev mailing list archives

From "Bhavani Iyer" <bhavan...@gmail.com>
Subject Re: Delta CAS
Date Tue, 08 Jul 2008 19:47:40 GMT
> That's not a good thing, and certainly no justification of using
> the same design here.

    Not sure what you are referring to here.

> I thought you needed to keep a list of the
> added FSs anyway.  Why do you need to rely on the max FS ID?

    Not if we use the high water mark (the max FS id at start), which is less.
    The list is implicit, in that the added FSs are those with ids over the
    high water mark.
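
    To make the high water mark idea concrete, here is a minimal sketch in
    Java. It is not the UIMA CAS API: the class HighWaterMarkSketch and all
    of its method names are hypothetical, and it assumes FS ids are handed
    out in strictly increasing order starting at 1, which is exactly the
    property of the current heap that this design relies on.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a CAS; illustrates the journal only. */
class HighWaterMarkSketch {
    private int maxFsId = 0;        // assumption: ids increase monotonically from 1
    private int highWaterMark = 0;  // default 0: nothing is journaled
    private final Set<Integer> modified = new LinkedHashSet<>();

    /** Allocate a new FS and return its id. */
    int createFs() {
        return ++maxFsId;
    }

    /** Save the max FS id at the start of processing. */
    void mark() {
        highWaterMark = maxFsId;
    }

    /**
     * Hook called from every feature setter. With the mark at 0 no id can
     * pass the test, so the only cost is this single comparison.
     */
    void onFeatureSet(int fsId) {
        if (fsId <= highWaterMark) {
            modified.add(fsId);     // a pre-existing FS was changed
        }
    }

    /** The "added" list is implicit: every id above the high water mark. */
    List<Integer> addedIds() {
        List<Integer> added = new ArrayList<>();
        for (int id = highWaterMark + 1; id <= maxFsId; id++) {
            added.add(id);
        }
        return added;
    }

    /** Ids of pre-existing FSs that were modified after mark(). */
    Set<Integer> modifiedIds() {
        return modified;
    }
}
```

    With two FSs created before mark() and one after, addedIds() contains
    only the third id, and a setter call on the first FS puts its id in
    modifiedIds(). Setter calls on the new FS are not journaled, since the
    whole FS is serialized anyway.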

On Tue, Jul 8, 2008 at 3:06 PM, Thilo Goetz <twgoetz@gmx.de> wrote:

> Bhavani Iyer wrote:
>> Hi Thilo,
>> There are two separate requirements being addressed here:
>> 1) delta CAS for optimizing remote services.
>>     Here it's agreed that there should be no measurable overhead when there
>> is no remoting.
>>     There will be a single test against the high water mark.  The high
>> water mark defaults to 0.  Only when the high water mark is set to a
>> value greater than 0 is logging of CAS operations on FSs below the
>> high water mark enabled.
>> 2) Journaling for debugging aggregate components.
>>    This capability is for Core UIMA as well as for remote services. This
>> will have some additional overhead and will have to be explicitly enabled
>> by the aggregate controller for a component. Basically, the aggregate
>> controller enables journaling by setting the high water mark before the
>> call to process.
>> Regarding using the high water mark, this is already being used for
>> merging CAS.
> That's not a good thing, and certainly no justification of using
> the same design here.  I thought you needed to keep a list of the
> added FSs anyway.  Why do you need to rely on the max FS ID?
> --Thilo
>> Bhavani
>> On Tue, Jul 8, 2008 at 10:22 AM, Thilo Goetz <twgoetz@gmx.de> wrote:
>>> My immediate reactions are:
>>> - you should be forced to turn this on if you want to
>>> use it.  By default it should be off.  There should be
>>> no overhead for UIMA instances that don't do any remoting.
>>> You could implement a CAS wrapper that delegates to the
>>> real CAS and does the bookkeeping.  The newCAS() functions
>>> can return the wrapper CASes when DeltaCAS is enabled.
>>> - I've gotten over wanting to improve the CAS implementation,
>>> but I think the watermark thing is not a good idea.  It
>>> totally relies on the current CAS heap implementation and
>>> will prevent any improvements in this area.  If there is
>>> another way of doing it, that would be great.
>>> --Thilo
>>> Bhavani Iyer wrote:
>>>> We're planning to start the implementation to support Delta CAS as
>>>> described here:
>>>> http://cwiki.apache.org/UIMA/reducing-overhead-for-remote-service-calls.html
>>>> The current thinking on the design is described below and we would like
>>>> some feedback.
>>>> *CAS Activity Journal*
>>>> In order to be able to export as XMI only the updates to the CAS, we
>>>> need to maintain a journal associated with the CAS to track these
>>>> update activities.  The journal will contain the following information:
>>>> 1) To identify new FSs, the max FS id at the start of processing is
>>>>    saved. New FSs added would have ids above this high water mark.
>>>> 2) To identify which pre-existing FSs were modified, a list of ids of
>>>>    pre-existing FSs that have been changed.
>>>> 3) To track updates to the Views, a list of FS ids added, removed and
>>>>    reindexed in the index repository of pre-existing Views.
>>>> The CAS APIs that set feature values, as well as the APIs that add to
>>>> or remove from the index repository, will be modified to update the
>>>> CAS activity journal.
>>>> The overhead is expected to be minimal since it will simply add an FS
>>>> id to the appropriate list as mentioned above.
>>>> *Delta CAS XMI serialization*
>>>> A more compact representation of the delta CAS data than the proposed
>>>> XMI:Difference format would be preferred for transmitting CAS data
>>>> when making calls to a remote service.
>>>> Instead, we propose that CAS updates be serialized in the same format
>>>> as the XMI CAS with one modification.
>>>> Additional attributes will be defined in the View element to contain
>>>> the list of ids of FSs that were added, deleted and reindexed:
>>>> Example of current cas:View
>>>> <cas:View sofa="1" members="8 13 20 26 42"/>
>>>> Example of proposed cas:View to support delta CAS
>>>> <cas:View sofa="1" members_added="32"  members_deleted="13 20"
>>>> members_reindexed="26 42" />
>>>> XMI deserialization of the delta CAS will update the CAS as follows:
>>>> 1) Create a new FS for each element whose xmi:id is above the high
>>>>    water mark.
>>>> 2) Update the FS's feature values for each element whose xmi:id is
>>>>    below the high water mark. Note that features missing in the XMI
>>>>    will be set to null.
>>>> 3) Process the View element to add, remove or reindex in the specified
>>>>    View index repository.
>>>> Please note that this proposed XMI representation itself does not
>>>> identify or mark the CAS as a delta CAS.
>>>> In the context of UIMA AS services, additional properties in the request
>>>> and reply messages will specify that the XMI contains a delta CAS.
>>>> Applications should not use the API to export a delta CAS to a file for
>>>> later processing without taking additional steps to retain the format
>>>> information.
>>>> *Delta CAS for debugging*
>>>> To support debugging UIMA aggregates as described here
>>>>  http://cwiki.apache.org/UIMA/improving-uima-debug-capabilities.html
>>>> the delta CAS implementation will be extended as follows:
>>>> 1) the CAS activity journal will be maintained for each component that
>>>>    is called during aggregate processing.
>>>> 2) an API to enable/disable this extended journaling by component.
>>>> 3) define the XMI representation of the CAS activity journal. Details on
>>>> this will be posted shortly.
>>>> We would appreciate comments and suggestions on the proposed changes.
>>>> Thanks,
>>>> Bhavani
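
    For reference, applying the three proposed cas:View attributes from the
    original proposal to an existing member list might look roughly like the
    Java sketch below. The class DeltaViewSketch and its methods are
    hypothetical, not part of UIMA, and "reindex" is modeled here as a
    remove followed by an add, which is an assumption about the intended
    semantics.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper; models a View's index as an ordered set of FS ids. */
class DeltaViewSketch {

    /** Parse a whitespace-separated id list such as "13 20". */
    static Set<Integer> parseIds(String attr) {
        Set<Integer> ids = new LinkedHashSet<>();
        for (String tok : attr.trim().split("\\s+")) {
            if (!tok.isEmpty()) {
                ids.add(Integer.parseInt(tok));
            }
        }
        return ids;
    }

    /** Apply members_added / members_deleted / members_reindexed to a view. */
    static Set<Integer> applyDelta(Set<Integer> members, String added,
                                   String deleted, String reindexed) {
        Set<Integer> result = new LinkedHashSet<>(members);
        result.removeAll(parseIds(deleted));
        // Assumption: reindexing means removing the FS and adding it back.
        for (Integer id : parseIds(reindexed)) {
            result.remove(id);
            result.add(id);
        }
        result.addAll(parseIds(added));
        return result;
    }
}
```

    Starting from members="8 13 20 26 42" and applying members_added="32",
    members_deleted="13 20", members_reindexed="26 42" leaves the view
    holding ids 8, 26, 42 and 32.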
