Marshall Schor wrote:
> Thilo Goetz wrote:
>> Bhavani Iyer wrote:
>>> Hi Thilo,
>>>
>>> There are two separate requirements being addressed here:
>>> 1) delta CAS for optimizing remote services.
>>> Here it's agreed that there should be no measurable overhead when
>>> there is no remoting.
>>> There will be a single test against the high water mark. The high
>>> water mark defaults to 0. Only when the high
>>> water mark is set to a value greater than 0 is logging of CAS
>>> operations on FSs below the high water mark enabled.
>>> 2) Journaling for debugging aggregate components.
>>> This capability is for Core UIMA as well as for remote services.
>>> This will have some additional overhead and will have to be
>>> explicitly enabled by the aggregate controller for a component.
>>> Basically, the aggregate controller enables journaling by setting the
>>> high water mark before the call to process.
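A minimal Java sketch of the single high-water-mark test described in point 1, and of how setting the mark before process() turns journaling on. It assumes feature structures are identified by non-negative int heap addresses. The class and method names (HighWaterMarkJournal, setHighWaterMark, onFeatureUpdate) are hypothetical and are not UIMA APIs.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: FSs are identified by int heap addresses (>= 0).
// None of these names are real UIMA APIs.
class HighWaterMarkJournal {
    // Default 0: no address can be below it, so nothing is ever logged.
    private int highWaterMark = 0;
    // FSs below the mark that were modified after the mark was set (no duplicates).
    private final Set<Integer> modifiedBelowMark = new LinkedHashSet<>();

    // E.g. called by an aggregate controller before process(), typically with the
    // current top of the heap, so that every pre-existing FS lies below the mark.
    void setHighWaterMark(int mark) {
        if (mark < 0) {
            throw new IllegalArgumentException("mark must be >= 0");
        }
        highWaterMark = mark;
        modifiedBelowMark.clear();
    }

    // Hook on every feature update; the common (non-remote) path costs one comparison.
    void onFeatureUpdate(int fsAddr) {
        if (fsAddr < highWaterMark) {
            modifiedBelowMark.add(fsAddr);
        }
    }

    Set<Integer> modifiedBelowMark() {
        return modifiedBelowMark;
    }
}
```

With the mark left at 0 the comparison never succeeds, which is what keeps the non-remoting case free of logging overhead.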
>>>
>>> Regarding the use of the high water mark: it is already being used for
>>> merging CASes.
>>
>> That's not a good thing, and certainly no justification for using
>> the same design here.
> Can you say more about why this is not a good thing? I see it as an
> internal design detail.
Precisely. It's an implementation detail of the CAS heap that
we should be able to change -- that we must be able to change
if we would like to improve on the heap. The CAS heap and
in particular the way it grows is a major performance bottleneck
for large documents. If we have other parts of UIMA depend on
the (bad) implementation details now, we'll never be able to
improve on the design.
>> I thought you needed to keep a list of the
>> added FSs anyway.
> I don't think such a list is kept. There is a list of modified FSs
> below the high water mark, and a list of things added/removed/modified
> with the indexes (in other words, if new feature structures were added
> but not indexed, they would not be in any list).
>
> -Marshall
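The bookkeeping Marshall describes can be modeled roughly as two journals: FSs below the mark whose features changed, and index add/remove operations. The Java sketch below is hypothetical (DeltaTracker and its methods are not UIMA APIs). It assumes one heap slot per FS and a mark equal to the heap top at the time it was set. It shows that a new FS which is created but never indexed appears in neither journal.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the two lists described above; not UIMA API.
// Simplification: each FS occupies one heap slot.
class DeltaTracker {
    enum IndexOp { ADD, REMOVE }

    // One journaled index operation on the FS at the given address.
    static final class IndexChange {
        final IndexOp op;
        final int fsAddr;

        IndexChange(IndexOp op, int fsAddr) {
            this.op = op;
            this.fsAddr = fsAddr;
        }
    }

    private final int highWaterMark;
    private int nextFreeAddr;
    private final Set<Integer> modifiedBelowMark = new LinkedHashSet<>();
    private final List<IndexChange> indexChanges = new ArrayList<>();

    // Assumes the mark was set at the current heap top.
    DeltaTracker(int highWaterMark) {
        this.highWaterMark = highWaterMark;
        this.nextFreeAddr = highWaterMark;
    }

    // Creating an FS only advances the heap top; it is not journaled anywhere.
    int createFS() {
        return nextFreeAddr++;
    }

    // Only updates to FSs below the mark are recorded.
    void updateFeature(int fsAddr) {
        if (fsAddr < highWaterMark) {
            modifiedBelowMark.add(fsAddr);
        }
    }

    void addToIndexes(int fsAddr) {
        indexChanges.add(new IndexChange(IndexOp.ADD, fsAddr));
    }

    void removeFromIndexes(int fsAddr) {
        indexChanges.add(new IndexChange(IndexOp.REMOVE, fsAddr));
    }

    // True if the FS appears in either journal.
    boolean isJournaled(int fsAddr) {
        if (modifiedBelowMark.contains(fsAddr)) {
            return true;
        }
        for (IndexChange c : indexChanges) {
            if (c.fsAddr == fsAddr) {
                return true;
            }
        }
        return false;
    }
}
```

Usage: with `new DeltaTracker(100)`, updating FS 42 journals it, and a new FS that is indexed shows up in the index list. A new FS that is only created and updated, never indexed, is in neither list.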