uima-dev mailing list archives

From Thilo Goetz <twgo...@gmx.de>
Subject Re: Delta CAS
Date Mon, 14 Jul 2008 14:40:19 GMT
Bhavani Iyer wrote:
> OK sounds like the suggested improvements to the CAS heap design would still
> preserve the high water mark mechanism for identifying new FSs as those added
> after the mark.  Is this correct?

No.  My conclusion was that we'll create a CAS API that returns
a marker object which may later be used to query the
CAS about certain FSs and when they were created.  This object
will be opaque to CAS users and transient in nature.  Please feel
free to make a suggestion for such an API to make sure your
requirements are covered.
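
As a starting point for discussion, such a marker API could be sketched
roughly as below. This is only an illustration: SketchCas, Marker,
createMarker, isNew, isValid and reset are hypothetical names, not existing
UIMA API, and feature structures are reduced to plain int ids. The
comparison inside isNew is one possible implementation; because the marker
is opaque, callers never depend on it.

```java
// Hypothetical sketch of an opaque, transient marker. None of these names
// are UIMA API; a feature structure is modeled as a plain int id.
final class SketchCas {
    private int nextFsId = 1;     // id handed to the next FS created
    private int generation = 0;   // bumped on every reset()

    /** Opaque handle: callers cannot see how "created after" is decided. */
    final class Marker {
        private final int firstNewId;        // hidden implementation detail
        private final int markerGeneration;  // CAS generation at creation time

        private Marker(int firstNewId, int markerGeneration) {
            this.firstNewId = firstNewId;
            this.markerGeneration = markerGeneration;
        }

        /** A marker is transient: it stops being valid once the CAS is reset. */
        boolean isValid() {
            return markerGeneration == generation;
        }

        /** True if the FS with this id was created after the marker was taken. */
        boolean isNew(int fsId) {
            if (!isValid()) {
                throw new IllegalStateException("marker is no longer valid");
            }
            return fsId >= firstNewId;
        }
    }

    /** Creates a new FS and returns its id. */
    int createFs() {
        return nextFsId++;
    }

    /** Returns a marker; everything created from now on counts as new. */
    Marker createMarker() {
        return new Marker(nextFsId, generation);
    }

    /** Clears the CAS contents and invalidates all outstanding markers. */
    void reset() {
        nextFsId = 1;
        generation++;
    }

    public static void main(String[] args) {
        SketchCas cas = new SketchCas();
        int oldFs = cas.createFs();
        Marker marker = cas.createMarker();
        int newFs = cas.createFs();
        System.out.println("old FS is new? " + marker.isNew(oldFs));
        System.out.println("new FS is new? " + marker.isNew(newFs));
    }
}
```

The point of the design is that the CAS user only ever asks the marker
questions, so the heap layout behind it can change without breaking callers.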

> If so, implementation can start. Should there be a branch
> created for this work?

I don't see why we need a branch for this.

> The other main concern discussed was the overhead for core UIMA use without
> remoting. There should be no measurable overhead since there will be one int
> compare on calls to set feature value and add to index and no impact on
> accessing FS values.

Please explain your design.  I expect that there'll be a
global setting, so at most a boolean is checked?
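
To make the question concrete, the two ideas (a global boolean plus one int
compare on writes) might combine as in the following sketch. The design was
not spelled out in this thread, so everything here is an assumption:
JournalingHeapSketch, startTracking, setInt and wasModified are hypothetical
names, and the fixed-size int array merely stands in for the CAS heap.

```java
import java.util.BitSet;

// Hypothetical sketch, not UIMA code: a write path that costs one boolean
// check when tracking is off, and one extra int compare when it is on.
final class JournalingHeapSketch {
    private final int[] heap = new int[1024];  // fixed capacity, for illustration only
    private int nextFree = 1;                  // address 0 is reserved as "null"
    private boolean trackingEnabled = false;   // the assumed global setting
    private int markAddr = 0;                  // first address allocated after tracking began
    private final BitSet modified = new BitSet();

    /** Reserves size cells and returns the address of the first one. */
    int allocate(int size) {
        int addr = nextFree;
        nextFree += size;
        return addr;
    }

    /** Turns tracking on; cells allocated before this point count as pre-existing. */
    void startTracking() {
        trackingEnabled = true;
        markAddr = nextFree;
    }

    /** Write path: the only place that pays for tracking. */
    void setInt(int addr, int value) {
        if (trackingEnabled && addr < markAddr) {
            modified.set(addr);  // journal a change to a pre-existing cell
        }
        heap[addr] = value;
    }

    /** Read path: identical whether tracking is on or off. */
    int getInt(int addr) {
        return heap[addr];
    }

    /** True if a pre-existing cell was written after tracking began. */
    boolean wasModified(int addr) {
        return modified.get(addr);
    }

    public static void main(String[] args) {
        JournalingHeapSketch h = new JournalingHeapSketch();
        int oldFs = h.allocate(2);
        h.startTracking();
        int newFs = h.allocate(2);
        h.setInt(oldFs, 7);
        h.setInt(newFs, 9);
        System.out.println("old cell journaled: " + h.wasModified(oldFs));
        System.out.println("new cell journaled: " + h.wasModified(newFs));
    }
}
```

In this sketch reads are untouched, and with tracking disabled a write pays
only for the boolean test.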

> If the overhead turns out to be an issue, we could still work around it with a
> separate class implementing CAS with journaling or a wrapper class as
> suggested before.
> Bhavani
> On Thu, Jul 10, 2008 at 12:57 PM, Marshall Schor <msa@schor.com> wrote:
>> Thilo Goetz wrote:
>>> Eddie Epstein wrote:
>>>> No opinions, but a few observations:
>>>> 1M is way too big for some applications that need very small, but very
>>>> many CASes.
>>> I agree.
>> How about treating the 1st 1 mb segment with the same approach as the heap
>> is now - providing the ability to start small, and expanding it (by
>> reallocating and copying) until it gets to 1 mb?
>> -Marshall
>>>> Large arrays may be bigger than whatever segment size is chosen, making
>>>> segment management a bit more complicated.
>>>> There will be holes at the top of every segment when the next FS doesn't
>>>> fit.
>>> Not necessarily.  Why couldn't you spread FSs and arrays
>>> across segments?
>>>> Eddie
>>>> On Wed, Jul 9, 2008 at 2:37 PM, Marshall Schor <msa@schor.com> wrote:
>>>>> Here's a suggestion prompted by previous posts, and by common hardware
>>>>> design for segmented memory.
>>>>> Take the int values that represent feature structure (fs) references.
>>>>>  Today, these are positive numbers from 1 (I think) to around 4 billion.
>>>>>  These values are used directly as an index into the heap.
>>>>> Change this to split the bits in these int values into two parts, let's
>>>>> call them upper and lower.  For example
>>>>> xxxx xxxx xxxx yyyy yyyy yyyy yyyy yyyy
>>>>> where the x's are the upper bits (each x represents a binary digit), and
>>>>> the y's the lower bits.  The y's in this case can represent numbers up to
>>>>> 1 million (approx), and the x's represent 4096 values.
>>>>> Then allocate the heap using multiple 1 meg entry tables, and store each
>>>>> one in the 4096 entry reference array.  The heap reference would be some
>>>>> bit-wise shifting and indexed lookup in addition to what we have now,
>>>>> which would probably be very fast, and could be optimized for the xxx=0
>>>>> case to be even faster.
>>>>> This breaks heaps of over 1 meg into separate parts, which would make
>>>>> them more manageable, I think, and keeps the high-water mark method
>>>>> viable, too.
>>>>> Opinions?
>>>>> -Marshall
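
For reference, the 12/20 bit split described in the quoted proposal could
look roughly like this in Java. It is a minimal sketch under stated
assumptions: SegmentedHeapSketch and its members are hypothetical names,
segments are allocated lazily on first write, and FSs or arrays that span a
segment boundary are not handled.

```java
// Hypothetical sketch of a segmented heap: the upper 12 bits of a reference
// select a segment, the lower 20 bits are the offset inside that segment.
final class SegmentedHeapSketch {
    static final int SEGMENT_BITS = 20;
    static final int SEGMENT_SIZE = 1 << SEGMENT_BITS;         // 1,048,576 cells
    static final int OFFSET_MASK = SEGMENT_SIZE - 1;           // low 20 bits set
    static final int MAX_SEGMENTS = 1 << (32 - SEGMENT_BITS);  // 4096 segments

    // The "4096 entry reference array"; unused segments stay null.
    private final int[][] segments = new int[MAX_SEGMENTS][];

    /** Upper bits: which segment. Unsigned shift so the top bit is not sign-extended. */
    static int segmentOf(int ref) {
        return ref >>> SEGMENT_BITS;
    }

    /** Lower bits: the offset within the segment. */
    static int offsetOf(int ref) {
        return ref & OFFSET_MASK;
    }

    /** Reads a cell; cells in segments never written read as 0. */
    int get(int ref) {
        int[] segment = segments[segmentOf(ref)];
        return segment == null ? 0 : segment[offsetOf(ref)];
    }

    /** Writes a cell, allocating its segment on first use. */
    void set(int ref, int value) {
        int s = segmentOf(ref);
        if (segments[s] == null) {
            segments[s] = new int[SEGMENT_SIZE];
        }
        segments[s][offsetOf(ref)] = value;
    }

    public static void main(String[] args) {
        SegmentedHeapSketch heap = new SegmentedHeapSketch();
        int ref = (3 << SEGMENT_BITS) | 7;  // segment 3, offset 7
        heap.set(ref, 42);
        System.out.println("segment=" + segmentOf(ref)
                + " offset=" + offsetOf(ref)
                + " value=" + heap.get(ref));
    }
}
```

Each access costs one shift, one mask and one extra array lookup on top of
the current direct index.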
