uima-dev mailing list archives

From "Bhavani Iyer" <bhavan...@gmail.com>
Subject Re: Delta CAS
Date Mon, 14 Jul 2008 14:26:25 GMT
OK, it sounds like the suggested improvements to the CAS heap design would still
preserve the high-water mark mechanism for identifying new FSs as those added
after the mark. Is this correct? If so, implementation can start. Should a
branch be created for this work?

The other main concern discussed was the overhead for core UIMA use without
remoting. There should be no measurable overhead, since there will be one int
compare on calls to set a feature value and to add to an index, and no impact
on accessing FS values.
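
To make the "one int compare" point concrete, here is a minimal sketch of a
high-water mark on an int-array heap. It is not the UIMA CAS implementation:
the class MarkedHeapSketch and all of its methods are hypothetical names made
up for this illustration, and it assumes FS addresses only ever grow.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the UIMA CAS API: an int-array heap with a high-water mark.
final class MarkedHeapSketch {
    private int[] heap = new int[16];
    private int next = 1;   // address 0 is reserved as the null reference
    private int mark = 0;   // 0 means no mark is set, so "addr < mark" is never true
    private final List<Integer> journal = new ArrayList<>();

    // Allocates 'size' cells at the top of the heap and returns the new FS address.
    int allocate(int size) {
        while (next + size > heap.length) {
            heap = Arrays.copyOf(heap, heap.length * 2);
        }
        int addr = next;
        next += size;
        return addr;
    }

    // Records the current top of the heap; FSs allocated afterwards count as new.
    void setMark() {
        mark = next;
    }

    boolean isNew(int addr) {
        return addr >= mark;
    }

    // The only added cost on a write is the single int compare against the mark.
    void setFeatureValue(int addr, int featOffset, int value) {
        if (addr < mark) {
            journal.add(addr);  // a pre-existing FS was modified
        }
        heap[addr + featOffset] = value;
    }

    // Reads do not consult the mark at all.
    int getFeatureValue(int addr, int featOffset) {
        return heap[addr + featOffset];
    }

    List<Integer> modifiedBelowMark() {
        return journal;
    }
}
```

With no mark set, the compare is always false and nothing is journaled, which
is the case that matters for core use without remoting.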

If the overhead turns out to be an issue, we could still work around it with a
separate class implementing CAS with journaling, or a wrapper class as
suggested before.

Bhavani

On Thu, Jul 10, 2008 at 12:57 PM, Marshall Schor <msa@schor.com> wrote:

> Thilo Goetz wrote:
>
>> Eddie Epstein wrote:
>>
>>> No opinions, but a few observations:
>>>
>>> 1M is way too big for some applications that need very small, but very
>>> many
>>> CASes.
>>>
>>
>> I agree.
>>
> How about treating the 1st 1 mb segment with the same approach as the heap
> is now - providing the ability to start small, and expanding it (by
> reallocating and copying) until it gets to 1 mb?
>
> -Marshall
>
>
>>
>>> Large arrays may be bigger than whatever segment size is chosen, making
>>> segment management a bit more complicated.
>>>
>>> There will be holes at the top of every segment when the next FS doesn't
>>> fit.
>>>
>>
>> Not necessarily.  Why couldn't you spread FSs and arrays
>> across segments?
>>
>>
>>> Eddie
>>>
>>> On Wed, Jul 9, 2008 at 2:37 PM, Marshall Schor <msa@schor.com> wrote:
>>>
>>>> Here's a suggestion prompted by previous posts and by common hardware
>>>> design for segmented memory.
>>>>
>>>> Take the int values that represent feature structure (fs) references.
>>>>  Today, these are positive numbers from 1 (I think) to around 4 billion.
>>>>  These values are used directly as an index into the heap.
>>>>
>>>> Change this to split the bits in these int values into two parts, let's
>>>> call them upper and lower.  For example
>>>> xxxx xxxx xxxx yyyy yyyy yyyy yyyy yyyy
>>>>
>>>> where the x's are the upper 12 bits and the y's are the lower 20 bits
>>>> (each character represents one bit).  The y's in this case can represent
>>>> numbers up to 1 million (approx), and the x's represent 4096 values.
>>>>
>>>> Then allocate the heap using multiple 1 meg entry tables, and store each
>>>> one in the 4096 entry reference array.  The heap reference would be some
>>>> bit-wise shifting and indexed lookup in addition to what we have now and
>>>> would probably be very fast, and could be optimized for the xxx=0 case
>>>> to be
>>>> even faster.
>>>>
>>>> This breaks heaps of over 1 meg into separate parts, which would make
>>>> them more manageable, I think, and keeps the high-water mark method
>>>> viable, too.
>>>>
>>>> Opinions?
>>>>
>>>> -Marshall
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>
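
The segmented addressing described in the quoted message from Marshall can be
sketched roughly as follows. This is only an illustration under assumptions:
the class SegmentedHeapSketch, its constants and its methods are hypothetical
names, segments are allocated lazily on first write, and FSs are not spread
across segments.

```java
// Hypothetical sketch, not UIMA code: 12 upper bits pick a segment, 20 lower bits an offset.
final class SegmentedHeapSketch {
    static final int OFFSET_BITS = 20;
    static final int SEGMENT_SIZE = 1 << OFFSET_BITS;         // 1,048,576 entries per segment
    static final int OFFSET_MASK = SEGMENT_SIZE - 1;
    static final int MAX_SEGMENTS = 1 << (32 - OFFSET_BITS);  // 4096 segments

    // The 4096-entry reference array; unused segments stay null.
    private final int[][] segments = new int[MAX_SEGMENTS][];

    // Unsigned shift, so references with the top bit set still map to 0..4095.
    static int segmentOf(int ref) {
        return ref >>> OFFSET_BITS;
    }

    static int offsetOf(int ref) {
        return ref & OFFSET_MASK;
    }

    int get(int ref) {
        int[] segment = segments[segmentOf(ref)];
        return segment == null ? 0 : segment[offsetOf(ref)];
    }

    void set(int ref, int value) {
        int s = segmentOf(ref);
        if (segments[s] == null) {
            segments[s] = new int[SEGMENT_SIZE];  // allocate the segment on first write
        }
        segments[s][offsetOf(ref)] = value;
    }
}
```

One thing to watch: if all 32 bits are used, references in the upper half of
the range are negative Java ints, so a high-water mark test would need an
unsigned comparison such as Integer.compareUnsigned rather than a plain "<".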
