OK, I agree. What's required is something like the following:

    /** Sets the high water mark and returns the marker object. */
    Marker getHighWaterMark();

    /** Defaults to false (disabled); enabled when the high water mark
        is set via the above API. */
    boolean isDeltaCasJournalingEnabled();

    public interface Marker {
      boolean isAboveHighWaterMark(FeatureStructure fs);
    }

The overhead is the call to isDeltaCasJournalingEnabled().
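To make that cost concrete, here is a rough sketch of what the write path
could look like. The class, field, and helper names are illustrative only,
not the actual CAS implementation, and I've used raw int heap addresses in
place of FeatureStructure to keep the sketch self-contained:

    // Sketch only; names are illustrative, not the real CASImpl.
    public class JournalingCasSketch {

      public interface Marker {
        boolean isAboveHighWaterMark(int fsAddr);
      }

      private final int[] heap = new int[1 << 20];
      private int heapTop = 1;                    // address 0 is reserved
      private boolean journalingEnabled = false;  // default: disabled

      /** Allocates space for a new FS and returns its heap address. */
      public int allocate(int size) {
        int addr = heapTop;
        heapTop += size;
        return addr;
      }

      /** Sets the high water mark and returns the marker object. */
      public Marker getHighWaterMark() {
        journalingEnabled = true;
        final int mark = heapTop;                 // new FSs allocate above this
        return fsAddr -> fsAddr >= mark;
      }

      public boolean isDeltaCasJournalingEnabled() {
        return journalingEnabled;
      }

      /** Write path: a single boolean test when journaling is disabled. */
      public void setFeatureValue(int fsAddr, int featOffset, int value) {
        if (journalingEnabled) {
          journal(fsAddr, featOffset);            // record the modification
        }
        heap[fsAddr + featOffset] = value;
      }

      private void journal(int fsAddr, int featOffset) {
        // append (fsAddr, featOffset) to a modification log -- elided here
      }
    }

With journaling disabled (the default), the only added work per setter call
is that single boolean test.
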
On Mon, Jul 14, 2008 at 10:40 AM, Thilo Goetz wrote:
> Bhavani Iyer wrote:
>> OK, sounds like the suggested improvements to the CAS heap design would
>> still preserve the high water mark mechanism for identifying new FSs as
>> those added after the mark. Is this correct?
>
> No. My conclusion was that we'll create a CAS API that returns a marker
> object which may later be used to query the CAS about certain FSs and
> when they were created. This object will be opaque to CAS users and
> transient in nature. Please feel free to make a suggestion for such an
> API to make sure your requirements are covered.
>
>> If so, implementation can start. Should there be a branch created for
>> this work?
>
> I don't see why we need a branch for this.
>
>> The other main concern discussed was the overhead for core UIMA use
>> without remoting. There should be no measurable overhead, since there
>> will be one int compare on calls to set a feature value and add to an
>> index, and no impact on accessing FS values.
>
> Please explain your design. I expect that there'll be a global setting,
> so at most a boolean is checked?
>
>> If the overhead turns out to be an issue, we could still work around it
>> with a separate class implementing CAS with journaling, or a wrapper
>> class as suggested before.
>>
>> Bhavani
>>
>> On Thu, Jul 10, 2008 at 12:57 PM, Marshall Schor wrote:
>>> Thilo Goetz wrote:
>>>> Eddie Epstein wrote:
>>>>> No opinions, but a few observations:
>>>>>
>>>>> 1M is way too big for some applications that need very small, but
>>>>> very many CASes.
>>>>
>>>> I agree.
>>>
>>> How about treating the 1st 1 MB segment with the same approach as the
>>> heap is now - providing the ability to start small, and expanding it
>>> (by reallocating and copying) until it gets to 1 MB?
>>>
>>> -Marshall
>>>
>>>>> Large arrays may be bigger than whatever segment size is chosen,
>>>>> making segment management a bit more complicated.
>>>>>
>>>>> There will be holes at the top of every segment when the next FS
>>>>> doesn't fit.
>>>>
>>>> Not necessarily. Why couldn't you spread FSs and arrays across
>>>> segments?
>>>>
>>>>> Eddie
>>>>>
>>>>> On Wed, Jul 9, 2008 at 2:37 PM, Marshall Schor wrote:
>>>>>> Here's a suggestion prompted by previous posts and by common
>>>>>> hardware designs for segmented memory.
>>>>>>
>>>>>> Take the int values that represent feature structure (FS)
>>>>>> references. Today, these are positive numbers from 1 (I think) to
>>>>>> around 4 billion. These values are used directly as an index into
>>>>>> the heap.
>>>>>>
>>>>>> Change this to split the bits in these int values into two parts,
>>>>>> let's call them upper and lower. For example:
>>>>>>
>>>>>>   xxxx xxxx xxxx yyyy yyyy yyyy yyyy yyyy
>>>>>>
>>>>>> where the x's are the upper 12 bits and the y's the lower 20 bits
>>>>>> (each group of four is one hex digit). The y's in this case can
>>>>>> represent numbers up to 1 million (approx), and the x's represent
>>>>>> 4096 values.
>>>>>>
>>>>>> Then allocate the heap using multiple tables of 1 meg entries each,
>>>>>> and store each one in the 4096-entry reference array. The heap
>>>>>> reference would be some bit-wise shifting and an indexed lookup in
>>>>>> addition to what we have now; it would probably be very fast, and
>>>>>> could be optimized for the xxx = 0 case to be even faster.
>>>>>>
>>>>>> This breaks heaps of over 1 meg into separate parts, which would
>>>>>> make them more manageable, I think, and keeps the high-water-mark
>>>>>> method viable, too.
>>>>>>
>>>>>> Opinions?
>>>>>>
>>>>>> -Marshall
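
To make the lookup described above concrete, here is a rough sketch of the
segmented addressing. The segment size, class name, and method names are my
own illustration, not a settled design, and segment growth/allocation is
elided:

    // Sketch of segmented addressing: the upper 12 bits of an FS
    // reference pick a segment table, the lower 20 bits an offset in it.
    public class SegmentedHeapSketch {

      private static final int SEG_BITS = 20;             // 2^20 ~ 1M entries
      private static final int SEG_SIZE = 1 << SEG_BITS;
      private static final int OFFSET_MASK = SEG_SIZE - 1;
      private static final int MAX_SEGMENTS = 1 << 12;    // 4096 segments

      private final int[][] segments = new int[MAX_SEGMENTS][];

      public SegmentedHeapSketch() {
        segments[0] = new int[SEG_SIZE];  // first segment; could start smaller
      }

      public int get(int fsRef) {
        // ">>>" treats refs near 4 billion (negative ints) as unsigned
        return segments[fsRef >>> SEG_BITS][fsRef & OFFSET_MASK];
      }

      public void set(int fsRef, int value) {
        segments[fsRef >>> SEG_BITS][fsRef & OFFSET_MASK] = value;
      }
    }

A JIT should compile get() down to a shift, a mask, and two array loads; the
xxx = 0 fast path could cache segments[0] in a local field if the extra
indirection ever shows up in profiles.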