From: Thilo Goetz
Date: Thu, 10 Jul 2008 16:32:31 +0200
To: uima-dev@incubator.apache.org
Subject: Re: Delta CAS
Eddie Epstein wrote:
> On Thu, Jul 10, 2008 at 5:30 AM, Thilo Goetz wrote:
>
>> I would like to lift this discussion to a higher
>> level of abstraction, as Adam is trying to. What
>> are the actual requirements against the CAS? Here's
>> what I think I understood.
>>
>> You want to be able to obtain from the CAS a marker
>> object. Then you want to be able to query the CAS
>> with the marker and an FS and ask if the FS was
>> added before or after the marker was obtained. Is
>> that right?
>
> That's right, for a simple delta CAS reply from a service, but actually many
> markers for journaling so that additions can be attributed to a specific
> annotator.

Ok, I think that shouldn't make a difference.

>> Bhavani Iyer wrote:
>>> If we are thinking of Delta CAS in the context of a service, the largest
>>> xmi id works. But we were also using the same mechanism to support
>>> tracking CAS activity by component. I suppose in the second case the
>>> additional overhead of maintaining a list of the FSs that are added may
>>> be acceptable.
>>
> A requirement for delta CAS is to identify new FS and modified FS. A low
> cost way to get modified FS is to add the FS-id to a simple list each time a
> feature in a preexisting FS is set, then sort and remove duplicates. Being
> able to identify and ignore new FS at setFeature() time allows a huge
> reduction in overhead because modifications are so much less frequent than
> additions.

So you want to do this outside the CAS? Or not?

>
> Yes, spreading FS across [power of 2 size] segments should eliminate holes
> and more complicated bookkeeping for large arrays.
>
> Eddie
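The marker requirement discussed in this thread (obtain a marker, then ask whether an FS was added before or after it) can be sketched in a few lines of Java. This is a minimal illustration under one assumption the "largest xmi id" idea already relies on: FS addresses are handed out in increasing order, so a marker only has to remember the next free address. `MarkedHeap`, `Marker`, `isNew` and `attribute` are hypothetical names, not UIMA API. For the journaling case with many markers, an FS is attributed to the most recent marker created before it by a binary search over the marker boundaries.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not the UIMA API): a CAS-like heap that hands out
 * feature structure (FS) addresses in strictly increasing order, so a marker
 * only needs to remember the next free address at the time it was created.
 */
final class MarkedHeap {
    private int nextFree = 1;                                    // address 0 is reserved for "null"
    private final List<Integer> boundaries = new ArrayList<>(); // one entry per marker, non-decreasing

    /** Allocates an FS occupying the given number of heap cells; returns its address. */
    int allocate(int cells) {
        if (cells <= 0) throw new IllegalArgumentException("cells must be positive");
        int addr = nextFree;
        nextFree += cells;
        return addr;
    }

    /** Creates a marker; every FS allocated from now on is "new" relative to it. */
    Marker createMarker() {
        boundaries.add(nextFree);
        return new Marker(nextFree, boundaries.size() - 1);
    }

    /**
     * Journaling: returns the id of the most recent marker created before the FS
     * at addr was allocated, or -1 if the FS predates every marker.
     */
    int attribute(int addr) {
        int lo = 0, hi = boundaries.size() - 1, found = -1;
        while (lo <= hi) {                                       // binary search: last boundary <= addr
            int mid = (lo + hi) >>> 1;
            if (boundaries.get(mid) <= addr) { found = mid; lo = mid + 1; }
            else { hi = mid - 1; }
        }
        return found;
    }

    static final class Marker {
        final int boundary; // first address allocated after this marker was obtained
        final int id;       // position in the marker sequence (e.g. one per annotator)

        Marker(int boundary, int id) { this.boundary = boundary; this.id = id; }

        /** True if the FS at addr was added after this marker was obtained. */
        boolean isNew(int addr) { return addr >= boundary; }
    }
}
```

Each marker costs a single int, so keeping one per annotator for journaling stays cheap.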
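Eddie's low-cost scheme for finding modified FSs could look roughly like this sketch. It assumes FS addresses grow monotonically, so a single comparison against the marker's boundary address identifies a new FS at setFeature() time; such an FS is skipped, presumably because a new FS goes into the delta in full anyway. `ModificationLog`, `onSetFeature` and `modifiedFs` are hypothetical names; in a real CAS the check would sit inside the feature setters.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the modification log: whenever a feature of a
 * preexisting FS is set, append that FS's address to a plain int list.
 * Sorting and de-duplicating happens once, when the delta is produced.
 */
final class ModificationLog {
    private final int boundary;       // first address allocated after the marker
    private int[] log = new int[16];
    private int size = 0;

    ModificationLog(int boundary) { this.boundary = boundary; }

    /** Called from a (hypothetical) setFeature hook with the address of the FS being modified. */
    void onSetFeature(int fsAddr) {
        if (fsAddr >= boundary) {
            return;                   // new FS: ignored with one comparison
        }
        if (size == log.length) {
            log = Arrays.copyOf(log, size * 2); // grow the backing array
        }
        log[size++] = fsAddr;         // duplicates are allowed here
    }

    /** Sorts the log and removes duplicates, yielding each modified FS exactly once. */
    int[] modifiedFs() {
        int[] sorted = Arrays.copyOf(log, size);
        Arrays.sort(sorted);
        int n = 0;
        for (int v : sorted) {        // in-place compaction of the sorted copy
            if (n == 0 || v != sorted[n - 1]) sorted[n++] = v;
        }
        return Arrays.copyOf(sorted, n);
    }
}
```

Since modifications are rare compared with additions, the common path (a new FS) costs one comparison and no allocation.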
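The power-of-2 segment idea can be illustrated with a small int heap. `SegmentedHeap` is a hypothetical class, not UIMA code, and it assumes FSs live in a flat int heap. Because the segment size is a power of two, a logical address splits into a segment index and an offset with a shift and a mask. Growing the heap only appends segments, and an FS such as a large array may straddle a segment boundary, so no cells are left unused as padding.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of an int heap split into segments of 2^k cells.
 * Logical addresses are contiguous; an FS may cross segment boundaries.
 */
final class SegmentedHeap {
    private final int shift;                              // log2 of the segment size
    private final int mask;                               // segment size - 1
    private final List<int[]> segments = new ArrayList<>();
    private int nextFree = 0;

    SegmentedHeap(int log2SegmentSize) {
        if (log2SegmentSize < 1 || log2SegmentSize > 30) {
            throw new IllegalArgumentException("log2SegmentSize must be in [1, 30]");
        }
        shift = log2SegmentSize;
        mask = (1 << log2SegmentSize) - 1;
    }

    /** Reserves cells contiguous logical addresses, adding segments as needed; returns the first. */
    int allocate(int cells) {
        if (cells <= 0) throw new IllegalArgumentException("cells must be positive");
        int start = nextFree;
        nextFree += cells;                                // no padding at segment boundaries
        while (((long) segments.size() << shift) < nextFree) {
            segments.add(new int[mask + 1]);              // existing cells are never copied
        }
        return start;
    }

    int get(int addr) {
        checkAddress(addr);
        return segments.get(addr >>> shift)[addr & mask];
    }

    void set(int addr, int value) {
        checkAddress(addr);
        segments.get(addr >>> shift)[addr & mask] = value;
    }

    int segmentCount() { return segments.size(); }

    private void checkAddress(int addr) {
        if (addr < 0 || addr >= nextFree) throw new IndexOutOfBoundsException("address " + addr);
    }
}
```

The trade-off is one extra indirection (the segment lookup) per access, in exchange for never copying the heap on growth.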