uima-dev mailing list archives

From "Adam Lally" <ala...@alum.rpi.edu>
Subject Re: Delta CAS
Date Tue, 08 Jul 2008 22:26:23 GMT
On Tue, Jul 8, 2008 at 6:08 PM, Marshall Schor <msa@schor.com> wrote:
>> The CAS heap and
>> in particular the way it grows is a major performance bottleneck
>> for large documents.  If we have other parts of UIMA depend on
>> the (bad) implementation details now, we'll never be able to
>> improve on the design.
> Hmmm, I guess I was thinking that if we wanted to change this in the future,
> we could.  I agree it would be more difficult;  we've changed things like
> this before using the refactoring tools that let you see pretty clearly
> various dependencies...

I think what's happening here is that XmiCasSerializer requires the
CAS to be able to efficiently tell which FSs are new (created since
some mark; in this case, since the CAS was received at a service).
Currently this is doable using the FS addresses, but if that were no
longer the case, we could build a different marking mechanism that
does the same thing.  So I think I'm agreeing with Marshall there.
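
To make the "mark" idea concrete, here is a minimal sketch, assuming a toy heap that hands out monotonically increasing FS addresses. The names `ToyHeap`, `NewFsTracker`, `createFs()` and `mark()` are hypothetical illustrations, not the UIMA API. The point is that the serializer only depends on the `isNew` question, so the address comparison behind it could be swapped for a different marking mechanism later.

```java
/**
 * Hypothetical sketch (not the UIMA API): a toy heap with monotonically
 * increasing FS addresses, and a "mark" recording the heap top at the moment
 * the CAS is received by a service.
 */
public class AddressMarkSketch {

    /** The only thing a serializer would depend on; the heap layout stays hidden. */
    interface NewFsTracker {
        boolean isNew(int fsAddr);
    }

    /** Toy stand-in for the CAS heap: each new FS gets the next address. */
    static final class ToyHeap {
        private int nextAddr = 1; // assumption: address 0 is reserved as "null"

        int createFs() {
            return nextAddr++;
        }

        /** Capture the current heap top; any FS created afterwards has addr >= mark. */
        NewFsTracker mark() {
            final int mark = nextAddr;
            return addr -> addr >= mark; // one int comparison per FS
        }
    }

    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap();
        int a = heap.createFs();            // created by the client
        int b = heap.createFs();
        NewFsTracker tracker = heap.mark(); // CAS arrives at the service
        int c = heap.createFs();            // created by the service
        System.out.println(tracker.isNew(a) + " " + tracker.isNew(b) + " " + tracker.isNew(c));
    }
}
```

Running `main` prints `false false true`. This check is only valid as long as addresses grow monotonically, which is exactly the implementation detail under discussion.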

That said, if checking an FS against a set of known FSs that were
input to the service isn't a significant performance hit, then maybe
that is a more flexible way to go, which would avoid having a problem
with this in the future.  (That set already exists: it is built by
the deserializer and used to ensure consistent ids.)
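
The set-based alternative can be sketched like this, assuming the deserializer records every FS it reads. `IncomingFsSet`, `recordDeserialized()` and `isNew()` are hypothetical names for illustration, not the actual UIMA deserializer's data structure. Newness becomes a membership test that makes no assumption about address order, at the price of a hash lookup per FS instead of a single comparison; that cost is the performance question raised above.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch (not the UIMA API): decide whether an FS is new by
 * membership in the set of FSs recorded during deserialization.
 */
public class KnownFsSetSketch {

    static final class IncomingFsSet {
        private final Set<Integer> known = new HashSet<>();

        /** Called by the (hypothetical) deserializer for every FS it reads. */
        void recordDeserialized(int fsAddr) {
            known.add(fsAddr);
        }

        /** New iff the FS was not part of the incoming CAS; address order is irrelevant. */
        boolean isNew(int fsAddr) {
            return !known.contains(fsAddr);
        }
    }

    public static void main(String[] args) {
        IncomingFsSet incoming = new IncomingFsSet();
        // Addresses need not be contiguous or increasing (e.g. after a heap redesign).
        incoming.recordDeserialized(42);
        incoming.recordDeserialized(7);
        System.out.println(incoming.isNew(42) + " " + incoming.isNew(7) + " " + incoming.isNew(13));
    }
}
```

Running `main` prints `false false true`. Whether the extra lookups matter for large documents is something only measurements would settle.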

Hm - I seem to have added one more voice to the chorus of people
recently requesting seeing some performance numbers before making a
design decision. ;)

