uima-dev mailing list archives

From Richard Eckart de Castilho <...@apache.org>
Subject Re: UIMAv3 & WebAnno
Date Fri, 05 Jan 2018 17:40:40 GMT
Hi,

> On 04.01.2018, at 22:18, Marshall Schor <msa@schor.com> wrote:
> 
> Hi Richard,
> 
> Here's one idea:  Since I thought this had been fixed a while ago, and you
> seemed (previously) to get beyond this point, I'm wondering if the build you did
> for "trunk" somehow got its levels mixed up.  I see the build is from the
> 3.0.1-beta-SNAPSHOT v3 branch - but I'm guessing that's some local folder you
> have (I didn't see it in
> https://svn.apache.org/repos/asf/uima/uv3/uimaj-v3/branches)?

I built from https://svn.apache.org/repos/asf/uima/uv3/uimaj-v3/trunk
which has the version "3.0.1-beta-SNAPSHOT" in the pom.xml.

> I'm going to try to set up a test case to see if I can reproduce this.  What I'm
> planning to do is to have two type systems, T1, and T2, where T1 has a type with
> no features, and T2 has the same type with a feature.
> 
> I'll make a JCas class which has the type defined with the feature.
> 
> Then I'll create a CAS with T1, and confirm the JCas class loads and has the
> feature offset for the feature set to -1 (to cause a runtime exception if
> referenced).
> 
> Then I'll create a CAS with T2.  If this test case matches what's happening for
> you, that should trigger the exception you see.

That sounds like it should be able to reproduce the problem.

I could imagine this issue appearing in any scenario where JCas classes exist
and CASes with (slightly) different type systems are deserialized from disk,
with the type system of the in-memory CAS being reinitialized from the file
stored on disk - so it is not necessarily a problem limited to WebAnno.

From my perspective as a UIMA v2 user, the JCas classes are a convenience that
allows for type-safe access to the CAS. But at least in v2, it seems to be
entirely possible to use JCas classes even with CASes that have been
initialized with a slightly different type system, i.e. with more or fewer
features than the JCas class actually offers. If there are more features,
I can always access them through the CAS API. If there are fewer features,
then the getters/setters in the JCas class for the missing features would fail - but
only when I actually try to call them. I think at some point, UIMAv2 started to log
warnings if a JCas class had getters/setters for features that were not actually
present in the type system...

> JCas Type "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token" implements getters
> and setters for feature "morph", but the type system doesn't define that feature.


... but other than that, everything still worked fine since I didn't actually access
these features - and in any case where they were accessed, they would also have been
added to the type system of the respective CAS beforehand.
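
To make the v2 behaviour described above concrete, here is a minimal toy sketch
in plain Java. It deliberately does not use the UIMA API: ToyCas, ToyToken and
all of their methods are hypothetical stand-ins for a CAS and a generated JCas
class, and the feature names "pos" and "lemma" are assumptions chosen for
illustration ("morph" is taken from the warning quoted above). The sketch only
models the idea that a missing feature fails at the point of use rather than at
load time, while extra features stay reachable through generic access.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model only: none of these classes are part of the UIMA API. */
class LenientTypeAccessDemo {

    /** Hypothetical stand-in for a CAS whose type system defines some features of one type. */
    static final class ToyCas {
        private final Set<String> definedFeatures;
        private final Map<String, String> values = new HashMap<>();

        ToyCas(Set<String> definedFeatures) {
            this.definedFeatures = new HashSet<>(definedFeatures);
        }

        /** Generic, name-based read, similar in spirit to going through the CAS API. */
        String getValue(String feature) {
            requireDefined(feature);
            return values.get(feature);
        }

        /** Generic, name-based write. */
        void setValue(String feature, String value) {
            requireDefined(feature);
            values.put(feature, value);
        }

        private void requireDefined(String feature) {
            if (!definedFeatures.contains(feature)) {
                throw new IllegalStateException(
                        "Feature not defined in this type system: " + feature);
            }
        }
    }

    /** Hypothetical stand-in for a generated JCas class offering "pos" and "morph". */
    static final class ToyToken {
        private final ToyCas cas;

        // Construction never checks which features the type system defines.
        ToyToken(ToyCas cas) { this.cas = cas; }

        String getPos() { return cas.getValue("pos"); }
        void setPos(String value) { cas.setValue("pos", value); }
        String getMorph() { return cas.getValue("morph"); }
        void setMorph(String value) { cas.setValue("morph", value); }
        ToyCas getCas() { return cas; }
    }

    public static void main(String[] args) {
        // This type system has "pos" plus an extra "lemma", but no "morph".
        ToyCas cas = new ToyCas(new HashSet<>(Arrays.asList("pos", "lemma")));
        ToyToken token = new ToyToken(cas);

        token.setPos("NN");                         // typed access to a shared feature
        token.getCas().setValue("lemma", "house");  // extra feature via generic access
        System.out.println(token.getPos() + " " + cas.getValue("lemma"));

        try {
            token.getMorph();                       // fails only here, when actually called
        } catch (IllegalStateException e) {
            System.out.println("Lazy failure: " + e.getMessage());
        }
    }
}
```

The point of the sketch is the constructor of ToyToken: it succeeds no matter
which features are defined, which is the leniency I have been relying on in v2.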

So to summarize - in the UIMAv2 environment that WebAnno creates, we have:

- one set of JCas classes that usually partially overlaps with the types/features
  with which the CASes are initialized
- at any time, any number of CAS instances, potentially each with a different
  type system, may exist in memory
- a particular CAS instance is (usually) accessed only from a single thread
  (this might change soon as asynchronous events are being introduced)
- for some types/features, the JCas classes are used for access, for other
  types/features the CAS API is used
- when a CAS is passed around, it is usually passed around as a JCas object
  and jcas.getCas() is called when the CAS API should be used

Now, on the other hand, one might argue that this "wild" mixing of JCas classes
with type systems deviating from the one from which the JCas classes were originally
created is bad practice. Personally, I found it convenient because, at least for
some types/features, I could use convenient, Java-like, type-safe access. The
alternative would be to completely stop using JCas (at least in WebAnno) and work
only via the CAS API.

I could try a workaround that creates a CAS at application startup with the
type system from which the JCas classes were built, in order to initialize the JCas
class registry. But I have no idea whether that would fix the issue, or whether I
would have to do that on every thread that is spawned or if it would be sufficient
to do it once on any arbitrary thread... it doesn't sound like a particularly
attractive solution though.

Cheers,

-- Richard

