uima-dev mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: [UIMA 3.0] Typesystem / select
Date Mon, 21 Nov 2016 21:42:14 GMT

On 11/19/2016 1:12 PM, Pablo Duboue wrote:
> (Two smaller comments, this is my last email. Have a nice weekend!)
> [UIMA 3.0] Typesystem
> Page 3
> PEAR support: multiple type definition errors. What about exact
> duplicates. For example if two pears ship the same OpenNLP JCasGen'ed
> types.
This is somewhat ambiguous, the reason being that PEARs are defined to have
their own class loading contexts, wherein their locally defined version of a
class overrides any other definition of that class higher in the hierarchy.
Note this is the opposite of normal Java classloaders, which delegate first to
their parent. The result is that any instance of class "Foo" made by Pear 1
using its local definition of that class would get a ClassCastException if an
attempt were made to use that instance in Pear 2, which also defines a "Foo"
(because Pear 2's "Foo" would be loaded using a different class loader).
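
To illustrate the class loading point, here is a minimal sketch in plain Java.
It does not use the UIMA PEAR code: Foo, PearLikeLoader and PearCastDemo are
made-up names, and the loader is a simplified child-first loader that handles
only one class name. It assumes Foo's .class file can be read as a resource
from the loader that loaded the demo.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.lang.reflect.Constructor;

/** Hypothetical stand-in for a generated class that two PEARs both ship. */
class Foo {
}

/** Child-first loader for one class name; every other class delegates as usual. */
class PearLikeLoader extends ClassLoader {
    private final String localName;

    PearLikeLoader(ClassLoader parent, String localName) {
        super(parent);
        this.localName = localName;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        if (!name.equals(localName)) {
            return super.loadClass(name, resolve); // normal parent-first delegation
        }
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                byte[] bytes = readClassBytes(name);
                c = defineClass(name, bytes, 0, bytes.length); // local definition wins
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    /** Reads the class file through the parent, without asking the parent to define it. */
    private byte[] readClassBytes(String name) throws ClassNotFoundException {
        String path = name.replace('.', '/') + ".class";
        try (InputStream in = getParent().getResourceAsStream(path)) {
            if (in == null) {
                throw new ClassNotFoundException(name);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}

class PearCastDemo {
    /** Returns true if an instance of "Pear 1's" Foo cannot be cast to "Pear 2's" Foo. */
    static boolean castFailsAcrossPears() {
        try {
            String name = Foo.class.getName();
            ClassLoader parent = Foo.class.getClassLoader();
            Class<?> fooInPear1 = new PearLikeLoader(parent, name).loadClass(name);
            Class<?> fooInPear2 = new PearLikeLoader(parent, name).loadClass(name);

            Constructor<?> ctor = fooInPear1.getDeclaredConstructor();
            ctor.setAccessible(true); // Foo is package-private in this sketch
            Object fooFromPear1 = ctor.newInstance();
            try {
                fooInPear2.cast(fooFromPear1); // same name, different defining loader
                return false;
            } catch (ClassCastException expected) {
                return true;
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Both loaders define Foo from the same bytes, yet the JVM treats the two
results as unrelated classes because the defining loaders differ.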

With some careful work, some accommodation might be possible for common,
well-understood use cases.  I don't think this would be in the first releases,
though.
> I don't know what "committed" means. It seems an internal detail that
> might be better introduced. The discussion regarding type system
> sharing is unclear whether this is a problem with the old system or
> the new system.
Type systems, at the low level, have a life cycle where you create a type
system "manager", and then add types and features to that type system.  When
you're finished, you "commit" the type system.  At that time, a bunch of
calculations are done to allow high performance, and the type system is "locked
down" against further modifications.  That's what commit means.  For many users,
this is all hidden by other layers being used.
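
As a rough picture of that life cycle, here is a toy model in plain Java. It
is not the UIMA type system API: ToyTypeSystem and its methods are invented
for illustration, and the "calculation" done at commit is just assigning each
type an integer code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of "add types, then commit"; hypothetical, not a UIMA class. */
final class ToyTypeSystem {
    private final List<String> typeNames = new ArrayList<>();
    private Map<String, Integer> typeCodes; // null until commit() has run

    /** Adding is allowed only before commit. */
    void addType(String name) {
        if (isCommitted()) {
            throw new IllegalStateException("type system is committed; cannot add " + name);
        }
        if (!typeNames.contains(name)) {
            typeNames.add(name);
        }
    }

    /** Precomputes lookup data once and locks the type system down. */
    void commit() {
        if (isCommitted()) {
            return;
        }
        Map<String, Integer> codes = new HashMap<>();
        for (int i = 0; i < typeNames.size(); i++) {
            codes.put(typeNames.get(i), i);
        }
        typeCodes = codes;
    }

    boolean isCommitted() {
        return typeCodes != null;
    }

    /** Fast lookup that is only available after commit. */
    int codeOf(String name) {
        if (!isCommitted()) {
            throw new IllegalStateException("commit the type system first");
        }
        Integer code = typeCodes.get(name);
        if (code == null) {
            throw new IllegalArgumentException("unknown type: " + name);
        }
        return code;
    }
}
```

After commit(), calling addType() throws, which is the "locked down" part.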

Because the type system is finalized/locked down at commit time, it's possible
to discover at that point that the running UIMA instance already has an
identical committed type system; if so, the existing one is used instead.  For
large scale-outs involving hundreds or more pipeline instances, this can amount
to a significant performance improvement.
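
The reuse idea can be sketched as a small canonicalizing registry. This is an
assumption-laden toy, not how UIMA implements it: CommittedTypeSystem and
TypeSystemRegistry are made-up names, and a type system is reduced to a set of
type names.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Immutable snapshot of a committed type system (hypothetical, names only). */
final class CommittedTypeSystem {
    private final Set<String> typeNames;

    CommittedTypeSystem(Collection<String> names) {
        this.typeNames = Collections.unmodifiableSet(new TreeSet<>(names));
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CommittedTypeSystem
                && typeNames.equals(((CommittedTypeSystem) o).typeNames);
    }

    @Override
    public int hashCode() {
        return typeNames.hashCode();
    }
}

/** Hands back the already-registered instance when an equal one exists. */
final class TypeSystemRegistry {
    private static final ConcurrentMap<CommittedTypeSystem, CommittedTypeSystem> COMMITTED =
            new ConcurrentHashMap<>();

    static CommittedTypeSystem commit(Collection<String> names) {
        CommittedTypeSystem candidate = new CommittedTypeSystem(names);
        CommittedTypeSystem existing = COMMITTED.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate; // reuse the earlier instance
    }
}
```

Equality is only safe to rely on here because the snapshot is immutable, which
mirrors why the check happens at commit time rather than earlier.
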
> [UIMA 3.0] Select
> I love the select mechanism. I wonder if we can have somebody comment
> on whether its use is similar to other selects (like XSL and JQuery).
> Some of the predicates seem a subset of what RuTA offers. Maybe is
> worth extending the list so as many predicates are shared with RuTA?
> That will also simplify the RuTA learning curve. (This might be better
> off discussed on the issue tracker.)
Maybe - I don't think anyone's looked at this yet.
> Besides limit and nullOk in 3.3.2 I would add filterNull.
It is very easy to add arbitrary filters, because the results of a select
implement the Java 8 stream APIs.  So, you could filter out null values using:
   ... .filter(fs -> fs != null) ...

Of course, many other filters are similarly trivial; for instance, to filter out
annotations whose span is too small (e.g. less than "epsilon", a "final" Java
int value presumably set earlier):

   ... .filter(fs -> (fs.end() - fs.begin()) > epsilon) ...
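
For a version of the two filters that runs without UIMA on the classpath, here
is a small sketch using only java.util.stream. Span and SpanFilters are
hypothetical stand-ins; a plain Stream of Span objects plays the role of the
select result.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical stand-in for an annotation with begin/end offsets. */
final class Span {
    private final int begin;
    private final int end;

    Span(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }

    int begin() {
        return begin;
    }

    int end() {
        return end;
    }
}

final class SpanFilters {
    /** Drops nulls, then keeps only spans strictly longer than epsilon. */
    static List<Span> dropNullsAndShortSpans(Stream<Span> selected, final int epsilon) {
        return selected
                .filter(Objects::nonNull)
                .filter(s -> (s.end() - s.begin()) > epsilon)
                .collect(Collectors.toList());
    }
}
```

The null filter has to come first; otherwise the length filter would throw a
NullPointerException on a null element.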
