uima-dev mailing list archives

From "Richard Eckart de Castilho (JIRA)" <...@uima.apache.org>
Subject [jira] [Commented] (UIMA-1524) JFSIndexRepository should be enhanced with new generic methods
Date Sat, 17 Sep 2016 18:08:20 GMT

    [ https://issues.apache.org/jira/browse/UIMA-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15499438#comment-15499438 ]

Richard Eckart de Castilho commented on UIMA-1524:
--------------------------------------------------

I think we are experimenting here a bit in the range between positional-style, builder-style
(declarative), and functional style (imperative), and our preferences among them keep
fluctuating.

Consider the below as me thinking out loud as I try to follow Marshall's thoughts as rendered
in the wiki.

Maybe we need to pick the kinds of statements that we can build with this new API apart a
bit. Marshall actually did that quite nicely in the Gliffy diagram in the wiki. So let's try
an example...

{code}
cas.select()
    .type(Token.class)
    .within(sentence)
{code}

The above looks like builder code for an iterator or collection, but it lacks a terminal statement
like: asList(), stream(), iterator(), etc. (all of which the Gliffy provides). My understanding
of the Gliffy is that Marshall imagines that this builder-like API is not only a builder but
at the same time implements the Java stream API. That means the builder does not have to be
terminated explicitly but can be terminated at any time simply by calling any of the Stream
API methods. But for the sake of clarity, I'll just add a terminal builder step now (fsIterator()).

{code}
cas.select()
    .type(Token.class) // result type of fsIterator
    .within(sentence) // location condition
    .fsIterator() // result 
{code}
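
To make the "builder plus terminal step" idea concrete, here is a minimal sketch on a toy model. Everything in it is an assumption for illustration: Span stands in for an annotation, Select is a hypothetical builder, and none of it is UIMA API. Configuration calls return the builder, while stream(), asList() and iterator() act as terminal steps. The count() method hints at how a Stream method could be delegated so that no explicit terminal step is needed.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Toy model only: none of these classes belong to UIMA or uimaFIT. */
class TerminalStepSketch {

    /** Hypothetical stand-in for an annotation: a kind plus character offsets. */
    static final class Span {
        final String kind;
        final int begin;
        final int end;

        Span(String kind, int begin, int end) {
            this.kind = kind;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Hypothetical builder: configuration calls return this, terminal calls produce a result. */
    static final class Select {
        private final List<Span> index;
        private String kind;  // null means "no type restriction"
        private Span cover;   // null means "no location condition"

        Select(List<Span> index) {
            this.index = index;
        }

        Select type(String kind) {
            this.kind = kind;
            return this;
        }

        Select within(Span cover) {
            this.cover = cover;
            return this;
        }

        /** Terminal step; the other terminal steps are derived from it. */
        Stream<Span> stream() {
            return index.stream()
                .filter(s -> kind == null || s.kind.equals(kind))
                .filter(s -> cover == null
                    || (s != cover && s.begin >= cover.begin && s.end <= cover.end));
        }

        List<Span> asList() {
            return stream().collect(Collectors.toList());
        }

        Iterator<Span> iterator() {
            return asList().iterator();
        }

        /** Delegating a Stream method lets callers skip the explicit terminal step. */
        long count() {
            return stream().count();
        }
    }

    public static void main(String[] args) {
        Span sentence = new Span("Sentence", 0, 11);
        List<Span> index = Arrays.asList(
            sentence,
            new Span("Token", 0, 5),
            new Span("Token", 6, 11),
            new Span("Token", 12, 15));
        System.out.println(new Select(index).type("Token").within(sentence).count());
    }
}
```

A real implementation would presumably walk the annotation index instead of filtering a list; the sketch only shows the shape of the API.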

Now I would argue that any such statement can only have one result type, so only one *type*
call in the builder. So a statement like the following would *not* make too much sense:

{code}
cas.select()
    .type(Token.class) // result type of fsIterator
    .type(Lemma.class) // whoops?
    .within(sentence) // location condition
    .fsIterator() // result 
{code}

So it seems quite sensible and economical to drop the *type* builder call and fold
it into the *select* call:

{code}
cas.select(Token.class) // result type of fsIterator
    .within(sentence) // location condition
    .fsIterator() // result 
{code}

*Decision requirement:* Should we entirely drop the *type* call? Should we throw an exception
if it is called twice?
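
If we keep *type* and decide to reject a second call, the check itself is small. The following is only a sketch under that assumption: Select is a hypothetical builder, not a UIMA class, and IllegalStateException is just one plausible choice of exception.

```java
import java.util.Objects;

/** Toy model only: Select is a hypothetical builder, not a UIMA class. */
class SingleTypeSketch {

    static final class Select {
        private Class<?> type; // null until type(...) has been called

        /** Sets the result type; a second call is rejected rather than silently overriding. */
        Select type(Class<?> type) {
            Objects.requireNonNull(type, "type");
            if (this.type != null) {
                throw new IllegalStateException(
                    "type already set to " + this.type.getSimpleName());
            }
            this.type = type;
            return this;
        }

        Class<?> resultType() {
            return type;
        }
    }

    /** Returns true if calling type(...) twice on one builder is rejected. */
    static boolean secondTypeCallFails() {
        try {
            new Select().type(String.class).type(Integer.class);
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondTypeCallFails());
    }
}
```

Folding the type into the select call would make this runtime check unnecessary, because the second call could not be written in the first place.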

There are multiple ways that users want to specify types, e.g. as class, type, string, or
even nothing (i.e. not making a type restriction):

{code}
select(Token.class)
select(Token.type)
select("my.type.Token")
select()
{code}

As for location conditions (covered, covering, following, preceding, relative, between, at,
...), there are some cases where multiple conditions *could* be sensible. Note that I include
"at" in the location conditions here where the Gliffy in the wiki seems currently to consider
"at" to have a different quality from e.g. "covering" or "following".

{code}
cas.select(Token.class) // result type of fsIterator
    .within(sentence) // location condition 1
    .following(predicateVerb) // location condition 2
    .fsIterator() // result 
{code}

The ability to configure some additional behaviors for the builder is sensible, e.g.:

{code}
cas.select(Token.class) // result type of fsIterator
    .within(sentence) // location condition 1
    .following(predicateVerb) // location condition 2
    .typePriorities()
    .strict()
    .fsIterator() // result 
{code}

However, if we allow multiple conditions, then the question is whether the behaviors should
apply to the whole builder or only locally to individual conditions.

*Decision requirement:* We need to decide whether we want to allow multiple location conditions
(as above) or not. If not, should we throw an exception if it is called twice?

I tend towards liking the idea of multiple location conditions (although not all combinations
are sensible) if that is not too hard to implement. The code for the different select methods
in uimaFIT is very tightly tuned to particular location conditions and I am unsure how straightforward
it would be to dynamically combine them.
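
One naive way to combine location conditions dynamically is to turn each condition into a predicate and AND them together. The sketch below does that on a toy model (Span and Select are hypothetical, not UIMA or uimaFIT classes). It filters a plain list and so gives up the index-tuned efficiency of the uimaFIT select methods; it only illustrates that the combination is easy to express, not how an efficient implementation would look.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Toy model only: Span and Select are hypothetical, not UIMA or uimaFIT classes. */
class LocationConditionsSketch {

    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    static final class Select {
        private final List<Span> index;
        private Predicate<Span> condition = s -> true; // no condition yet

        Select(List<Span> index) {
            this.index = index;
        }

        /** Location condition: keep spans enclosed by the cover (excluding the cover itself). */
        Select within(Span cover) {
            condition = condition.and(
                s -> s != cover && s.begin >= cover.begin && s.end <= cover.end);
            return this;
        }

        /** Location condition: keep spans starting at or after the end of the anchor. */
        Select following(Span anchor) {
            condition = condition.and(s -> s.begin >= anchor.end);
            return this;
        }

        List<Span> asList() {
            return index.stream().filter(condition).collect(Collectors.toList());
        }
    }

    /** Begin offsets of the tokens that are within the sentence and follow the verb. */
    static List<Integer> demo() {
        Span sentence = new Span(0, 20);
        Span verb = new Span(4, 8);
        List<Span> tokens = Arrays.asList(
            new Span(0, 3), verb, new Span(9, 12), new Span(13, 20), new Span(21, 25));
        return new Select(tokens).within(sentence).following(verb).asList()
            .stream().map(s -> s.begin).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```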

Normally, results are delivered in index order. It appears as if the reverse() behavior is
simply changing that to go in reverse-index order. I.e. it is a declarative reverse for which
there is also a signature that includes a boolean parameter:

{code}
cas.select(Token.class) // result type of fsIterator
    .following(predicateVerb)
    .reverse(true)
    .typePriorities()
    .strict()
    .fsIterator() // result 
{code}

Each location condition could be augmented by secondary conditions, e.g. a "displacement"
(which Marshall calls offset). E.g. here we retrieve all Tokens following the Token 3 positions
right of the predicateVerb token.

{code}
Token predicateVerb = ...
cas.select(Token.class) // result type of fsIterator
    .following(predicateVerb, 3)
    .fsIterator() // result 
{code}

The case above could also be simulated without the displacement, e.g.

{code}
Token predicateVerb = ...
cas.select(Token.class) // result type of fsIterator
    .following(predicateVerb)
    .stream()
    .skip(3) // result 
{code}

... but that might not always work. E.g. here we retrieve all Tokens following the Verb 3 positions
right of the predicateVerb Verb. So here the offset does not apply to the Token index but
to the Verb index.

{code}
Verb predicateVerb = ...
cas.select(Token.class) // result type of fsIterator
    .following(predicateVerb, 3)
    .fsIterator() // result 
{code}

But I am actually unsure as to what the semantics of the displacement are.

*Decision requirement:* When a displacement is specified in a location condition, does it
operate on the index of the selected type (here Token) or on the index of the condition type
(here Verb)?
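
To see that the two readings really differ, here is a toy comparison. All names are assumptions for illustration (Span, shiftInSelectedIndex, shiftInAnchorIndex); nothing here is UIMA API. Tokens and verbs are plain offset spans, and the displacement is 1.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

/** Toy model only: compares two possible readings of a displacement argument. */
class DisplacementSketch {

    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    static final Span V0 = new Span(4, 8);
    static final List<Span> VERBS = Arrays.asList(V0, new Span(13, 17), new Span(22, 26));
    static final List<Span> TOKENS = Arrays.asList(
        new Span(0, 3), new Span(4, 8), new Span(9, 12),
        new Span(13, 17), new Span(18, 21), new Span(22, 26));

    /** Reading 1: skip `shift` entries of the selected type after the anchor. */
    static List<Span> shiftInSelectedIndex(List<Span> selected, Span anchor, int shift) {
        return selected.stream()
            .filter(s -> s.begin >= anchor.end)
            .skip(shift)
            .collect(Collectors.toList());
    }

    /** Reading 2: move the anchor `shift` entries along its own index, then select. */
    static List<Span> shiftInAnchorIndex(
            List<Span> selected, List<Span> anchors, Span anchor, int shift) {
        int position = anchors.indexOf(anchor);
        int moved = position + shift;
        if (position < 0 || moved < 0 || moved >= anchors.size()) {
            return Collections.emptyList(); // anchor unknown or displaced off the index
        }
        Span movedAnchor = anchors.get(moved);
        return selected.stream()
            .filter(s -> s.begin >= movedAnchor.end)
            .collect(Collectors.toList());
    }

    static List<Integer> begins(List<Span> spans) {
        return spans.stream().map(s -> s.begin).collect(Collectors.toList());
    }

    static List<Integer> reading1() {
        return begins(shiftInSelectedIndex(TOKENS, V0, 1));
    }

    static List<Integer> reading2() {
        return begins(shiftInAnchorIndex(TOKENS, VERBS, V0, 1));
    }

    public static void main(String[] args) {
        System.out.println("shift in Token index: " + reading1());
        System.out.println("shift in Verb index:  " + reading2());
    }
}
```

With this data, the first reading yields the tokens starting at 13, 18 and 22, while the second yields only those starting at 18 and 22, so the choice is visible to users.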

Another afterthought on the exercise: the stream API does not work with the enhanced for loop.
If the builder implements its builder API + the stream API, then it would be nice if it could
also implement the Iterable API:

{code}
for (Token t : cas.select(Token.class).following(predicateVerb)) {
  // do something...
}
{code}
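
A minimal sketch of how that could fit together, again on a toy model: Select and its where(...) method are hypothetical stand-ins for the builder and its conditions. Implementing java.lang.Iterable makes the enhanced for loop work, and the inherited default spliterator() is enough to obtain a stream via StreamSupport.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

/** Toy model only: Select is a hypothetical builder, not a UIMA class. */
class IterableSelectSketch {

    static final class Select<T> implements Iterable<T> {
        private final List<T> source;
        private Predicate<T> condition = t -> true;

        Select(List<T> source) {
            this.source = source;
        }

        /** Hypothetical stand-in for conditions such as following(...). */
        Select<T> where(Predicate<? super T> extra) {
            condition = condition.and(extra);
            return this;
        }

        /** Makes the builder usable directly in an enhanced for loop. */
        @Override
        public Iterator<T> iterator() {
            List<T> matches = new ArrayList<>();
            for (T item : source) {
                if (condition.test(item)) {
                    matches.add(item);
                }
            }
            return matches.iterator();
        }

        /** Stream view built on the default Iterable.spliterator(). */
        Stream<T> stream() {
            return StreamSupport.stream(spliterator(), false);
        }
    }

    /** Uses the builder in an enhanced for loop without any terminal step. */
    static String joinLongWords(List<String> words) {
        StringBuilder out = new StringBuilder();
        for (String word : new Select<>(words).where(s -> s.length() > 3)) {
            out.append(word).append(' ');
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(joinLongWords(Arrays.asList("the", "quick", "fox", "jumps")));
    }
}
```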

Omitted here are thoughts on index() and limit(), which are included in the wiki description
and seem to fit in nicely with the builder API. I have not yet considered some aspects, such
as unordered and nonoverlapping.

> JFSIndexRepository should be enhanced with new generic methods
> --------------------------------------------------------------
>
>                 Key: UIMA-1524
>                 URL: https://issues.apache.org/jira/browse/UIMA-1524
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Core Java Framework
>    Affects Versions: 2.3
>            Reporter: Joern Kottmann
>
> Existing methods should be overloaded with an additional Class argument to specify the
> exact return type. This change makes downcasting of returned objects unnecessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
