uima-dev mailing list archives

From Peter Klügl <peter.klu...@averbis.com>
Subject Re: Handling type conversion (was Re: "Standard" UIMA typesystem)
Date Mon, 12 Sep 2016 12:15:36 GMT

On 12.09.2016 at 10:52, Joern Kottmann wrote:
> I strongly disagree here. I think the really static type system (and with
> JCas even compile time static) in UIMA makes it hard to reuse a component,
> because I need to write explicit type system converters in many cases to be
> able to use them.

In my opinion, the static type system is one of the big advantages of
UIMA compared to GATE.
The explicit type system converter can be a performance problem, but it
is the only thing that will work for non-trivial types. Btw, a generic
converter will cause even more performance problems.
I can see your point, but I do not agree. How often does one write an
analysis engine compared to a converter? The converter is written once,
and adapted if the type system changes (which normally does not happen
often). So I would rather keep the advantage of a static type system for
developing analysis engines.

> The alternative to this would be a type system which is much less static
> (or dynamic) and APIs to write AEs which can adapt well to similar but
> different user defined type systems. This could be achieved by allowing
> type system mappings, by adding explicit support for adapters in the
> framework, allowing dynamic definition of types,

Type system mapping is not as easy as it sounds, and leads exactly to
the explicit converters mentioned above. Yes, you can do that for simple
use cases, but not for complex type systems. And this is not a specific
problem of UIMA but rather a general one.

I can see that type mapping, similar to sofa mapping for aggregate
analysis engines, can be handy, but it will work only for simple use
cases, e.g., read-only access or equal feature ranges. Ruta, for example,
also provides type aliasing when importing type systems.
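
To illustrate the "equal feature ranges" restriction, here is a minimal sketch in plain Java. It deliberately does not use the UIMA API: ToyType, canAlias, and the type names a.Token, b.Token, c.Token, and c.PosTag are hypothetical and exist only for this example. The assumption is that a pure name-based mapping is safe only when every feature of the source type exists on the target type with the same range.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TypeMappingSketch {

    /** Toy stand-in for a type: a name plus feature names mapped to range type names. */
    public static final class ToyType {
        public final String name;
        public final Map<String, String> featureRanges = new LinkedHashMap<>();

        public ToyType(String name) {
            this.name = name;
        }

        /** Adds a feature and returns this type so calls can be chained. */
        public ToyType feature(String featureName, String rangeType) {
            featureRanges.put(featureName, rangeType);
            return this;
        }
    }

    /**
     * Returns true only if every feature of the source type exists on the
     * target type with an identical range. Anything else would need an
     * explicit converter (assumption made for this sketch).
     */
    public static boolean canAlias(ToyType source, ToyType target) {
        for (Map.Entry<String, String> f : source.featureRanges.entrySet()) {
            String targetRange = target.featureRanges.get(f.getKey());
            if (!f.getValue().equals(targetRange)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ToyType tokenA = new ToyType("a.Token").feature("pos", "String");
        ToyType tokenB = new ToyType("b.Token").feature("pos", "String");
        // Same feature name, but the range is a structured type instead of a string.
        ToyType tokenC = new ToyType("c.Token").feature("pos", "c.PosTag");

        System.out.println(canAlias(tokenA, tokenB)); // prints "true"
        System.out.println(canAlias(tokenA, tokenC)); // prints "false"
    }
}
```

In the second case the feature names match, but the value would have to be turned from a string into a structured annotation, which is the point where a simple mapping stops and an explicit converter starts.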

Dynamic type systems where new types and features are incrementally
added by analysis engines can be a nice feature, but can also reduce the
maintainability of the pipelines. It would have been a nice feature for
Ruta since Ruta spams new types, but the generation of type system
descriptors at compile time works perfectly well for me now.

> Together with Thilo I wrote a paper which speaks a bit about this topic
> (see at 6.4):
> http://www.aclweb.org/anthology/W14-5209
> You have a different view and that is ok, and other people here too.

I know the paper of course and I liked it.

There is a difference between stating that something "is just wrong", or
complaining about JCas in general with arguments that are not accurate
(in my opinion), and providing arguments for what can be improved in UIMA
and how it can be improved.

Different views will always be to the benefit of UIMA if the arguments
are constructive.

> If you have a large pipeline you will end up writing two converters if you
> use an AE which can't adapt to your type system, one to convert to the AE's
> type system, this one you place before, and one to convert back from the AE
> type system to yours. I was speaking here about a simple example, and not a
> simple pipeline.

Well, I implemented the converters for the major type systems once, and
now I can use the analysis engines, which are wrapped in an aggregate
analysis engine together with the converters. This is of course not an
optimal solution, but I do not see a realistically better one. Can you
provide a better one that will work, e.g., for combining cTAKES and DKPro
Core components up to the parser level without loss of information? If
yes, I'll be the first to adopt it.



> Jörn