uima-dev mailing list archives

From Richard Eckart de Castilho <eck...@ukp.informatik.tu-darmstadt.de>
Subject Re: [jira] [Created] (UIMA-2419) Initial view for sofa unaware components not automatically created
Date Sun, 10 Jun 2012 17:50:46 GMT
On 10.06.2012 at 19:03, Marshall Schor wrote:

> Hmmm, it seems to me that something is wrong if a UIMA pipeline ended up sending a CAS
> to a sofa-unaware component without a default view having been set up. I would guess that
> in this situation, it would be better to throw an exception rather than hide this by automatically
> creating the view. If a missing view is created, its subject-of-analysis would be left unset?
> I'm guessing that most sofa-unaware annotators would not expect that, and would fail in mysterious
> ways.
>
> What would be the use cases where it would be more valuable to create the view, rather
> than signal something's amiss?

My use-case is an aggregate analysis engine that uses a CollectionReader as its first component
(a CasMultiplier may also work, I didn't test that). UIMA doesn't support sofa mappings for
readers other than in CPEs (or I missed something). We would like to add support for sofa-mapped
readers in uimaFIT though and would like to do so implementing as little infrastructure as
possible on top of UIMA. Ideally, we'd just cleverly configure UIMA to get the feature implemented.

So, to work around the fact that CollectionReaderDescriptions do not support sofa mappings,
I configured an AnalysisEngineDescription for a CollectionReader. UIMA internally doesn't
really care much which kind of processing component is declared in an AnalysisEngineDescription,
because internally it is all handled the same. I dimly remember a post to one of the UIMA
mailing lists saying that the distinction between readers, analysis engines and consumers
is largely arbitrary and that everything could be done with CasMultipliers as well.

So when I run the aggregate, the collection reader tries to write data to some mapped sofa,
but the sofa does not yet exist. The reader is not sofa-aware, so it shouldn't have to create
its initial view itself. If I use a sofa-unaware CasMultiplier instead, I suppose the same
thing will happen. The reader/CasMultiplier would set the sofa of course, but since it is
sofa-unaware, it wouldn't create the view.

I guess another option would be to change CollectionReaderAdapter to create any missing initial
view for sofa-unaware readers. That would not have any side effects on other component types and it would
solve the problem for my use-case as well. The problem is that this doesn't work, because
PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess() already tries to access the mapped
view and fails. Changing that method to test whether mAnalysisComponent is a sofa-unaware CollectionReaderAdapter
and to create a new view only in that case looks rather like a hack to me, although it would
probably resolve the situation. I haven't tested that yet, but if you think it reasonable, I
can check it.

Actually, thinking about it, I wonder if missing views should not be created on the first
request in general. I have several times seen people use helper methods that try to get
a view and, if an exception is thrown, create the view and return it.
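
To illustrate the kind of helper I mean, here is a minimal, self-contained sketch. It does not
use the UIMA API: ToyCas and View are hypothetical stand-ins so the example runs on its own, and
getOrCreateView is a name I made up. In real UIMA code the corresponding calls would be
CAS.getView(String), which throws if the view does not exist, and CAS.createView(String).

```java
import java.util.HashMap;
import java.util.Map;

class ViewHelperSketch {

    /** Hypothetical stand-in for a view; it only carries a name and some text. */
    static final class View {
        final String name;
        String text; // stays null until a component sets it

        View(String name) {
            this.name = name;
        }
    }

    /** Hypothetical stand-in for a CAS: looking up an unknown view fails. */
    static final class ToyCas {
        private final Map<String, View> views = new HashMap<>();

        View getView(String name) {
            View view = views.get(name);
            if (view == null) {
                throw new IllegalStateException("No view named " + name);
            }
            return view;
        }

        View createView(String name) {
            if (views.containsKey(name)) {
                throw new IllegalStateException("View already exists: " + name);
            }
            View view = new View(name);
            views.put(name, view);
            return view;
        }
    }

    /** Returns the named view, creating it on the first request if it is missing. */
    static View getOrCreateView(ToyCas cas, String name) {
        try {
            return cas.getView(name);
        } catch (IllegalStateException e) {
            // The lookup failed, so the view does not exist yet: create it.
            return cas.createView(name);
        }
    }

    public static void main(String[] args) {
        ToyCas cas = new ToyCas();
        View first = getOrCreateView(cas, "mappedSofa");  // created here
        first.text = "document text written by the reader";
        View second = getOrCreateView(cas, "mappedSofa"); // found here
        System.out.println(first == second);
    }
}
```

Note that a view created this way has no sofa data until some component sets it, which is
exactly the concern raised in the quoted message above.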

Or maybe it'd make sense to simply add the possibility to declare sofa mappings to the CollectionReaderDescription.

-- Richard

Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab (UKP-TUD) 
FB 20 Computer Science Department      
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany 
phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
