uima-dev mailing list archives

From Richard Eckart de Castilho <...@apache.org>
Subject Re: changing edge case impl details in casCopiers
Date Fri, 01 Apr 2016 21:28:15 GMT
Hi,

I would say that as long as the CasCopier doesn't simply fail when it thinks that a copy would be
invalid/unsafe, and as long as one can fix potentially broken copies afterwards, it would
in general be ok. Ok, existing code might break...

The use-case below was half hypothetical. Very real is a reverse use-case which we have implemented
in DKPro Core.

* view A contains a text
* view B is created through a transformation of the text from A
* annotations are created in view B
* annotations are copied back to view A
* offsets in the copied annotations are updated based on a reverse of the transformation operation
in the second step

The code we currently use to handle the copying back looks like this:

CasCopier copier = new CasCopier(inputCas, outputCas);

for (FeatureStructure fs : selectFS(inputCas, getType(inputCas, typeName))) {
  if (!copier.alreadyCopied(fs)) {
    FeatureStructure fsCopy = copier.copyFs(fs);
    // Make sure that the sofa annotation in the copy is set
    if (fs instanceof AnnotationBaseFS) {
      FeatureStructure sofa = fsCopy.getFeatureValue(mDestSofaFeature);
      if (sofa == null) {
        fsCopy.setFeatureValue(mDestSofaFeature, outputCas.getSofa());
      }
    }
    aOutput.addFsToIndexes(fsCopy);
  }
}

Source: https://github.com/dkpro/dkpro-core/blob/7c8785647ca8c5905aa108251935069e601cbb8d/dkpro-core-api-transform-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/api/transform/JCasTransformer_ImplBase.java#L99

I guess this code would still work and wouldn't throw exceptions or such.
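
The offset update from the last step of the list above is not part of that excerpt. As a minimal
sketch of the idea only, assuming the transformation does nothing but delete spans (e.g. XML tags),
one could record the removed spans and add their lengths back when mapping an offset from view B to
view A. This is plain Java without UIMA types; the class OffsetMapSketch and its methods are
hypothetical names for illustration, not the DKPro Core implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: models a delete-only transformation and its reverse offset mapping.
class OffsetMapSketch {
    // Each entry is {startInOriginal, length} of one removed span, in text order.
    private final List<int[]> deletions = new ArrayList<>();
    private final String transformed;

    OffsetMapSketch(String original, Pattern toRemove) {
        StringBuilder sb = new StringBuilder();
        Matcher m = toRemove.matcher(original);
        int last = 0;
        while (m.find()) {
            sb.append(original, last, m.start());   // keep text before the match
            deletions.add(new int[] { m.start(), m.end() - m.start() });
            last = m.end();                          // skip the removed span
        }
        sb.append(original, last, original.length());
        transformed = sb.toString();
    }

    String transformed() {
        return transformed;
    }

    // Maps an offset in the transformed text back to the original text.
    // A begin offset is placed after a span removed exactly at that position,
    // an end offset is placed before it, so the annotation hugs the covered text.
    int toOriginal(int offset, boolean isEnd) {
        int removed = 0;
        for (int[] d : deletions) {
            int posInTransformed = d[0] - removed;
            boolean applies = isEnd ? posInTransformed < offset : posInTransformed <= offset;
            if (!applies) {
                break;
            }
            removed += d[1];
        }
        return offset + removed;
    }

    public static void main(String[] args) {
        String original = "<p>Hello <b>world</b></p>";
        OffsetMapSketch map = new OffsetMapSketch(original, Pattern.compile("<[^>]+>"));
        // "world" covers [6, 11) in the transformed text "Hello world".
        int begin = map.toOriginal(6, false);
        int end = map.toOriginal(11, true);
        System.out.println(original.substring(begin, end));
    }
}
```

In the scenario above, the begin and end features of each copied annotation would be passed through
such a mapping after the copy. Transformations that also insert or replace text would need a richer
mapping than this delete-only sketch.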

If I understand the diagrams in the wiki correctly, there is one case where the sofa of the
copied FS points to the source view but the FS is indexed in the target view. This seems to
be the only difference between copying between CASes and copying within a CAS. I think it
may be better/simpler/more consistent to set the sofa of the copy to null in both cases and,
if the user really wants the FS to point to a sofa in a different view, then he should set
the sofa manually in this way after the copy is complete.

Btw... at least when copying individual FSes, the copy isn't indexed anyway by the CasCopier.
We are talking only about the bulk-copy method then?

Cheers,

-- Richard

> On 01.04.2016, at 15:57, Marshall Schor <msa@schor.com> wrote:
> 
> Hi Richard,
> 
> Thanks for this use-case.  I think there may be 2 subcases.
> 
> 1) The views, A and B, are in the same CAS, and
> 2) The views, A and B, are in different CASes
> 
> In case 1), with this new proposal the annotations copied from view A to B would
> have their "sofa" reference continue to point to the text in view A.  This means:
> 
> a) The references into the text are still "valid", but of course point to the
> text in view A.
> b) To do the updating process to have them point to the de-xml'ed version of the
> text, not only do the begin/end references need to be updated, but the sofa
> reference needs to be changed.  We could add an API to update that to the
> current view's.
> 
> In case 2), the annotations in B would no longer have a valid sofa reference at
> all (it would be set to null).
> This would clearly be a problem; but once again, we could add an API to update
> that to the current view's.
> 
> --------------------------------
> 
> So, it looks like this proposed design change would break the use-case you
> suggested. 
> 
> The current design would seem to support this use case but only if the two
> views are in different CASes.
> If they were in the same CAS, I think the current implementation (not tested,
> just reading the code) would have the copied Annotations have their sofa
> references be to the sofa in CAS A.
> 
> Does this match what you're currently seeing?
> 
> -Marshall
> 
> 
> On 3/31/2016 4:36 PM, Richard Eckart de Castilho wrote:
>> On 31.03.2016, at 21:22, Marshall Schor <msa@schor.com> wrote:
>>> I'm thinking of changing how cas copier works with respect to managing Sofas and
>>> sofa ref updating.  I've written something up here:
>>> https://cwiki.apache.org/confluence/display/UIMA/CasCopier+and+Views
>>> 
>>> Comments / feedback / what did I overlook?  appreciated :-) -Marshall
>> Consider the following case:
>> 
>> - there are two views, A and B
>> - the text in B has been derived from A through some transformation, e.g. the removal of XML tags
>> - A contains UIMA annotations that represent the XML tags and that point into the text in A
>> - as part of a second transformation process, all annotations in A are to be copied into B
>> - after the copy has been performed, the offsets of the copied annotations are updated
>> 
>> Would such a scenario still be supported after the changes you suggest?
>> 
>> Best,
>> 
>> -- Richard

