uima-dev mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: UIMAv3 & WebAnno
Date Thu, 18 Jan 2018 16:47:05 GMT
Next problem:  The type in webAnno v3:
de.tudarmstadt.ukp.dkpro.core.api.ner.type.NamedEntity
has features
  sofa
  begin
  end
  value
  identifier  <<< Surprise! not in the "reference" compare files.

It looks like "identifier" was added for v3?

This causes tests that compare TSV files to fail when the reference file doesn't have it.

The miscompare:
  actual:
#FORMAT=WebAnno TSV 3.2
#T_SP=de.tudarmstadt.ukp.dkpro.core.api.ner.type.NamedEntity|value|identifier

  expected:
#FORMAT=WebAnno TSV 3.2
#T_SP=de.tudarmstadt.ukp.dkpro.core.api.ner.type.NamedEntity|value

  You can see the "identifier" is missing from the "reference" file.
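
To make the miscompare concrete, here is a minimal sketch that extracts the feature names from a "#T_SP=" header line and reports those present in the actual output but absent from the reference. It assumes the first pipe-separated field is the type name and the rest are feature names, as in the two headers above. The class and method names (TsvSpanHeader, features, unexpected) are made up for illustration and are not part of WebAnno.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TsvSpanHeader {
    private static final String PREFIX = "#T_SP=";

    /** Returns the feature names on a "#T_SP=" line (assumed: everything after the type name). */
    static List<String> features(String headerLine) {
        if (!headerLine.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not a span header: " + headerLine);
        }
        String[] parts = headerLine.substring(PREFIX.length()).split("\\|");
        // parts[0] is the type name; the remaining entries are feature names.
        return new ArrayList<>(Arrays.asList(parts).subList(1, parts.length));
    }

    /** Returns the features declared in the actual header but not in the expected one. */
    static List<String> unexpected(String actualLine, String expectedLine) {
        List<String> extra = features(actualLine);
        extra.removeAll(features(expectedLine));
        return extra;
    }

    public static void main(String[] args) {
        String actual =
                "#T_SP=de.tudarmstadt.ukp.dkpro.core.api.ner.type.NamedEntity|value|identifier";
        String expected =
                "#T_SP=de.tudarmstadt.ukp.dkpro.core.api.ner.type.NamedEntity|value";
        System.out.println(unexpected(actual, expected)); // prints [identifier]
    }
}
```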

How would you recommend fixing this? 
 - update the reference file to include the identifier (might many other
cascading changes be needed?)
 - remove the "identifier" feature from the
de.tudarmstadt.ukp.dkpro.core.api.ner.type.NamedEntity ?
    (but probably will break something, assuming it was added for a reason! :-) )

-Marshall


On 1/18/2018 10:50 AM, Marshall Schor wrote:
> Fixed this problem. 
>
> It was due to many places in webAnno where the code fragment:
>
> feature.toString()  was expected to return the fully-qualified feature name.
>
> But v3 augmented the feature "toString" to provide more information about a feature.
>
> The fix was to change all occurrences of feature.toString() to feature.getName().
>
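The quoted fix can be illustrated with a small sketch of why a diagnostic toString() is a fragile lookup key while getName() is not. DemoFeature below is a hypothetical stand-in, not the UIMA Feature interface, and its toString() format is invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

final class FeatureNameSketch {
    /** Hypothetical stand-in for a feature object; this is not the UIMA Feature interface. */
    static final class DemoFeature {
        private final String name;
        private final String range;

        DemoFeature(String name, String range) {
            this.name = name;
            this.range = range;
        }

        /** Stable, fully-qualified name; safe to use as a key. */
        String getName() {
            return name;
        }

        /** Diagnostic text only; its format is free to change between versions. */
        @Override
        public String toString() {
            return "DemoFeature{name=" + name + ", range=" + range + "}";
        }
    }

    public static void main(String[] args) {
        DemoFeature value = new DemoFeature(
                "de.tudarmstadt.ukp.dkpro.core.api.ner.type.NamedEntity:value",
                "uima.cas.String");
        Map<String, Integer> columnByFeature = new HashMap<>();
        columnByFeature.put(value.getName(), 0);

        // A lookup by getName() finds the column; a lookup by toString() does not.
        System.out.println(columnByFeature.containsKey(value.getName()));  // prints true
        System.out.println(columnByFeature.containsKey(value.toString())); // prints false
    }
}
```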
> Now the first webanno-io-tsv test succeeds, although it still reports many
> updates of Annotation "end" values while the annotation is in the index
> (UIMA is recovering from these, one at a time).
>
> on to the next problem...
>
>
> On 1/17/2018 4:52 PM, Marshall Schor wrote:
>> I changed the testcase for WebAnnoTsv2ReaderWriterTest to turn off the
>> exception, to move on to the next issue :-)
>>
>> Next issue: the first runPipeline() in that same test now fails, saying:
>>
>> Caused by: java.io.IOException: Target file
>> [target\test-output\WebAnnoTsv2ReaderWriterTest-test\example2.tsv] already
>> exists and overwriting not enabled.
>>     at
>> de.tudarmstadt.ukp.dkpro.core.api.io.JCasFileWriter_ImplBase.getOutputStream(JCasFileWriter_ImplBase.java:230)
>>     at
>> de.tudarmstadt.ukp.dkpro.core.api.io.JCasFileWriter_ImplBase.getOutputStream(JCasFileWriter_ImplBase.java:155)
>>     at
>> de.tudarmstadt.ukp.clarin.webanno.tsv.WebannoTsv2Writer.process(WebannoTsv2Writer.java:101)
>>     ... 38 more
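One way to keep stale output from a previous run from triggering the "already exists" error above is to clear the output directory before the test starts. The following is a minimal sketch using only the JDK; TestOutputCleaner and deleteRecursively are hypothetical names, and nothing here is taken from the WebAnno or DKPro Core test code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class TestOutputCleaner {
    /** Deletes dir and everything under it; does nothing if dir is absent. */
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order lists children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for the test's output directory.
        Path out = Files.createTempDirectory("tsv-test-output");
        Files.write(out.resolve("example2.tsv"), "stale".getBytes());
        deleteRecursively(out);
        System.out.println(Files.exists(out)); // prints false
    }
}
```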
>>
>> I got around that by erasing the target/ directory, then doing a
>> maven-update-project to cause an Eclipse rebuild of the project. Now when I run
>> it I get past the above error.  The next error is:
>>
>> java.io.IOException: example2.tsv This is not a valid TSV File. check this line:
>> 1-1    Ms.    Sofa
>>     at
>> de.tudarmstadt.ukp.clarin.webanno.tsv.WebannoTsv2Reader.setAnnotations(WebannoTsv2Reader.java:159)
>>     at
>> de.tudarmstadt.ukp.clarin.webanno.tsv.WebannoTsv2Reader.convertToCas(WebannoTsv2Reader.java:78)
>>     at
>> de.tudarmstadt.ukp.clarin.webanno.tsv.WebannoTsv2Reader.getNext(WebannoTsv2Reader.java:547)
>>     at
>> de.tudarmstadt.ukp.dkpro.core.api.io.JCasResourceCollectionReader_ImplBase.getNext(JCasResourceCollectionReader_ImplBase.java:36)
>>     at
>> de.tudarmstadt.ukp.clarin.webanno.tsv.WebAnnoTsv2ReaderWriterTest.test(WebAnnoTsv2ReaderWriterTest.java:81)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>     at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:498)
>>     at
>> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>>     at
>> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>>     at
>> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>>     at
>> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>>     at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>>     at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>>     at
>> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>>     at
>> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>>     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>>     at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>>     at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>>     at
>> org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
>>     at
>> org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
>>     at
>> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:538)
>>     at
>> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:760)
>>     at
>> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:460)
>>     at
>> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:206)
>>
>> The file in question has these as its first few lines:
>>
>>  # de.tudarmstadt.ukp.dkpro.core.api.ner.type.NamedEntity | sofa | begin | end |
>> value | identifier # de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.POS |
>> sofa | begin | end | PosValue | coarseValue #
>> de.tudarmstadt.ukp.dkpro.core.api.syntax.type.dependency.Dependency | sofa |
>> begin | end | DependencyType | flavor |
>> AttachTo=de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.POS
>> #id=1
>> #text=Ms. Haag plays Elianti .
>> 1-1    Ms.    Sofa
>>    sofaNum: 1
>>    sofaID: "_InitialView"
>>    mimeType: "text"
>>    sofaArray: <null>
>>    sofaString: "Ms. Haag plays Elianti .
>> Rolls-Royce Motor Cars Inc. said it expects its U.S. sa..."
>>    sofaURI: <null>    0    3    B-PER   
>> B-de.tudarmstadt.ukp.dkpro.core.api.ner.type.NamedEntity_    NNP   
>> de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.POS_    Sofa
>>    sofaNum: 1
>>    sofaID: "_InitialView"
>>    mimeType: "text"
>>    sofaArray: <null>
>>    sofaString: "Ms. Haag plays Elianti .
>> Rolls-Royce Motor Cars Inc. said it expects its U.S. sa..."
>>    sofaURI: <null>    0    14    SUBJ   
>> de.tudarmstadt.ukp.dkpro.core.api.syntax.type.dependency.Dependency_    1-3   
>> 1-2    Haag    Sofa
>>    sofaNum: 1
>>    sofaID: "_InitialView"
>>    mimeType: "text"
>>    sofaArray: <null>
>>    sofaString: "Ms. Haag plays Elianti .
>> Rolls-Royce Motor Cars Inc. said it expects its U.S. sa..."
>>    sofaURI: <null>    4    8    I-PER   
>> I-de.tudarmstadt.ukp.dkpro.core.api.ner.type.NamedEntity_    NNP   
>> de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.POS_    Sofa
>>    sofaNum: 1
>>    sofaID: "_InitialView"
>>    mimeType: "text"
>>    sofaArray: <null>
>>    sofaString: "Ms. Haag plays Elianti .
>> Rolls-Royce Motor Cars Inc. said it expects its U.S. sa..."
>>    sofaURI: <null>    4    14    SBJ   
>> de.tudarmstadt.ukp.dkpro.core.api.syntax.type.dependency.Dependency_    1-3   
>>
>> (I wonder about the sofa string ending in "...")
>>
>> -Marshall
>>
>>
>> On 1/17/2018 4:12 PM, Marshall Schor wrote:
>>> I put in an exclude for the slf4j-log4j12, and went to the next issue:
>>>
>>> Tests in webanno-io-tsv fail.  The first one is failing here:
>>> WebAnnoTsv2ReaderWriterTest, line 65 (runPipeline(reader, writer)).
>>>
>>> It fails because it's updating an "end" value for an annotation that's already
>>> in the index, causing the message which follows.
>>> UIMA normally recovers from these things, but a global flag was configured:
>>> "uima.exception_when_fs_update_corrupts_index".
>>>
>>> System.getProperty("uima.exception_when_fs_update_corrupts_index")
>>>      (java.lang.String) true
>>>
>>> I can't see where this is being set, though.  Any ideas?  Is updating
>>> annotation:end while the item is indexed the way it is designed to work?
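As a plain-Java analogy for why changing a key feature while the item is indexed is a problem, the sketch below uses a TreeSet ordered by begin and end. Changing "end" in place would leave the element filed under its old key, so the usual pattern is to remove it, change it, and add it back. Span and setEndSafely are hypothetical names; this is not UIMA code and says nothing about how the WebAnno reader is meant to work.

```java
import java.util.Comparator;
import java.util.TreeSet;

final class IndexedKeyUpdateSketch {
    /** Hypothetical span with a mutable sort key; not a UIMA Annotation. */
    static final class Span {
        final int begin;
        int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /** Orders spans by begin, then end, so "end" is part of the sort key. */
    static final Comparator<Span> ORDER =
            Comparator.<Span>comparingInt(s -> s.begin).thenComparingInt(s -> s.end);

    /** Changes "end" without leaving the span filed under a stale key. */
    static void setEndSafely(TreeSet<Span> index, Span span, int newEnd) {
        boolean wasIndexed = index.remove(span); // remove while the old key still matches
        span.end = newEnd;
        if (wasIndexed) {
            index.add(span); // re-insert under the new key
        }
    }

    public static void main(String[] args) {
        TreeSet<Span> index = new TreeSet<>(ORDER);
        Span ms = new Span(0, 3);
        index.add(ms);
        index.add(new Span(0, 5));
        index.add(new Span(0, 8));

        setEndSafely(index, ms, 14);
        System.out.println(index.contains(ms)); // prints true
        System.out.println(index.last() == ms); // prints true
    }
}
```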
>>>
>>> -Marshall
>>> === test =====================
>>> 2018-01-17 15:52:35 INFO WebannoTsv2Reader - Scanning
>>> [file:/C:/au/gitClones/webanno/webanno-io-tsv/src/test/resources/tsv2/]
>>> 2018-01-17 15:52:35 INFO WebannoTsv2Reader - Found [1] resources to be read
>>> 2018-01-17 15:54:31 INFO WebannoTsv2Reader - 0 of 1:
>>> file:/C:/au/gitClones/webanno/webanno-io-tsv/src/test/resources/tsv2/example2.tsv
>>> 2018-01-17 15:54:31 WARN uima - While FS was in the index, the feature
>>> "uima.tcas.Annotation:end", which is used as a key in one or more indexes, was
>>> modified
>>>  FS = "NamedEntity
>>>    sofa: _InitialView
>>>    begin: 0
>>>    end: 3
>>>    value: "PER"
>>>    identifier: <null>"
>>> java.lang.Throwable
>>>     at org.apache.uima.cas.impl.CASImpl.featModWhileInIndexReport(CASImpl.java:2985)
>>>     at org.apache.uima.cas.impl.CASImpl.featModWhileInIndexReport(CASImpl.java:2977)
>>>     at
>>> org.apache.uima.cas.impl.CASImpl.checkForInvalidFeatureSetting(CASImpl.java:2865)
>>>     at org.apache.uima.cas.impl.CASImpl.setWithCheckAndJournal(CASImpl.java:1828)
>>>     at
>>> org.apache.uima.cas.impl.FeatureStructureImplC._setIntValueNfcCJ(FeatureStructureImplC.java:684)
>>>     at
>>> org.apache.uima.cas.impl.FeatureStructureImplC._setIntValueNfc(FeatureStructureImplC.java:460)
>>>     at org.apache.uima.jcas.tcas.Annotation.setEnd(Annotation.java:123)
>>>     at
>>> de.tudarmstadt.ukp.clarin.webanno.tsv.WebannoTsv2Reader.createSpanAnnotation(WebannoTsv2Reader.java:506)
>>>     at
>>> de.tudarmstadt.ukp.clarin.webanno.tsv.WebannoTsv2Reader.setAnnotations(WebannoTsv2Reader.java:176)
>>>     at
>>> de.tudarmstadt.ukp.clarin.webanno.tsv.WebannoTsv2Reader.convertToCas(WebannoTsv2Reader.java:78)
>>>     at
>>> de.tudarmstadt.ukp.clarin.webanno.tsv.WebannoTsv2Reader.getNext(WebannoTsv2Reader.java:547)
>>>     at
>>> de.tudarmstadt.ukp.dkpro.core.api.io.JCasResourceCollectionReader_ImplBase.getNext(JCasResourceCollectionReader_ImplBase.java:36)
>>>     at
>>> org.apache.uima.fit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:100)
>>>     at
>>> de.tudarmstadt.ukp.clarin.webanno.tsv.WebAnnoTsv2ReaderWriterTest.test(WebAnnoTsv2ReaderWriterTest.java:65)
>>>
>

