ctakes-dev mailing list archives

From "Finan, Sean" <Sean.Fi...@childrens.harvard.edu>
Subject Re: DefaultJCasTermAnnotator behavior with period and semicolon in UMLS terms [EXTERNAL]
Date Thu, 06 Feb 2020 14:13:26 GMT
Hi Jeff,

I think sentence splitting is a possible cause of this behavior and is worth checking.

You can get some quick debug output by adding a writer to the end of your pipeline.

add pretty.plaintext.PrettyTextWriterFit SubDirectory=POS

The SubDirectory= parameter is optional.
This writer creates a file that (in part) lists output sentence by sentence, so you should
be able to see how the sentence splitter is behaving in each circumstance.

If it is the sentence splitter, then you could try using a different lookup window in the dictionary
lookup and see if your results improve or get worse.  In the piper file, just insert one of the
following above the Dictionary lookup addition:
set windowAnnotations=Section
or
set windowAnnotations=Paragraph
if you are using a paragraph parser.
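
Put together, the two suggestions above might look roughly like this in a piper file. This is only a sketch: it assumes the dictionary lookup is added directly in your piper file with "add DefaultJCasTermAnnotator" (the annotator named in the subject line) rather than pulled in through a loaded sub-pipeline, and the rest of the pipeline is left out.

```
// Sketch only. Assumes the lookup is added directly in this piper file.

// ... reader, tokenizer, POS tagger, chunkers, etc. go here ...

// Set the lookup window before the dictionary lookup is added.
// Use Paragraph instead of Section if you are using a paragraph parser.
set windowAnnotations=Section
add DefaultJCasTermAnnotator

// Debug writer at the end of the pipeline. SubDirectory= is optional.
add pretty.plaintext.PrettyTextWriterFit SubDirectory=POS
```

Comparing the writer output with and without the "set windowAnnotations" line should show whether the lookup window is what changes the result.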


From: Jeffrey Miller <jeffmax@gmail.com>
Sent: Wednesday, February 5, 2020 12:24 PM
To: dev@ctakes.apache.org
Subject: DefaultJCasTermAnnotator behavior with period and semicolon in UMLS terms [EXTERNAL]

I've noticed that a term containing a period or a semicolon, for example
"antibody ; toxoplasma" from the sno_rx_16ab dictionary, will not be found
in text if the semicolon is attached to the first word, but will be found
if the text is either "antibody ; toxoplasma" or "antibody ;toxoplasma".
There is similar behavior with a period in the same place. My
first instinct was that this had to do with the sentence splitter and
sentences being the default lookup window. I found an older discussion
about this in reference to periods in genes, but it was from a while back.
Just curious if anyone has dealt with this issue.

