ctakes-dev mailing list archives

From Jeffrey Miller <jeff...@gmail.com>
Subject Re: DefaultJCasTermAnnotator behavior with period and semicolon in UMLS terms [EXTERNAL]
Date Thu, 06 Feb 2020 15:16:39 GMT
Sean,

Thanks for the detailed answer. I will take a look and update this thread
if I find the cause.

Jeff

On Thu, Feb 6, 2020 at 9:13 AM Finan, Sean <Sean.Finan@childrens.harvard.edu>
wrote:

> Hi Jeff,
>
> I think sentence splitting is a possible cause of this behavior and is
> worth checking.
>
> You can get some quick debug output by adding a writer to the end of your
> pipeline.
>
> add pretty.plaintext.PrettyTextWriterFit SubDirectory=POS
>
> The SubDirectory= parameter is optional.
> This writer creates a file that (in part) lists output sentence by
> sentence, so you should be able to see how the sentence splitter is
> behaving in each circumstance.
>
> If it is the sentence splitter, then you could try using a different lookup
> window in the dictionary lookup and see if your results improve or get
> worse.  In the piper file, just insert this above the line that adds the
> dictionary lookup:
>
> set windowAnnotations=Section
>
> or
> set windowAnnotations=Paragraph
> if you are using a paragraph parser.
>
> Sean
>
>
> ________________________________________
> From: Jeffrey Miller <jeffmax@gmail.com>
> Sent: Wednesday, February 5, 2020 12:24 PM
> To: dev@ctakes.apache.org
> Subject: DefaultJCasTermAnnotator behavior with period and semicolon in
> UMLS terms [EXTERNAL]
>
>
> Hi,
>
> I've noticed that a term containing a period or a semicolon, for example
> "antibody ; toxoplasma" from the sno_rx_16ab dictionary, is not found when
> the semicolon is attached to the first word ("antibody; toxoplasma"), but
> is found when written as either "antibody ; toxoplasma" or "antibody
> ;toxoplasma". There is similar behavior with a period in the same place. My
> first instinct was that this had to do with the sentence splitter and
> sentences being the default lookup window. I found an older discussion
> about this in reference to periods in genes, but it was from a while back.
> Just curious if anyone has dealt with this issue.
>
> Thanks,
> Jeff
>
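
Putting the two suggestions above together, a piper file might be laid out
roughly as in the sketch below. Only the "set windowAnnotations" and
PrettyTextWriterFit lines come from this thread; the tokenizer load and the
way the dictionary lookup is added are assumptions about a typical pipeline
and may not match yours.

```
// Tokenization and sentence detection (assumed; keep whatever your pipeline already uses)
load DefaultTokenizerPipeline

// Use sections instead of sentences as the dictionary lookup window.
// This must appear before the line that adds the dictionary lookup.
set windowAnnotations=Section

// Dictionary lookup (assumed to be added directly; your piper may load a sub-pipe instead)
add DefaultJCasTermAnnotator

// Debug writer at the end of the pipeline; SubDirectory= is optional
add pretty.plaintext.PrettyTextWriterFit SubDirectory=POS
```

Comparing the writer's sentence-by-sentence output for "antibody; toxoplasma"
with the output for "antibody ; toxoplasma" should show whether the two
inputs are split into sentences differently.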
