ctakes-dev mailing list archives

From "Finan, Sean" <Sean.Fi...@childrens.harvard.edu>
Subject RE: dictionary lookup config for best F1 measure [was RE: cTakes Annotation Comparison
Date Fri, 09 Jan 2015 22:15:27 GMT
Hi James,
Great question.  In truth, you may need to run a few times to find out.  Doing that with a
full pipeline would be tedious, but there is a descriptor in clinical-pipeline named CuisOnlyPlaintextUMLSProcessor.xml
that obtains only UMLS CUIs.  It runs ~50,000 notes per hour on my laptop as-is, so I
suggest that you test with that AE.  It has LVG commented out by default (for speed).  Adding
LVG will increase the runtime, but it will also (as you know) find a few additional terms.
You can try a few configurations without it and then the best option with it.  If you want
to test the default dictionary lookup, you can swap the referenced lookup XMLs.

Changes to the fast dictionary configuration are made in two places:
1.  The main descriptor ...-fast/desc/analysis_engine/UmlsLookupAnnotator.xml
2.  The resource (dictionary) configuration file resources/.../fast/cTakesHsql.xml

A few suggestions, in order of impact:
1.  I am guessing that the annotations in clef are human annotated with longest-length spans
only.  In other words, "colon cancer" instead of  "colon cancer" and "cancer".  To best approximate
this style of annotation, edit the cTakesHsql.xml in the section <rareWordConsumer>
and change the selected implementation.  By default it is DefaultTermConsumer (go figure),
but you will want to use the commented-out PrecisionTermConsumer.  As the above cTakesHsql
comment indicates " DefaultTermConsumer will persist all spans.
   PrecisionTermConsumer will only persist only the longest overlapping span of any semantic
group."  Doing this should increase precision, and depending upon how "good" the annotations
are it should not greatly change recall.

2. Just for kicks, try using SemanticCleanupTermConsumer.  It may slightly increase precision,
but it also may decrease recall.  Hopefully it doesn't do much at all (PrecisionTermConsumer
and proper semantic typing in the dictionary should suffice without this term consumer).

3. Especially for task 2 (acronyms & abbreviations), you should try a run with <name>minimumSpan</name>
in UmlsLookupAnnotator.xml set to 2.  This changes the minimum allowable span of a term.
The default is 3 to increase precision on acronyms & abbreviations, but decreasing to
2 may improve recall on the same.  The dictionary is not built with anything below 2 characters.

4.  On that note (character length), if task 1 does not include acronyms & abbreviations,
then you can try increasing the minimum span length above 3 and see if there is a good increase
in precision without a significant decrease in recall.

5.  Try a few runs with overlapping spans in addition to exact matches.  To do this use the
OverlapJCasTermAnnotator instead of the DefaultJCasTermAnnotator annotator implementation.
DefaultJCasTermAnnotator is specified in UmlsLookupAnnotator.xml, but I will check in a descriptor
for overlap matching.  There are additional parameters for that option, but I'll email them
after I check in.

6.  By default the new lookup uses Sentence as the lookup window.  I did this for three reasons:
1. Not all terms are within Noun Phrases, 2. Some Noun Phrases overlapped, causing repeated
lookups (in my 3.0 candidate trials), and 3. Not all cTakes Noun Phrases are accurate.  Because
the lookup is fast, using a full Sentence for lookup doesn't seem to hurt much.  However,
you can always switch it back to see if precision is increased enough to warrant the decrease
in recall.  This is changed in UmlsLookupAnnotator.xml.
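
To illustrate suggestion 1, here is a minimal Python sketch of what "persist only the longest overlapping span of any semantic group" could look like. It is not the cTAKES PrecisionTermConsumer code: the Span type, the group labels, and the choice to treat "overlapping" as "fully contained in a longer span" are all assumptions made for this example.

```python
from collections import defaultdict
from typing import NamedTuple


class Span(NamedTuple):
    begin: int
    end: int
    text: str
    group: str  # hypothetical semantic-group label, not a cTAKES type


def keep_longest(spans):
    """Drop any span fully contained in a longer span of the same group."""
    by_group = defaultdict(list)
    for s in spans:
        by_group[s.group].append(s)
    kept = []
    for group_spans in by_group.values():
        for s in group_spans:
            covered = any(
                o.begin <= s.begin
                and s.end <= o.end
                and (o.end - o.begin) > (s.end - s.begin)
                for o in group_spans
            )
            if not covered:
                kept.append(s)
    return sorted(kept)


spans = [
    Span(0, 12, "colon cancer", "Disorder"),
    Span(6, 12, "cancer", "Disorder"),
    Span(0, 5, "colon", "Anatomy"),
]
result = keep_longest(spans)
```

With this input, "cancer" is dropped because "colon cancer" covers it within the same group, while "colon" survives because it belongs to a different group.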

I have run my own tests with the various setups, but I don't want to adversely influence what
you run just in case the trends with the share/clef annotations differ.


-----Original Message-----
From: Masanz, James J. [mailto:Masanz.James@mayo.edu] 
Sent: Friday, January 09, 2015 3:57 PM
To: 'dev@ctakes.apache.org'
Subject: dictionary lookup config for best F1 measure [was RE: cTakes Annotation Comparison

Sean (or others), 

Of the various configuration options described below, which values/choices would you recommend
for best F1 measure for something like the shared clef 2013 task?

I'm looking for something that doesn't have to be the best speed-wise, but that is the recommended
for optimizing F1 measure.


-----Original Message-----
From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
Sent: Friday, December 19, 2014 11:55 AM
To: dev@ctakes.apache.org; kim.ebert@imatsolutions.com
Subject: RE: cTakes Annotation Comparison

Well, I guess that it is time for me to speak up …

I must say that I’m happy that people are showing interest in the fast lookup.  I am also
happy (sort of) that some concerns are being raised – and that there is now community participation
in my little toy.  I  have some concerns about what people are reporting.  This does not coincide
with what I have seen at all.  Yesterday I started (without knowing this thread existed) testing
a bare-minimum pipeline for CUI extraction.  It is just the stripped-down Aggregate with only:
segment, tokens, sentences, POS, and the fast lookup.  The people at Children’s wanted to
know how fast we could get.  1,196 notes in under 90 seconds on my laptop with over 210,000
annotations, which is 175/note.  After reading the thread I decided to run the fast lookup
with several configurations.  I also ran the default for 10.5 hours.  I am comparing the annotations
from each system against the human annotations that we have, and I will let everybody know
what I find – for better or worse.

The fast lookup does not (out-of-box) do the exact same thing as the default.  Some things
can be configured to make it more closely approximate the default dictionary.

1.        Set the minimum annotation span length to 2 (default is 3).  This is in desc/[ae]/UmlsLookupAnnotator.xml:
line #78.  The annotator should then pick up text like “CT” and improve recall, but
it will hurt precision.

2.       Set the Lookup Window to LookupWindowAnnotation.  This is in desc/[ae]/UmlsLookupAnnotator.xml:
lines #65 & #93.   The LookupWindowAnnotator will need to be added to the aggregate pipeline
AggregatePlaintextFastUMLSProcessor.xml  lines #50 & #172.  This will narrow the lookup
window and may increase precision, but (in my experience) reduces recall.

3.       Allow the –rough- identification of Overlapping spans.  The default dictionary
will often identify text like “metastatic colorectal carcinoma” when that text actually
does not exist anywhere in umls.  It basically ignores “colorectal” and gives the whole
span the CUI for “metastatic carcinoma”.  In this case it is arguably a good thing.  In
many others it is arguably not so much.  There is a Class ... lookup2.ae.OverlapJCasTermAnnotator.java
that will do the same thing.  You can create a new desc/[ae]/*Annotator.xml or just change
the <annotatorImplementationName> in desc/[ae]/UmlsLookupAnnotator.xml line #25.  I
will check in a new desc xml (sorry; thought I had) because there are 2 parameters unique
to OverlapJCasTermAnnotator.

4.       You can play with the OverlapJCasTermAnnotator parameters “consecutiveSkips”
and “totalTokenSkips”.  These control just how lenient you want the overlap tagging to be.

5.       Create a new dictionary database.  There is a (bit messy) DictionaryTool in sandbox
that will let you dump whatever you do or do not want from UMLS into a database.  It will
also help you clean up or –select- stored entries as well.  There is a lot of garbage in
the default dictionary database: repeated terms with caps/no caps (“Cancer”,”cancer”),
text with metadata (“cancer [finding]”) and text that just clutters (“PhenX: entry for
cancer”, “1”, “2”).  The fast lookup database should have most of the Snomed and
RxNorm terms (and synonyms) of interest, but you could always make a new database that is
much more inclusive.
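
As a rough illustration of items 3 and 4, the sketch below matches a dictionary term against text while allowing some text tokens to be skipped. It is not the OverlapJCasTermAnnotator implementation. The function name, the greedy left-to-right strategy, the default limits, and the reading of the two parameters (a cap on skips in a row and a cap on skips overall) are assumptions for this example only.

```python
def overlap_match(text_tokens, term_tokens, start,
                  consecutive_skips=2, total_skips=4):
    """Match term_tokens in order, beginning at text_tokens[start].

    Text tokens that are not part of the term may be skipped, subject to
    the two (hypothetical) limits. Returns the exclusive end index of the
    match, or None when there is no match.
    """
    if (not term_tokens or start >= len(text_tokens)
            or text_tokens[start] != term_tokens[0]):
        return None
    pos = start + 1
    run = total = 0
    for wanted in term_tokens[1:]:
        # Skip text tokens until the next term token is found.
        while pos < len(text_tokens) and text_tokens[pos] != wanted:
            run += 1
            total += 1
            if run > consecutive_skips or total > total_skips:
                return None
            pos += 1
        if pos == len(text_tokens):
            return None  # ran out of text before finishing the term
        run = 0
        pos += 1
    return pos


text = ["metastatic", "colorectal", "carcinoma"]
term = ["metastatic", "carcinoma"]
lenient = overlap_match(text, term, 0)
strict = overlap_match(text, term, 0, consecutive_skips=0)
```

Here `lenient` covers all three text tokens because "colorectal" is skipped, while `strict` finds no match because no skips are allowed.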

The main key to the speed of the fast dictionary lookup is actually … the key.  It is the
way that the database is indexed and the lookup by “rare” word instead of “first”
word.  Everything else can be changed around it and it should still be a faster version.
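
The idea of indexing by "rare" word can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the cTAKES data structure: rarity is approximated here by token frequency across the dictionary terms themselves, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_rare_word_index(terms):
    """Index each term under its least frequent token (ties: earliest token)."""
    counts = Counter(tok for term in terms for tok in term.lower().split())
    index = defaultdict(list)
    for term in terms:
        tokens = term.lower().split()
        rare_pos = min(range(len(tokens)), key=lambda i: counts[tokens[i]])
        index[tokens[rare_pos]].append((tokens, rare_pos))
    return index


def lookup(sentence_tokens, index):
    """Return sorted (begin, end, term) token spans for exact term matches."""
    tokens = [t.lower() for t in sentence_tokens]
    hits = []
    for i, tok in enumerate(tokens):
        # Only terms whose rare word equals this token are ever examined.
        for term_tokens, rare_pos in index.get(tok, ()):
            begin = i - rare_pos
            end = begin + len(term_tokens)
            if begin >= 0 and end <= len(tokens) and tokens[begin:end] == term_tokens:
                hits.append((begin, end, " ".join(term_tokens)))
    return sorted(hits)


index = build_rare_word_index(["colon cancer", "lung cancer", "cancer"])
hits = lookup("Patient has colon cancer".split(), index)
```

In this toy dictionary "colon cancer" is filed under "colon" rather than "cancer", so a sentence that mentions "cancer" only has to check the single-word term, not every multi-word term that contains that common word.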

As for the false positives like “Today”, that will always be a problem until we have disambiguation.
 The lookup is basically a glorified grep.


From: Chen, Pei [mailto:Pei.Chen@childrens.harvard.edu]
Sent: Friday, December 19, 2014 10:43 AM
To: dev@ctakes.apache.org; kim.ebert@imatsolutions.com
Subject: RE: cTakes Annotation Comparison

Also check out stats that Sean ran before releasing the new component on:
From the evaluation and experience, the new lookup algorithm should be a huge improvement
in terms of both speed and accuracy.
This is very different than what Bruce mentioned…  I’m sure Sean will chime in here.
(The old dictionary lookup is essentially obsolete now- plagued with bugs/issues as you mentioned.)

From: Kim Ebert [mailto:kim.ebert@perfectsearchcorp.com]
Sent: Friday, December 19, 2014 10:25 AM
To: dev@ctakes.apache.org
Subject: Re: cTakes Annotation Comparison


I'm curious about the number of records that are in your gold standard sets, or if your gold
standard set was run through a long running cTAKES process. I know at some point we fixed
a bug in the old dictionary lookup that caused the permutations to become corrupted over time.
Typically this isn't seen in the first few records, but over time as patterns are used the
permutations would become corrupted. This caused documents that were fed through cTAKES more
than once to have fewer codes returned than the first time.

For example, if a permutation of 4,2,3,1 was found, the permutation would be corrupted to
be 1,2,3,4. It would no longer be possible to detect permutations of 4,2,3,1 until cTAKES
was restarted. We got the fix in after the cTAKES 3.2.0 release. https://issues.apache.org/jira/browse/CTAKES-310
Depending upon the corpus size, I could see the permutation engine eventually having only a
single permutation of 1,2,3,4.

Typically though, this isn't very easily detected in the first 100 or so documents.

We discovered this issue when we made cTAKES have consistent output of codes in our system.

IMAT Solutions <http://imatsolutions.com>
Kim Ebert
Software Engineer
On 12/19/2014 07:05 AM, Savova, Guergana wrote:

We are doing a similar kind of evaluation and will report the results.

Before we released the Fast lookup, we did a systematic evaluation across three gold standard
sets. We did not see the trend that Bruce reported below. The P, R and F1 results from the
old dictionary look up and the fast one were similar.

Thank you everyone!


-----Original Message-----
From: David Kincaid [mailto:kincaid.dave@gmail.com]
Sent: Friday, December 19, 2014 9:02 AM
To: dev@ctakes.apache.org
Subject: Re: cTakes Annotation Comparison

Thanks for this, Bruce! Very interesting work. It confirms what I've seen in my small tests
that I've done in a non-systematic way. Did you happen to capture the number of false positives
yet (annotations made by cTAKES that are not in the human adjudicated standard)? I've seen
a lot of dictionary hits that are not actually entity mentions, but I haven't had a chance
to do a systematic analysis (we're working on our annotated gold standard now). One great
example is the antibiotic "Today". Every time the word today appears in any text it is annotated
as a medication mention when it almost never is being used in that sense.

These results by themselves are quite disappointing to me. Both the UMLSProcessor and especially
the FastUMLSProcessor seem to have pretty poor recall. It seems like the trade off for more
speed is a ten-fold (or more) decrease in entity recognition.

Thanks again for sharing your results with us. I think they are very useful to the project.

- Dave

On Thu, Dec 18, 2014 at 5:06 PM, Bruce Tietjen <bruce.tietjen@perfectsearchcorp.com> wrote:

Actually, we are working on a similar tool to compare it to the human
adjudicated standard for the set we tested against.  I didn't mention
it before because the tool isn't complete yet, but initial results for
the set (excluding those marked as "CUI-less") were as follows:

Human adjudicated annotations: 4591 (excluding CUI-less)

Annotations found matching the human adjudicated standard:
UMLSProcessor        2245
FastUMLSProcessor     215
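
For readers who want the counts above as rates, the short Python sketch below computes recall for each pipeline, assuming each "matching" annotation corresponds to exactly one distinct human-adjudicated annotation (the message does not state this, so treat it as an assumption). The variable names are illustrative.

```python
# Counts taken from the message above.
gold = 4591  # human adjudicated annotations, excluding CUI-less
matched = {"UMLSProcessor": 2245, "FastUMLSProcessor": 215}

# Recall = matched gold annotations / all gold annotations.
recall = {name: count / gold for name, count in matched.items()}
for name, value in recall.items():
    print(f"{name}: recall = {value:.3f}")

# How many times more gold annotations the default pipeline matched.
ratio = matched["UMLSProcessor"] / matched["FastUMLSProcessor"]
print(f"ratio = {ratio:.1f}")
```

Under that assumption the recall values come to roughly 0.489 and 0.047, a ratio of about 10.4, which is the ten-fold gap discussed elsewhere in this thread.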

Bruce Tietjen
Senior Software Engineer
IMAT Solutions <http://imatsolutions.com>
Mobile: 801.634.1547


On Thu, Dec 18, 2014 at 3:37 PM, Chen, Pei




Thanks for this-- very useful.
Perhaps Sean Finan can comment more,
but it's also probably worth it to compare to an adjudicated human
annotated gold standard.


-----Original Message-----
From: Bruce Tietjen [mailto:bruce.tietjen@perfectsearchcorp.com]
Sent: Thursday, December 18, 2014 1:45 PM
To: dev@ctakes.apache.org
Subject: cTakes Annotation Comparison

With the recent release of cTakes 3.2.1, we were very interested in
checking for any differences in annotations between using the
AggregatePlaintextUMLSProcessor pipeline and the
AggregatePlaintextFastUMLSProcessor pipeline within this release of
cTakes with its associated set of UMLS resources.

We chose to use the SHARE 14-a-b Training data that consists of 199
documents (Discharge 61, ECG 54, Echo 42 and Radiology 42) as the
basis for the comparison.

We decided to share a summary of the results with the development
community.


Documents Processed: 199

Processing Time:
UMLSProcessor        2,439 seconds
FastUMLSProcessor    1,837 seconds

Total Annotations Reported:
UMLSProcessor        20,365 annotations
FastUMLSProcessor     8,284 annotations

Annotation Comparisons:
Annotations common to both sets:                      3,940
Annotations reported only by the UMLSProcessor:      16,425
Annotations reported only by the FastUMLSProcessor:   4,344
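
The reported counts can be cross-checked with a little arithmetic: the annotations common to both sets plus those unique to one pipeline should equal that pipeline's total. The Python sketch below does this check and also computes the share of all distinct annotations that both pipelines agree on. That last figure is an extra illustration, not something reported in the message, and the variable names are hypothetical.

```python
# Counts taken from the summary above.
common = 3940
only_umls = 16425
only_fast = 4344
total_umls = 20365
total_fast = 8284

# Common + unique should reproduce each pipeline's reported total.
umls_consistent = common + only_umls == total_umls
fast_consistent = common + only_fast == total_fast

# Share of all distinct annotations found by both pipelines.
union = common + only_umls + only_fast
agreement = common / union
print(umls_consistent, fast_consistent, f"{agreement:.3f}")
```

Both totals are consistent with the breakdown, and the two pipelines agree on roughly 16% of the distinct annotations either one produced.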

If anyone is interested, the following was our test procedure:

We used the UIMA CPE to process the document set twice, once using
the AggregatePlaintextUMLSProcessor pipeline and once using the
AggregatePlaintextFastUMLSProcessor pipeline. We used the
WriteCAStoFile CAS consumer to write the results to output files.

We used a tool we recently developed to analyze and compare the
annotations generated by the two pipelines. The tool compares the
two outputs for each file and reports any differences in the
annotations (MedicationMention, SignSymptomMention,
ProcedureMention, AnatomicalSiteMention, and
DiseaseDisorderMention) between the two output sets. The tool
reports the number of 'matches' and 'misses' between each annotation set. A 'match' is
defined as the presence of an identified source text interval with
its associated CUI appearing in both annotation sets. A 'miss' is
defined as the presence of an identified source text interval and
its associated CUI in one annotation set, but no matching identified
source text interval and CUI in the other. The tool also reports the total number of
annotations (source text intervals with associated CUIs) reported in
each annotation set. The compare tool is in our GitHub repository at

