ctakes-dev mailing list archives

From "Abramowitsch, Peter" <pabramowit...@hearst.com>
Subject Re: How to use external CSV or BSV in addition to FastUMLS attention Sean [EXTERNAL]
Date Fri, 05 Jan 2018 15:23:10 GMT
Thanks, Tim. OK, I did get custom dictionaries to work.

I was interested to see what it did when I overloaded an existing SNOMED
term with a new text, but keeping the same Preferred Text.  I like the

So for instance in the bsv file:
C1956346|T47|grwz|Coronary Artery Disease

grwz is now linked with the same CUI/TUI as its SNOMED cousin.
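
As a rough illustration only (this is not cTAKES code), here is a minimal Python sketch of how such a line breaks down, assuming the four-column layout CUI|TUI|text|preferred text shown above. The names BsvEntry and parse_bsv_line are hypothetical, and the zero-padding of the TUI is an assumption that simply mirrors the T47 -> T047 difference between this bsv line and the cTAKES output.

```python
from typing import NamedTuple


class BsvEntry(NamedTuple):
    """One row of a bar-separated dictionary file (hypothetical helper type)."""
    cui: str
    tui: str
    text: str
    preferred_text: str


def parse_bsv_line(line: str) -> BsvEntry:
    """Split one CUI|TUI|text|preferred-text line into named fields."""
    fields = [field.strip() for field in line.rstrip("\n").split("|")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 bar-separated fields, got {len(fields)}: {line!r}")
    cui, tui, text, preferred_text = fields

    # Assumption: normalise the TUI to 'T' plus three digits, so T47 reads as T047.
    digits = tui[1:] if tui[:1].upper() == "T" else tui
    if not digits.isdigit():
        raise ValueError(f"unexpected TUI format: {tui!r}")
    tui = f"T{int(digits):03d}"

    return BsvEntry(cui, tui, text, preferred_text)


entry = parse_bsv_line("C1956346|T47|grwz|Coronary Artery Disease")
print(entry)
```

Checking each line this way before handing the file to cTAKES is one cheap way to catch a missing column or a stray bar.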

And when running it through cTAKES I see this:

"168": {
      "_type": "UmlsConcept",
      "codingScheme": "my-scheme",
      "score": 0.0,
      "disambiguated": false,
      "cui": "C1956346",
      "tui": "T047",
      "preferredText": "Coronary Artery Disease"

So my concept can share the same CUI as the SNOMED concept for analysis
purposes, but I know it comes from a different dictionary.  Cool.
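
A minimal Python sketch of that idea, assuming the concepts have already been pulled out of the JSON as plain dicts with the field names shown above. The second record and its "SNOMEDCT_US" coding scheme are hypothetical, added only to show two dictionaries sharing one CUI; the function names are likewise made up for illustration.

```python
from collections import defaultdict

# First record mirrors the output above; the second is a hypothetical SNOMED hit.
concepts = [
    {"codingScheme": "my-scheme", "cui": "C1956346", "tui": "T047",
     "preferredText": "Coronary Artery Disease"},
    {"codingScheme": "SNOMEDCT_US", "cui": "C1956346", "tui": "T047",
     "preferredText": "Coronary Artery Disease"},
]


def schemes_by_cui(records):
    """Group records by CUI and list which coding schemes contributed to each."""
    grouped = defaultdict(set)
    for record in records:
        grouped[record["cui"]].add(record["codingScheme"])
    return {cui: sorted(schemes) for cui, schemes in grouped.items()}


def from_custom_dictionary(record, scheme="my-scheme"):
    """True when the record came from the custom dictionary's coding scheme."""
    return record["codingScheme"] == scheme


by_cui = schemes_by_cui(concepts)
custom_only = [c for c in concepts if from_custom_dictionary(c)]
print(by_cui)
print(len(custom_only))
```

Grouping on the CUI keeps the analysis unified, while the coding scheme still records which dictionary produced each hit.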


Just a note:   Your instructions and Sean's from several years ago are
slightly different from each other and from this release.

The bits that have changed are

Sean's refers to adding a bsv-based dictionary to the cTakesHsql.xml file,
which has become sno_rx_16ab.xml in cTAKES 4.

Yours refers to lookupDescriptors in the EngineDescription file.  But in
dictionary-lookup-fast, there are no more lookupDescriptors in its Engine
Description.  Those can only be found in the non-fast dictionary-lookup
module and there one finds examples of a CSV lookup which look as if
they're a different vintage from the bsv. I didn't try it, but my
take-away is that this would create a different kind of annotation. Using
the "adjunct" approach one can get bona fide disease disorder mentions and
procedure mentions etc., based on the TUIs one hijacks.


On 1/4/18, 2:42 PM, "Miller, Timothy"
<Timothy.Miller@childrens.harvard.edu> wrote:

>The UIMA Analysis Engine descriptor for the dictionary component has a
>parameter for what ctakes calls a "lookup descriptor". By default the
>lookup descriptor describes a lookup in a hsql engine. The xml files in
>that sample directory are lookup descriptors for a lookup using the bsv
>files they point to. If you want your bsv lookup to complement the
>default lookup it's possible to just have two dictionaries running with
>different lookup descriptors. I think it's also possible to have a lookup
>descriptor have multiple lookup types (i.e. multiple <dictionary>
>sections inside <dictionaries>) but I can't guarantee that works!
>From: Abramowitsch, Peter <pabramowitsch@hearst.com>
>Sent: Thursday, January 4, 2018 7:51 AM
>To: dev@ctakes.apache.org
>Subject: Re: How to use external CSV or BSV in addition to FastUMLS
>attention Sean [EXTERNAL]
>Thanks Tim,
>I did see that folder and its contents and it seemed the right place to
>begin.  What I couldn't find was how/where to refer to one of those
>CustomCuiTui.Xml files in an engine description.
>On 1/4/18, 1:41 PM, "Miller, Timothy"
><Timothy.Miller@childrens.harvard.edu> wrote:
>>Peter, I know Sean is busy this week and he may not see this for a while.
>>But I tried this method over the summer and got it to work so I'm fairly
>>confident that's the right approach still. Some of the details may have
>>changed from two years ago, so I would also check out this directory as a
>>starting point:
>>From: Abramowitsch, Peter <pabramowitsch@hearst.com>
>>Sent: Thursday, January 4, 2018 7:28 AM
>>To: dev@ctakes.apache.org
>>Subject: Re: How to use external CSV or BSV in addition to FastUMLS
>>attention Sean [EXTERNAL]
>>Further to my previous message, Sean, I was wondering if you could tell
>>me whether this answer you gave in 2015, is still the right way to do
>>things in ctakes4.x
>>Subject:        RE: How to update cTAKES so that new top level categories
>>come out based on local
>>From:   Finan, Sean (Sean...@childrens.harvard.edu)
>>Date:   Oct 6, 2015 2:04:56 pm
>>List:   org.apache.incubator.ctakes-dev
>>From: <Abramowitsch>, Peter Abramowitsch
>>Date: Thursday, January 4, 2018 at 12:50 PM
>>To: "dev@ctakes.apache.org<mailto:dev@ctakes.apache.org>"
>>Subject: How to use external CSV or BSV in addition to FastUMLS
>>Can someone point me to any up-to-date how-tos on how to include external
>>CSV/BSV type resources to add synonyms, and other terms for dictionary
>>lookup to augment the FAST UMLS resources that come out of the box?
>>Perhaps I have missed something, but looking at the
>>CTakesDictionaryCreator UI, it looks like it is designed only to choose
>>subsets of the UMLS data set rather than allowing one to bring in
>>completely new information sources.  I scoured the Marklogic ctakes user
>>archive, but so many of the entries are old and I'm not sure they
>>describe the current way of doing things.
>>The only approach I could see would be to use the AggregateEngine
>>description and have it point to the CSV annotator, creating a completely
>>new AE but this would build other types of annotation, whereas what I'm
>>thinking about is a case for creating identified mentions such as a
>>DiseaseDisorderMention based on finding an acronym that the UMLS resource
>>doesn't know about, even though the concept in its full textual form is
>>I'm sure this is not a unique request and apologize in advance if it has
>>already been answered somewhere
>>- Peter
