ctakes-dev mailing list archives

From "Abramowitsch, Peter" <pabramowit...@hearst.com>
Subject Re: How to use external CSV or BSV in addition to FastUMLS attention Sean [EXTERNAL]
Date Fri, 05 Jan 2018 15:23:10 GMT
Thanks, Tim. OK, I did get custom dictionaries to work.

I was interested to see what it did when I overloaded an existing SNOMED
term with a new text, but keeping the same Preferred Text.  I like the
results:

So for instance in the bsv file:
C1956346|T47|grwz|Coronary Artery Disease

grwz is now linked with the same CUI/TUI as its SNOMED cousin.
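
A minimal sketch of how a line like the one above breaks down, assuming the
four bar-separated fields are CUI, TUI, text and Preferred Text, in that
order.  The names BsvEntry, parse_bsv_line and normalize_tui are made up for
illustration and are not part of cTAKES.  The zero-padding of the TUI only
mirrors what the output further down shows (T47 in the file, T047 in the
result); it is not a claim about how cTAKES does it internally.

```python
from typing import NamedTuple


class BsvEntry(NamedTuple):
    """One dictionary row (hypothetical helper type, not a cTAKES class)."""
    cui: str
    tui: str
    text: str
    preferred_text: str


def normalize_tui(tui: str) -> str:
    """Pad the numeric part of a TUI to three digits, e.g. T47 -> T047."""
    digits = tui.strip().lstrip("Tt")
    if not digits.isdigit():
        raise ValueError(f"not a TUI: {tui!r}")
    return f"T{int(digits):03d}"


def parse_bsv_line(line: str) -> BsvEntry:
    """Split one bar-separated line into its four assumed fields."""
    fields = [field.strip() for field in line.strip().split("|")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    cui, tui, text, preferred_text = fields
    return BsvEntry(cui, normalize_tui(tui), text, preferred_text)


entry = parse_bsv_line("C1956346|T47|grwz|Coronary Artery Disease")
print(entry)
```

Checking the field count up front catches a stray or missing bar before the
file is handed to the dictionary lookup.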


And when running it through cTAKES I see this:

"168": {
      "_type": "UmlsConcept",
      "codingScheme": "my-scheme",
      "score": 0.0,
      "disambiguated": false,
      "cui": "C1956346",
      "tui": "T047",
      "preferredText": "Coronary Artery Disease"
}



So my concept can share the same CUI as the SNOMED concept for analysis
purposes, but I know it comes from a different dictionary.  Cool.
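
To illustrate the point, here is a rough sketch that groups UmlsConcept
entries by CUI and lists which coding schemes each one came from.  It
assumes the JSON output is an object keyed by id, like the fragment above.
The second entry ("169", with scheme "SNOMEDCT_US") is invented for the
example and does not come from a real run, and schemes_by_cui is a
hypothetical helper name.

```python
import json
from collections import defaultdict


def schemes_by_cui(doc: dict) -> dict:
    """Map each CUI to the set of coding schemes that produced it."""
    result = defaultdict(set)
    for entry in doc.values():
        # Skip anything that is not a UmlsConcept object.
        if isinstance(entry, dict) and entry.get("_type") == "UmlsConcept":
            result[entry["cui"]].add(entry["codingScheme"])
    return dict(result)


# Entry "168" is copied from the output above; entry "169" is an assumed
# SNOMED hit added only to show two schemes sharing one CUI.
SAMPLE = json.loads("""
{
  "168": {
    "_type": "UmlsConcept",
    "codingScheme": "my-scheme",
    "score": 0.0,
    "disambiguated": false,
    "cui": "C1956346",
    "tui": "T047",
    "preferredText": "Coronary Artery Disease"
  },
  "169": {
    "_type": "UmlsConcept",
    "codingScheme": "SNOMEDCT_US",
    "score": 0.0,
    "disambiguated": false,
    "cui": "C1956346",
    "tui": "T047",
    "preferredText": "Coronary Artery Disease"
  }
}
""")

schemes = schemes_by_cui(SAMPLE)
for cui, names in sorted(schemes.items()):
    print(cui, sorted(names))
```

Counting by CUI treats the custom term and its SNOMED cousin as the same
concept, while the scheme set still shows where each hit came from.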

--------

Just a note:   Your instructions and Sean's from several years ago are
slightly different from each other and from this release.

The bits that have changed are:

Sean's refers to adding a bsv-based dictionary to the cTakesHsql.xml file,
which has become sno_rx_16ab.xml in cTAKES 4.

Yours refers to lookupDescriptors in the Engine Description file.  But in
dictionary-lookup-fast there are no longer any lookupDescriptors in the
Engine Description.  Those can only be found in the non-fast
dictionary-lookup module, where one finds examples of a CSV lookup that
look as if they're of a different vintage from the bsv.  I didn't try it,
but my take-away is that this would create a different kind of annotation.
Using the "adjunct" approach one can get bona fide disease/disorder
mentions, procedure mentions, etc., based on the TUIs one hijacks.

Peter


On 1/4/18, 2:42 PM, "Miller, Timothy"
<Timothy.Miller@childrens.harvard.edu> wrote:

>The UIMA Analysis Engine descriptor for the dictionary component has a
>parameter for what ctakes calls a "lookup descriptor". By default the
>lookup descriptor describes a lookup in a hsql engine. The xml files in
>that sample directory are lookup descriptors for a lookup using the bsv
>files they point to. If you want your bsv lookup to complement the
>default lookup it's possible to just have two dictionaries running with
>different lookup descriptors. I think it's also possible to have a lookup
>descriptor have multiple lookup types (i.e. multiple <dictionary>
>sections inside <dictionaries>) but I can't guarantee that works!
>Tim
>
>________________________________________
>From: Abramowitsch, Peter <pabramowitsch@hearst.com>
>Sent: Thursday, January 4, 2018 7:51 AM
>To: dev@ctakes.apache.org
>Subject: Re: How to use external CSV or BSV in addition to FastUMLS
>attention Sean [EXTERNAL]
>
>Thanks Tim,
>
>I did see that folder and its contents and it seemed the right place to
>begin.  What I couldn't find was how/where to refer to one of those
>CustomCuiTui.Xml files in an engine description.
>
>Peter
>
>On 1/4/18, 1:41 PM, "Miller, Timothy"
><Timothy.Miller@childrens.harvard.edu> wrote:
>
>>Peter, I know Sean is busy this week and he may not see this for a while.
>>But I tried this method over the summer and got it to work so I'm fairly
>>confident that's the right approach still. Some of the details may have
>>changed from two years ago, so I would also check out this directory as a
>>starting point:
>>https://urldefense.proofpoint.com/v2/url?u=http-3A__svn.apache.org_viewvc_ctakes_trunk_ctakes-2Ddictionary-2Dlookup-2Dfast-2Dres_src_main_resources_org_apache_ctakes_dictionary_lookup_fast_example_bsv_&d=DwIFAw&c=B73tqXN8Ec0ocRmZHMCntw&r=5LM1YwNyMUq7CWiSepCCsjTjwuVF4uswNF8BK5Orm10&m=j2h_timB4skclRz6ICf0XlmaUgJekZOOgGo_WF-iuDw&s=qbZInrnxDgeP2prW-pOoOFkVLFweja-ct48H8NWydIM&e=
>>
>>Tim
>>
>>________________________________________
>>From: Abramowitsch, Peter <pabramowitsch@hearst.com>
>>Sent: Thursday, January 4, 2018 7:28 AM
>>To: dev@ctakes.apache.org
>>Subject: Re: How to use external CSV or BSV in addition to FastUMLS
>>attention Sean [EXTERNAL]
>>
>>Further to my previous message, Sean, I was wondering if you could tell
>>me whether this answer you gave in 2015, is still the right way to do
>>things in ctakes4.x
>>
>>permalink:
>>https://urldefense.proofpoint.com/v2/url?u=http-3A__markmail.org_message_s3ztinppusvsciss&d=DwIFAg&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=Heup-IbsIg9Q1TPOylpP9FE4GTK-OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=Xq7U7BTlhofW8xpZfuBKuudNTqry4yt5RzaoBoPLRIg&s=BSEa_ZZMusVnqd2JbfeyoBxsDD1ZdfsHVXO56wR8erA&e=
>>
>>Subject:        RE: How to update cTAKES so that new top level categories
>>come out based on local dictionary?
>>From:   Finan, Sean (Sean...@childrens.harvard.edu)
>>Date:   Oct 6, 2015 2:04:56 pm
>>List:   org.apache.incubator.ctakes-dev
>>
>>
>>Regards
>>Peter
>>
>>From: <Abramowitsch>, Peter Abramowitsch
>><pabramowitsch@hearst.com<mailto:pabramowitsch@hearst.com>>
>>Date: Thursday, January 4, 2018 at 12:50 PM
>>To: "dev@ctakes.apache.org<mailto:dev@ctakes.apache.org>"
>><dev@ctakes.apache.org<mailto:dev@ctakes.apache.org>>
>>Subject: How to use external CSV or BSV in addition to FastUMLS
>>
>>Can someone point me to any up-to-date how-tos on how to include external
>>CSV/BSV type resources to add synonyms and other terms for dictionary
>>lookup, to augment the FAST UMLS resources that come out of the box?
>>Perhaps I have missed something, but looking at the
>>CTakesDictionaryCreator UI, it looks like it is designed only to choose
>>subsets of the UMLS data set rather than allowing one to bring in
>>completely new information sources.  I scoured the Marklogic ctakes user
>>archive, but so many of the entries are old and I'm not sure they
>>describe the current way of doing things.
>>
>>The only approach I could see would be to use the AggregateEngine
>>description and have it point to the CSV annotator, creating a completely
>>new AE, but this would build other types of annotation.  What I'm
>>thinking about is a case for creating identified mentions such as a
>>DiseaseDisorderMention based on finding an acronym that the UMLS resource
>>doesn't know about, even though the concept in its full textual form is
>>there.
>>
>>I'm sure this is not a unique request, and I apologize in advance if it
>>has already been answered somewhere.
>>
>>- Peter
>

