ctakes-dev mailing list archives

From "Finan, Sean" <Sean.Fi...@childrens.harvard.edu>
Subject Re: Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge [EXTERNAL]
Date Sun, 16 Jun 2019 14:15:48 GMT
Hi Jeff,

>1) ...
There are several collections of filter sets here:
ctakes-gui-res\src\main\resources\org\apache\ctakes\gui\dictionary\data\

>2) ...
There is additional logic within the dictionary creator code:
ctakes-gui\src\main\java\org\apache\ctakes\gui\dictionary\

I haven't gone through it in a really long time, and without doing so now I can't enumerate
the filters.  I have family visiting, otherwise my curiosity would force me to do so and get
back to you.   Honestly, it should be documented somewhere, but writing (especially technical)
is pretty much my least favorite activity.
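
As a rough illustration only, the kinds of rules mentioned elsewhere in this thread (stripping suffixes such as "(Finding)", dropping very long terms, ignoring terms with dose or form) might be sketched as below. This is not the actual cTAKES dictionary creator logic: the suffix list, the token cutoff, the dose/form pattern, the lowercasing, and the name `filter_synonym` are all assumptions made up for the example. The real lists live in the plaintext files under the directory named above.

```python
import re

# Hypothetical values for illustration; NOT the lists shipped with cTAKES.
UNWANTED_SUFFIXES = (" (finding)", " (disorder)")
MAX_TOKENS = 6  # assumed cutoff for "really long" terms
DOSE_OR_FORM = re.compile(
    r"\b\d+(?:\.\d+)?\s*(?:mg|ml|mcg)\b|\b(?:tablet|capsule)s?\b"
)


def filter_synonym(term):
    """Return a cleaned, lowercased term, or None if it should be dropped."""
    text = term.strip().lower()
    # Remove one unnecessary trailing suffix, e.g. "wart (finding)" -> "wart".
    for suffix in UNWANTED_SUFFIXES:
        if text.endswith(suffix):
            text = text[: -len(suffix)].rstrip()
            break
    # Exclude terms with too many whitespace-separated tokens.
    if len(text.split()) > MAX_TOKENS:
        return None
    # Ignore terms that mention a dose or a dosage form.
    if DOSE_OR_FORM.search(text):
        return None
    return text
```

Under these assumptions, "Wart (Finding)" is reduced to "wart", while the long "can walk straight line ..." term and anything like "aspirin 81 mg tablet" are dropped.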

Sean


p.s.
Please don't wait for it, but I am currently working on new dictionary code and plan to introduce
that in ctakes.  Again, please don't wait for it as it is mixed in with other work and will
not be available for several months (if at all).


________________________________________
From: Jeffrey Miller <jeffmax@gmail.com>
Sent: Sunday, June 16, 2019 9:49 AM
To: dev@ctakes.apache.org
Subject: Re: Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge
[EXTERNAL]

Hi Sean,

Thanks for your response. I had two follow-up questions that would be very
helpful to understand if you have a few moments:

1) Are the specific filters used in the official sno_rx_16ab codified
anywhere so that I could reproduce them?

2) Do these filters explain all the changes? For example, when I use the
dictionary creator to export sno_med and rx_norm, I only get "diabetes
mellitus" whereas sno_rx_16ab contains both "diabetes" and "dm".
Especially with the addition of "dm" it feels like I must be missing a step
or a setting somewhere.
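
One way to keep extra synonyms such as "dm" reproducible across dictionary rebuilds is to regenerate the rows from a small script kept under version control, in the spirit of the separate SQL script suggested earlier in the thread. The sketch below is a guess at the row layout based only on the three INSERT rows quoted in the first message: CUI, index of the lookup word within the term, token count, term text, lookup word. The function name `cui_terms_insert` is hypothetical, the whitespace tokenization is a simplification of whatever cTAKES does, and the caller has to supply the lookup word because the thread does not say how the dictionary creator picks it.

```python
def cui_terms_insert(cui, text, lookup_word):
    """Build one INSERT statement in the shape of the rows quoted in this thread.

    Assumed column order (inferred, not confirmed):
    CUI, index of lookup word, token count, term text, lookup word.
    """
    tokens = text.lower().split()  # simple whitespace tokenization (assumption)
    word = lookup_word.lower()
    if word not in tokens:
        raise ValueError("lookup word %r is not a token of %r" % (lookup_word, text))
    term = " ".join(tokens).replace("'", "''")  # escape single quotes for SQL
    return "INSERT INTO CUI_TERMS VALUES(%d,%d,%d,'%s','%s')" % (
        cui,
        tokens.index(word),
        len(tokens),
        term,
        word.replace("'", "''"),
    )


# Extra synonyms to carry forward; the CUI is the one quoted in the thread.
EXTRA_TERMS = [(11849, "dm", "dm"), (11849, "diabetes", "diabetes")]

if __name__ == "__main__":
    for cui, text, word in EXTRA_TERMS:
        print(cui_terms_insert(cui, text, word))
```

With these inputs the script reproduces the two rows reported as missing, and calling it with "diabetes mellitus" and lookup word "mellitus" yields the row that the rebuilt dictionary does contain.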

Thanks!
Jeff

On Sun, Jun 16, 2019 at 8:55 AM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi all,
>
> The contents of the sno_rx_16ab are a dump of the umls 2016AB snomed and
> rxnorm terms with certain semantic types.  Nothing was added, but synonyms
> are filtered based upon various rules.  For instance, unnecessary suffixes
> are removed ("Wart (Finding)" -> "Wart"), really long terms are excluded
> ("can walk straight line with only minimal assistance"), terms with dose or
> form are ignored and so forth.
>
> Some filters can be changed by adding/removing from prefix/suffix/contains
> lists in plaintext files or by modifying the dictionary creator code.
>
> There was no manual curation (or nothing major).  As Remy mentioned that
> requires a lot of attention and time.  The dictionary database was not
> intended to be perfect, just as good as possible without major investment -
> and reproducible with updates to the umls.
>
> As the dictionary is released as a sql database, you should be able to add
> and remove fairly easily if sql savvy.  I have long wanted to add a "manual
> edit" panel to the dictionary gui, but haven't had the time.  If anybody
> else would like to work on such a tool that would be tonic.
>
> Sean
>
>
> ________________________________________
> From: Harish Kulkarni <harish.m.kulkarni@gmail.com>
> Sent: Saturday, June 15, 2019 5:16 PM
> To: dev@ctakes.apache.org
> Subject: Re: Differences in dictionary built with dictionaryBuilder and
> sno_rx16ab from sourceforge [EXTERNAL]
>
> unsubscribe
>
> On Sat, Jun 15, 2019 at 1:40 PM Remy Sanouillet <remys@foreseemed.com>
> wrote:
>
> > Yes, I agree it would be nice because the tokenization that occurs when
> > creating the dictionaries from the releases make comparisons a bit tricky
> > and is not 100% reversible. I would love to hear an answer to your
> > quandary.
> >
> >      Remy
> >
> > On Sat, Jun 15, 2019 at 1:23 PM Jeffrey Miller <jeffmax@gmail.com>
> > wrote:
> >
> > > Thanks, I was curious if the cTAKES devs that created the sno_rx_16ab
> > > dictionary had put the differences applied to the default UMLS output
> > > into version control in some form. I imagine the
> > > additions/synonyms/abbreviations that were added manually must have
> > > been collected over time somewhere prior to merging them with 2016ab
> > > UMLS release? I basically want to recreate the default cTAKES 4.0.0
> > > release with an additional ontology and the latest terms. I can likely
> > > come up with a diff myself but was wondering if this was already
> > > maintained as part of cTAKES.
> > >
> > > On Sat, Jun 15, 2019 at 12:24 PM Remy Sanouillet <remys@foreseemed.com>
> > > wrote:
> > >
> > > > Yes, that's pretty much what we do too. Not only to enhance the
> > > > dictionary, but to put in corrections because, lo and behold, there
> > > > are some errors in there! As you know, an ontology is a constant
> > > > curation job and that script, under SCM, allows you to isolate those
> > > > changes and, if necessary, re-apply them to new versions.
> > > >
> > > >       Remy
> > > >
> > > > On Sat, Jun 15, 2019 at 8:36 AM gandhi rajan <gandhirajan.n@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Jeff,
> > > > >
> > > > > As far as I know, maintaining a separate SQL script to add additional
> > > > > entries should work seamlessly.
> > > > >
> > > > > On Saturday, June 15, 2019, Jeffrey Miller <jeffmax@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thanks Remy. Does anyone know if these manually curated
> > > > > > modifications/synonyms are tracked anywhere (aside from the
> > > > > > dictionary itself) so they can be carried forward in future
> > > > > > dictionary updates?
> > > > > >
> > > > > > On Fri, Jun 14, 2019 at 4:28 PM Remy Sanouillet <remys@foreseemed.com>
> > > > > > wrote:
> > > > > >
> > > > > > > From my experience, it seems pretty obvious that sno_rx_16ab is a
> > > > > > > curated dictionary based on the SNOMED 2016AB release. It does not
> > > > > > > contain the full set but it has additional edits and synonyms that
> > > > > > > are pretty useful (including 'dm').
> > > > > > >
> > > > > > > We have had to manage those mods as an adjunct.
> > > > > > >
> > > > > > >       Remy
> > > > > > >
> > > > > > > On Fri, Jun 14, 2019 at 1:03 PM Jeffrey Miller <jeffmax@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > > I have created a custom dictionary from the latest UMLS release
> > > > > > > > with SNOMEDCT_US and RxNorm and I've noticed it seems to be
> > > > > > > > generating a .script file with unexpected differences as compared
> > > > > > > > to the sno_rx_16ab file available as part of the cTAKES release.
> > > > > > > > Specifically, for diabetes, it is missing these two rows:
> > > > > > > > INSERT INTO CUI_TERMS VALUES(11849,0,1,'dm','dm')
> > > > > > > > INSERT INTO CUI_TERMS VALUES(11849,0,1,'diabetes','diabetes')
> > > > > > > >
> > > > > > > > and only has this one:
> > > > > > > > INSERT INTO CUI_TERMS VALUES(11849,1,2,'diabetes mellitus','mellitus')
> > > > > > > >
> > > > > > > > The end result is that "diabetes" is not being picked up in the
> > > > > > > > test text I am running through; it requires the full 'diabetes
> > > > > > > > mellitus'.
> > > > > > > >
> > > > > > > > Is there any setting on the UMLS install side or the cTAKES
> > > > > > > > dictionary creator that could account for missing alternative
> > > > > > > > forms like this? I've tried downloading the 2016AB release (which
> > > > > > > > I think is the one used to create the bundled sno_rx_16ab
> > > > > > > > package?) and I am not getting the alternate forms in that
> > > > > > > > dictionary either.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Jeff
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Regards,
> > > > > Gandhi
> > > > >
> > > > > "The best way to find urself is to lose urself in the service of
> > others
> > > > > !!!"
> > > > >
> > > >
> > >
> >
>
