ctakes-dev mailing list archives

From "Finan, Sean" <Sean.Fi...@childrens.harvard.edu>
Subject Re: Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge [EXTERNAL]
Date Sun, 16 Jun 2019 12:54:54 GMT
Hi all,

The contents of the sno_rx_16ab are a dump of the umls 2016AB snomed and rxnorm terms with
certain semantic types.  Nothing was added, but synonyms are filtered based upon various rules.
 For instance, unnecessary suffixes are removed ("Wart (Finding)" -> "Wart"), really long
terms are excluded ("can walk straight line with only minimal assistance"), terms with dose
or form are ignored and so forth.

Some filters can be changed by adding/removing from prefix/suffix/contains lists in plaintext
files or by modifying the dictionary creator code.
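
A rough sketch of the kind of filtering described above, in Python. The suffix list, the token cutoff, and the function name are assumptions made up for illustration; they are not the dictionary creator's actual lists or code.

```python
# Hypothetical values for illustration only; the real lists live in the
# dictionary creator's plaintext files and are not reproduced here.
UNWANTED_SUFFIXES = ("(finding)", "(disorder)", "(procedure)")
MAX_TOKENS = 6  # assumed cutoff for "really long" terms


def filter_term(term):
    """Return a cleaned term, or None if the term should be excluded."""
    cleaned = term.strip()
    lowered = cleaned.lower()
    # Strip one unwanted trailing suffix, e.g. "Wart (Finding)" -> "Wart".
    for suffix in UNWANTED_SUFFIXES:
        if lowered.endswith(suffix):
            cleaned = cleaned[: len(cleaned) - len(suffix)].strip()
            break
    # Exclude empty results and terms with too many whitespace-separated tokens.
    if not cleaned or len(cleaned.split()) > MAX_TOKENS:
        return None
    return cleaned
```

With these assumed values, "Wart (Finding)" is reduced to "Wart" and the long walking-assistance term is dropped. Dose and form filtering is not covered by this sketch.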

There was no manual curation (or nothing major).  As Remy mentioned, that requires a lot of
attention and time.  The dictionary database was not intended to be perfect, just as good
as possible without major investment - and reproducible with updates to the UMLS.

As the dictionary is released as a SQL database, you should be able to add and remove entries
fairly easily if you are SQL savvy.  I have long wanted to add a "manual edit" panel to the
dictionary gui, but haven't had the time.  If anybody else would like to work on such a tool,
that would be tonic.
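
For anyone keeping manual additions in a separate SQL script, here is a minimal Python sketch that emits rows in the same shape as the CUI_TERMS inserts quoted further down this thread. The meaning of the five values (CUI, index of the rare word, token count, text, rare word) is inferred from those quoted rows. The function name, the whitespace tokenization, and the default of using the last token as the rare word are assumptions, not how the dictionary creator actually chooses them.

```python
def cui_terms_insert(cui, text, rare_word=None):
    """Build one INSERT statement for a manually added term.

    Assumption: tokens are split on whitespace and lowercased; the real
    dictionary creator tokenizes differently and picks the rare word itself.
    """
    tokens = text.lower().split()
    if not tokens:
        raise ValueError("text must contain at least one token")
    # Simplification: default to the last token when no rare word is given.
    rword = rare_word.lower() if rare_word is not None else tokens[-1]
    rindex = tokens.index(rword)  # raises ValueError if rword is not a token
    norm = " ".join(tokens)
    # Double any single quotes so the SQL string literals stay valid.
    return "INSERT INTO CUI_TERMS VALUES(%d,%d,%d,'%s','%s')" % (
        cui,
        rindex,
        len(tokens),
        norm.replace("'", "''"),
        rword.replace("'", "''"),
    )


if __name__ == "__main__":
    for term in ("dm", "diabetes"):
        print(cui_terms_insert(11849, term))
```

Keeping the generated statements in a script under version control makes it possible to re-apply the same additions after rebuilding the dictionary from a newer UMLS release.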

Sean


________________________________________
From: Harish Kulkarni <harish.m.kulkarni@gmail.com>
Sent: Saturday, June 15, 2019 5:16 PM
To: dev@ctakes.apache.org
Subject: Re: Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge
[EXTERNAL]

unsubscribe

On Sat, Jun 15, 2019 at 1:40 PM Remy Sanouillet <remys@foreseemed.com>
wrote:

> Yes, I agree it would be nice because the tokenization that occurs when
> creating the dictionaries from the releases makes comparisons a bit tricky
> and is not 100% reversible. I would love to hear an answer to your
> quandary.
>
>      Remy
>
> On Sat, Jun 15, 2019 at 1:23 PM Jeffrey Miller <jeffmax@gmail.com> wrote:
>
> > Thanks, I was curious if the cTAKES devs that created the sno_rx_16ab
> > dictionary had put the differences applied to the default UMLS output into
> > version control in some form. I imagine the
> > additions/synonyms/abbreviations that were added manually must have been
> > collected over time somewhere prior to merging them with 2016ab UMLS
> > release? I basically want to recreate the default cTAKES 4.0.0 release with
> > an additional ontology and the latest terms. I can likely come up with a
> > diff myself but was wondering if this was already maintained as part of
> > cTAKES.
> >
> > On Sat, Jun 15, 2019 at 12:24 PM Remy Sanouillet <remys@foreseemed.com>
> > wrote:
> >
> > > Yes, that's pretty much what we do too. Not only to enhance the dictionary,
> > > but to put in corrections because, lo and behold, there are some errors in
> > > there! As you know, an ontology is a constant curation job and that
> > > script, under SCM, allows you to isolate those changes and, if necessary,
> > > re-apply them to new versions.
> > >
> > >       Remy
> > >
> > > On Sat, Jun 15, 2019 at 8:36 AM gandhi rajan <gandhirajan.n@gmail.com>
> > > wrote:
> > >
> > > > Hi Jeff,
> > > >
> > > > As far as I know, maintaining a separate SQL script to add additional
> > > > entries should work seamlessly.
> > > >
> > > > On Saturday, June 15, 2019, Jeffrey Miller <jeffmax@gmail.com> wrote:
> > > >
> > > > > Thanks Remy. Does anyone know if these manually curated
> > > > > modifications/synonyms are tracked anywhere (aside from the dictionary
> > > > > itself) so they can be carried forward in future dictionary updates?
> > > > >
> > > > > On Fri, Jun 14, 2019 at 4:28 PM Remy Sanouillet <remys@foreseemed.com>
> > > > > wrote:
> > > > >
> > > > > > From my experience, it seems pretty obvious that sno_rx_16ab is a curated
> > > > > > dictionary based on the SNOMED 2016AB release. It does not contain the full
> > > > > > set but it has additional edits and synonyms that are pretty useful
> > > > > > (including 'dm').
> > > > > >
> > > > > > We have had to manage those mods as an adjunct.
> > > > > >
> > > > > >       Remy
> > > > > >
> > > > > > On Fri, Jun 14, 2019 at 1:03 PM Jeffrey Miller <jeffmax@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > > I have created a custom dictionary from the latest UMLS release with
> > > > > > > SNOMEDCT_US and RxNorm and I've noticed it seems to be generating a .script
> > > > > > > file with unexpected differences as compared to the sno_rx_16ab file
> > > > > > > available as part of the cTAKES release. Specifically, for diabetes, it is
> > > > > > > missing these two rows:
> > > > > > > INSERT INTO CUI_TERMS VALUES(11849,0,1,'dm','dm')
> > > > > > > INSERT INTO CUI_TERMS VALUES(11849,0,1,'diabetes','diabetes')
> > > > > > >
> > > > > > > and only has this one:
> > > > > > > INSERT INTO CUI_TERMS VALUES(11849,1,2,'diabetes mellitus','mellitus')
> > > > > > >
> > > > > > > The end result is that "diabetes" is not being picked up in the test text I
> > > > > > > am running through; it requires the full 'diabetes mellitus'.
> > > > > > >
> > > > > > > Is there any setting on the UMLS install side or the cTAKES dictionary
> > > > > > > creator that could account for missing alternative forms like this? I've
> > > > > > > tried downloading the 2016AB release (which I think is the one used to
> > > > > > > create the bundled sno_rx_16ab package?) and I am not getting the alternate
> > > > > > > forms in that dictionary either.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Jeff
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Regards,
> > > > Gandhi
> > > >
> > > > "The best way to find urself is to lose urself in the service of others !!!"
> > > >
> > >
> >
>
