ctakes-dev mailing list archives

From Jeffrey Miller <jeff...@gmail.com>
Subject Re: Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge [EXTERNAL]
Date Sun, 16 Jun 2019 13:49:26 GMT
Hi Sean,

Thanks for your response. If you have a few moments, I have two follow-up
questions whose answers would be very helpful:

1) Are the specific filters used in the official sno_rx_16ab codified
anywhere so that I could reproduce them?

2) Do these filters explain all the differences? For example, when I use the
dictionary creator to export sno_med and rx_norm, I only get "diabetes
mellitus", whereas sno_rx_16ab contains both "diabetes" and "dm".
Especially with the addition of "dm", it feels like I must be missing a step
or a setting somewhere.

Thanks!
Jeff

On Sun, Jun 16, 2019 at 8:55 AM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi all,
>
> The contents of sno_rx_16ab are a dump of the UMLS 2016AB SNOMED and
> RxNorm terms with certain semantic types. Nothing was added, but synonyms
> are filtered based upon various rules. For instance, unnecessary suffixes
> are removed ("Wart (Finding)" -> "Wart"), really long terms are excluded
> ("can walk straight line with only minimal assistance"), terms with dose or
> form are ignored, and so forth.
>
> Some filters can be changed by adding entries to or removing them from the
> prefix/suffix/contains lists in plain-text files, or by modifying the
> dictionary creator code.
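To make the kinds of rules Sean describes concrete, here is a minimal Python sketch of synonym filtering. It is not the dictionary creator's actual code: the suffix list, the dose/form pattern, and the MAX_TOKENS cut-off are hypothetical stand-ins for the plain-text lists and thresholds the real tool uses.

```python
import re
from typing import Optional

# Hypothetical rule lists. The real dictionary creator reads its own
# prefix/suffix/contains lists from plain-text files; these are examples only.
UNWANTED_SUFFIXES = ["(finding)", "(disorder)", "(procedure)"]
DOSE_FORM = re.compile(
    r"\b\d+(\.\d+)?\s*(mg|ml|mcg|g)\b|\b(tablet|capsule|injection)\b",
    re.IGNORECASE,
)
MAX_TOKENS = 6  # hypothetical cut-off for "really long" terms


def filter_synonym(term: str) -> Optional[str]:
    """Return the cleaned synonym, or None if the rules exclude it."""
    text = term.strip()
    # Strip one unwanted suffix, e.g. "Wart (Finding)" -> "Wart".
    for suffix in UNWANTED_SUFFIXES:
        if text.lower().endswith(suffix):
            text = text[: -len(suffix)].rstrip()
            break
    # Ignore terms that carry a dose or a dose form.
    if DOSE_FORM.search(text):
        return None
    # Exclude terms with too many tokens.
    if len(text.split()) > MAX_TOKENS:
        return None
    return text


for synonym in ["Wart (Finding)",
                "can walk straight line with only minimal assistance",
                "Metformin 500 mg oral tablet",
                "Diabetes mellitus"]:
    print(repr(synonym), "->", repr(filter_synonym(synonym)))
```

Rules of this kind only remove or shorten synonyms; by themselves they would not produce a new row such as 'dm'.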
>
> There was no manual curation (or at least nothing major). As Remy mentioned,
> that requires a lot of attention and time. The dictionary database was not
> intended to be perfect, just as good as possible without major investment,
> and reproducible with updates to the UMLS.
>
> As the dictionary is released as an SQL database, you should be able to add
> and remove entries fairly easily if you are SQL-savvy. I have long wanted to
> add a "manual edit" panel to the dictionary GUI, but haven't had the time. If
> anybody else would like to work on such a tool, that would be tonic.
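A minimal sketch of the approach of keeping curated edits in a separate SQL script and re-applying it after each rebuild, as Gandhi and Remy suggest further down the thread. It uses Python's built-in sqlite3 module as a stand-in for the HSQLDB database cTAKES ships. The CUI_TERMS column names are an assumption inferred from the INSERT rows quoted in this thread, and the adjunct script itself is hypothetical.

```python
import sqlite3

# Stand-in for a freshly built dictionary (SQLite, not the HSQLDB cTAKES uses).
# Column names are assumed from the CUI_TERMS rows quoted in this thread.
BASE_ROWS = [(11849, 1, 2, "diabetes mellitus", "mellitus")]

# Hypothetical adjunct script kept under version control and re-applied
# after every rebuild. Deleting first makes it safe to run more than once.
ADJUNCT_SQL = """
DELETE FROM CUI_TERMS WHERE CUI = 11849 AND TEXT IN ('dm', 'diabetes');
INSERT INTO CUI_TERMS VALUES (11849, 0, 1, 'dm', 'dm');
INSERT INTO CUI_TERMS VALUES (11849, 0, 1, 'diabetes', 'diabetes');
"""


def build_dictionary() -> sqlite3.Connection:
    """Create an in-memory CUI_TERMS table holding the 'built' rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE CUI_TERMS (CUI BIGINT, RINDEX INTEGER, TCOUNT INTEGER, "
        "TEXT VARCHAR(255), RWORD VARCHAR(48))"
    )
    conn.executemany("INSERT INTO CUI_TERMS VALUES (?, ?, ?, ?, ?)", BASE_ROWS)
    return conn


def apply_adjunct(conn: sqlite3.Connection, script: str) -> None:
    """Run the curated-edits script against the rebuilt dictionary."""
    conn.executescript(script)


conn = build_dictionary()
apply_adjunct(conn, ADJUNCT_SQL)
apply_adjunct(conn, ADJUNCT_SQL)  # re-running does not duplicate rows
terms = sorted(row[0] for row in
               conn.execute("SELECT TEXT FROM CUI_TERMS WHERE CUI = 11849"))
print(terms)  # ['diabetes', 'diabetes mellitus', 'dm']
```

Making the script idempotent (delete, then insert) is a design choice: it lets the same file be replayed on every new UMLS-based build without checking what is already there.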
>
> Sean
>
>
> ________________________________________
> From: Harish Kulkarni <harish.m.kulkarni@gmail.com>
> Sent: Saturday, June 15, 2019 5:16 PM
> To: dev@ctakes.apache.org
> Subject: Re: Differences in dictionary built with dictionaryBuilder and
> sno_rx16ab from sourceforge [EXTERNAL]
>
> unsubscribe
>
> On Sat, Jun 15, 2019 at 1:40 PM Remy Sanouillet <remys@foreseemed.com> wrote:
>
> > Yes, I agree it would be nice, because the tokenization that occurs when
> > creating the dictionaries from the releases makes comparisons a bit tricky
> > and is not 100% reversible. I would love to hear an answer to your
> > quandary.
> >
> >      Remy
> >
> > On Sat, Jun 15, 2019 at 1:23 PM Jeffrey Miller <jeffmax@gmail.com> wrote:
> >
> > > Thanks. I was curious whether the cTAKES devs who created the sno_rx_16ab
> > > dictionary had put the differences applied to the default UMLS output
> > > into version control in some form. I imagine the synonyms and
> > > abbreviations that were added manually must have been collected
> > > somewhere over time before being merged with the 2016AB UMLS release?
> > > I basically want to recreate the default cTAKES 4.0.0 release with an
> > > additional ontology and the latest terms. I can likely come up with a
> > > diff myself, but was wondering if this was already maintained as part
> > > of cTAKES.
> > >
> > > On Sat, Jun 15, 2019 at 12:24 PM Remy Sanouillet <remys@foreseemed.com>
> > > wrote:
> > >
> > > > Yes, that's pretty much what we do too: not only to enhance the
> > > > dictionary, but to put in corrections because, lo and behold, there
> > > > are some errors in there! As you know, an ontology is a constant
> > > > curation job, and that script, under SCM, allows you to isolate those
> > > > changes and, if necessary, re-apply them to new versions.
> > > >
> > > >       Remy
> > > >
> > > > On Sat, Jun 15, 2019 at 8:36 AM gandhi rajan <gandhirajan.n@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Jeff,
> > > > >
> > > > > As far as I know, maintaining a separate SQL script to add
> > > > > additional entries should work seamlessly.
> > > > >
> > > > > On Saturday, June 15, 2019, Jeffrey Miller <jeffmax@gmail.com> wrote:
> > > > >
> > > > > > Thanks Remy. Does anyone know if these manually curated
> > > > > > modifications/synonyms are tracked anywhere (aside from the
> > > > > > dictionary itself) so they can be carried forward in future
> > > > > > dictionary updates?
> > > > > >
> > > > > > On Fri, Jun 14, 2019 at 4:28 PM Remy Sanouillet <remys@foreseemed.com>
> > > > > > wrote:
> > > > > >
> > > > > > > From my experience, it seems pretty obvious that sno_rx_16ab is a
> > > > > > > curated dictionary based on the SNOMED terms in the UMLS 2016AB
> > > > > > > release. It does not contain the full set, but it has additional
> > > > > > > edits and synonyms that are pretty useful (including 'dm').
> > > > > > >
> > > > > > > We have had to manage those mods as an adjunct.
> > > > > > >
> > > > > > >       Remy
> > > > > > >
> > > > > > > On Fri, Jun 14, 2019 at 1:03 PM Jeffrey Miller <jeffmax@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > > I have created a custom dictionary from the latest UMLS release
> > > > > > > > with SNOMEDCT_US and RxNorm, and I've noticed it seems to be
> > > > > > > > generating a .script file with unexpected differences compared
> > > > > > > > with the sno_rx_16ab file available as part of the cTAKES
> > > > > > > > release. Specifically, for diabetes, it is missing these two rows:
> > > > > > > > INSERT INTO CUI_TERMS VALUES(11849,0,1,'dm','dm')
> > > > > > > > INSERT INTO CUI_TERMS VALUES(11849,0,1,'diabetes','diabetes')
> > > > > > > >
> > > > > > > > and only has this one:
> > > > > > > > INSERT INTO CUI_TERMS VALUES(11849,1,2,'diabetes mellitus','mellitus')
> > > > > > > >
> > > > > > > > The end result is that "diabetes" is not being picked up in the
> > > > > > > > test text I am running through; it requires the full
> > > > > > > > 'diabetes mellitus'.
> > > > > > > >
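A simplified model of why the lone word is missed, assuming the CUI_TERMS columns are CUI, RINDEX (position of the term's "rare word"), TCOUNT (token count), TEXT and RWORD (the rare word used as the lookup key). Under that assumption, a term is only considered when its rare word occurs in the text, and then every one of its tokens must match. This is a hypothetical Python sketch, not the cTAKES implementation; it ignores case normalization, lexical variants and overlapping spans.

```python
from typing import Dict, List, NamedTuple, Tuple


class Term(NamedTuple):
    cui: int
    rindex: int  # assumed: position of the rare word within the term's tokens
    tcount: int  # assumed: number of tokens in the term
    text: str
    rword: str   # assumed: the "rare word" used as the lookup key


def find_terms(tokens: List[str], terms: List[Term]) -> List[Tuple[int, str]]:
    """Hypothetical rare-word lookup: index terms by RWORD, then require
    every token of the term to match the text around the hit."""
    by_rword: Dict[str, List[Term]] = {}
    for term in terms:
        by_rword.setdefault(term.rword, []).append(term)
    hits = []
    for i, token in enumerate(tokens):
        for term in by_rword.get(token, []):
            start = i - term.rindex
            window = tokens[start:start + term.tcount] if start >= 0 else []
            if " ".join(window) == term.text:
                hits.append((term.cui, term.text))
    return hits


built = [Term(11849, 1, 2, "diabetes mellitus", "mellitus")]
bundled = built + [Term(11849, 0, 1, "dm", "dm"),
                   Term(11849, 0, 1, "diabetes", "diabetes")]
text = "history of diabetes".split()
print(find_terms(text, built))    # []
print(find_terms(text, bundled))  # [(11849, 'diabetes')]
```

With only the 'diabetes mellitus' row, nothing is keyed on "diabetes", so the lone word never becomes a candidate; the single-token rows in sno_rx_16ab are what make it matchable.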
> > > > > > > > Is there any setting on the UMLS install side or in the cTAKES
> > > > > > > > dictionary creator that could account for missing alternative
> > > > > > > > forms like this? I've tried downloading the 2016AB release
> > > > > > > > (which I think is the one used to create the bundled sno_rx_16ab
> > > > > > > > package?) and I am not getting the alternate forms in that
> > > > > > > > dictionary either.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Jeff
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Regards,
> > > > > Gandhi
> > > > >
> > > > > "The best way to find urself is to lose urself in the service of others !!!"
> > > > >
> > > >
> > >
> >
>
