ctakes-dev mailing list archives

From Remy Sanouillet <re...@foreseemed.com>
Subject Re: Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge [EXTERNAL]
Date Sun, 16 Jun 2019 19:26:57 GMT
Thanks for the clarifications, Sean. That was very enlightening. I look
forward to the documentation (even if it entails some suffering on your
part.)

If/when you stumble on some idle time allowing you to implement the manual
edit panel, it would be nice to have it allow for re-partitioning the
ontology. As you are well aware, UMLS CUIs and SNOMED codes do not always have
a one-to-one correspondence, so a CUI can match multiple SNOMED codes and a
SNOMED code can be mapped to several CUIs.

In some specialized contexts, clinicians don't agree with that partitioning,
or with the inheritance that ensues, and would like to re-assign the codes.
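
To make the re-partitioning idea concrete, here is a minimal sketch in Python. It assumes the mapping is available as plain (CUI, SNOMED code) pairs and that site-specific re-assignments are kept in a separate override list. The identifiers, the `repartition` helper, and the override format are all hypothetical; nothing here reflects how cTAKES stores or edits its dictionary.

```python
from collections import defaultdict

# Hypothetical identifiers for illustration only; not real UMLS or SNOMED codes.
BASE_PAIRS = [
    ("CUI_A", "SCT_1"),
    ("CUI_A", "SCT_2"),  # one CUI matching multiple SNOMED codes
    ("CUI_B", "SCT_2"),  # one SNOMED code mapped to several CUIs
]

# Site-specific overrides, kept apart from the base mapping so they can be
# re-applied after an update: (snomed_code, from_cui, to_cui).
OVERRIDES = [("SCT_2", "CUI_A", "CUI_C")]


def repartition(pairs, overrides):
    """Return a CUI -> set of SNOMED codes mapping with the overrides applied."""
    moves = {(code, old_cui): new_cui for code, old_cui, new_cui in overrides}
    mapping = defaultdict(set)
    for cui, code in pairs:
        # Use the re-assigned CUI when an override exists for this exact pair.
        mapping[moves.get((code, cui), cui)].add(code)
    return dict(mapping)


result = repartition(BASE_PAIRS, OVERRIDES)
```

Keeping the overrides as data rather than editing the base pairs means the clinicians' decisions survive a rebuild from a newer release.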

Not holding my breath, but just something to keep in mind.

      Remy

On Sun, Jun 16, 2019 at 7:16 AM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Jeff,
>
> >1) ...
> There are several collections of filter sets here:
> ctakes-gui-res\src\main\resources\org\apache\ctakes\gui\dictionary\data\
>
> 2) ...
> There is additional logic within the dictionary creator code:
> ctakes-gui\src\main\java\org\apache\ctakes\gui\dictionary\
>
> I haven't gone through it in a really long time, and without doing so now
> I can't enumerate the filters.  I have family visiting, otherwise my
> curiosity would force me to do so and get back to you.   Honestly, it
> should be documented somewhere, but writing (especially technical) is
> pretty much my least favorite activity.
>
> Sean
>
>
> p.s.
> Please don't wait for it, but I am currently working on new dictionary
> code and plan to introduce that in ctakes.  Again, please don't wait for it
> as it is mixed in with other work and will not be available for several
> months (if at all).
>
>
> ________________________________________
> From: Jeffrey Miller <jeffmax@gmail.com>
> Sent: Sunday, June 16, 2019 9:49 AM
> To: dev@ctakes.apache.org
> Subject: Re: Differences in dictionary built with dictionaryBuilder and
> sno_rx16ab from sourceforge [EXTERNAL]
>
> Hi Sean,
>
> Thanks for your response. I had two follow-up questions that would be very
> helpful to understand if you have a few moments:
>
> 1) Are the specific filters used in the official sno_rx_16ab codified
> anywhere so that I could reproduce them?
>
> 2) Do these filters explain all the changes? For example, when I use the
> dictionary creator to export sno_med and rx_norm, I only get "diabetes
> mellitus" whereas sno_rx_16ab contains both "diabetes" and "dm".
> Especially with the addition of "dm" it feels like I must be missing a step
> or a setting somewhere.
>
> Thanks!
> Jeff
>
> On Sun, Jun 16, 2019 at 8:55 AM Finan, Sean <
> Sean.Finan@childrens.harvard.edu> wrote:
>
> > Hi all,
> >
> > The contents of the sno_rx_16ab are a dump of the UMLS 2016AB SNOMED and
> > RxNorm terms with certain semantic types.  Nothing was added, but synonyms
> > are filtered based upon various rules.  For instance, unnecessary suffixes
> > are removed ("Wart (Finding)" -> "Wart"), really long terms are excluded
> > ("can walk straight line with only minimal assistance"), terms with dose
> > or form are ignored, and so forth.
> >
> > Some filters can be changed by adding/removing from prefix/suffix/contains
> > lists in plaintext files or by modifying the dictionary creator code.
> >
> > There was no manual curation (or nothing major).  As Remy mentioned, that
> > requires a lot of attention and time.  The dictionary database was not
> > intended to be perfect, just as good as possible without major
> > investment - and reproducible with updates to the UMLS.
> >
> > As the dictionary is released as a SQL database, you should be able to
> > add and remove entries fairly easily if you are SQL savvy.  I have long
> > wanted to add a "manual edit" panel to the dictionary gui, but haven't had
> > the time.  If anybody else would like to work on such a tool that would be
> > tonic.
> >
> > Sean
> >
> >
> > ________________________________________
> > From: Harish Kulkarni <harish.m.kulkarni@gmail.com>
> > Sent: Saturday, June 15, 2019 5:16 PM
> > To: dev@ctakes.apache.org
> > Subject: Re: Differences in dictionary built with dictionaryBuilder and
> > sno_rx16ab from sourceforge [EXTERNAL]
> >
> > unsubscribe
> >
> > On Sat, Jun 15, 2019 at 1:40 PM Remy Sanouillet <remys@foreseemed.com>
> > wrote:
> >
> > > Yes, I agree it would be nice because the tokenization that occurs when
> > > creating the dictionaries from the releases makes comparisons a bit
> > > tricky and is not 100% reversible. I would love to hear an answer to
> > > your quandary.
> > >
> > >      Remy
> > >
> > > On Sat, Jun 15, 2019 at 1:23 PM Jeffrey Miller <jeffmax@gmail.com>
> > > wrote:
> > >
> > > > Thanks, I was curious if the cTAKES devs that created the sno_rx_16ab
> > > > dictionary had put the differences applied to the default UMLS output
> > > > into version control in some form. I imagine the
> > > > additions/synonyms/abbreviations that were added manually must have
> > > > been collected over time somewhere prior to merging them with the
> > > > 2016ab UMLS release? I basically want to recreate the default cTAKES
> > > > 4.0.0 release with an additional ontology and the latest terms. I can
> > > > likely come up with a diff myself but was wondering if this was
> > > > already maintained as part of cTAKES.
> > > >
> > > > On Sat, Jun 15, 2019 at 12:24 PM Remy Sanouillet
> > > > <remys@foreseemed.com> wrote:
> > > >
> > > > > Yes, that's pretty much what we do too. Not only to enhance the
> > > > > dictionary, but to put in corrections because, lo and behold, there
> > > > > are some errors in there! As you know, an ontology is a constant
> > > > > curation job and that script, under SCM, allows you to isolate those
> > > > > changes and, if necessary, re-apply them to new versions.
> > > > >
> > > > >       Remy
> > > > >
> > > > > On Sat, Jun 15, 2019 at 8:36 AM gandhi rajan
> > > > > <gandhirajan.n@gmail.com> wrote:
> > > > >
> > > > > > Hi Jeff,
> > > > > >
> > > > > > As far as I know, maintaining a separate SQL script to add
> > > > > > additional entries should work seamlessly.
> > > > > >
> > > > > > On Saturday, June 15, 2019, Jeffrey Miller <jeffmax@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks Remy. Does anyone know if these manually curated
> > > > > > > modifications/synonyms are tracked anywhere (aside from the
> > > > > > > dictionary itself) so they can be carried forward in future
> > > > > > > dictionary updates?
> > > > > > >
> > > > > > > On Fri, Jun 14, 2019 at 4:28 PM Remy Sanouillet
> > > > > > > <remys@foreseemed.com> wrote:
> > > > > > >
> > > > > > > > From my experience, it seems pretty obvious that sno_rx_16ab
> > > > > > > > is a curated dictionary based on the SNOMED 2016AB release. It
> > > > > > > > does not contain the full set but it has additional edits and
> > > > > > > > synonyms that are pretty useful (including 'dm').
> > > > > > > >
> > > > > > > > We have had to manage those mods as an adjunct.
> > > > > > > >
> > > > > > > >       Remy
> > > > > > > >
> > > > > > > > On Fri, Jun 14, 2019 at 1:03 PM Jeffrey Miller
> > > > > > > > <jeffmax@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > > I have created a custom dictionary from the latest UMLS
> > > > > > > > > release with SNOMEDCT_US and RxNorm and I've noticed it
> > > > > > > > > seems to be generating a .script file with unexpected
> > > > > > > > > differences as compared to the sno_rx_16ab file available as
> > > > > > > > > part of the cTAKES release. Specifically, for diabetes, it
> > > > > > > > > is missing these two rows:
> > > > > > > > > INSERT INTO CUI_TERMS VALUES(11849,0,1,'dm','dm')
> > > > > > > > > INSERT INTO CUI_TERMS VALUES(11849,0,1,'diabetes','diabetes')
> > > > > > > > >
> > > > > > > > > and only has this one:
> > > > > > > > > INSERT INTO CUI_TERMS VALUES(11849,1,2,'diabetes mellitus','mellitus')
> > > > > > > > >
> > > > > > > > > The end result is that "diabetes" is not being picked up in
> > > > > > > > > the test text I am running through; it requires the full
> > > > > > > > > 'diabetes mellitus'.
> > > > > > > > >
> > > > > > > > > Is there any setting on the UMLS install side or the cTAKES
> > > > > > > > > dictionary creator that could account for missing
> > > > > > > > > alternative forms like this? I've tried downloading the
> > > > > > > > > 2016AB release (which I think is the one used to create the
> > > > > > > > > bundled sno_rx_16ab package?) and I am not getting the
> > > > > > > > > alternate forms in that dictionary either.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Jeff
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Regards,
> > > > > > Gandhi
> > > > > >
> > > > > > "The best way to find urself is to lose urself in the service
> > > > > > of others !!!"
> > > > > >
> > > > >
> > > >
> > >
> >
>
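
The synonym filtering Sean describes in the quoted thread (suffix removal, exclusion of very long terms, and skipping terms that carry a dose or form) could look roughly like the sketch below. Every list, threshold, and pattern in it is an assumption made for illustration. The actual filter sets live in the ctakes-gui-res data directory and the dictionary creator code mentioned above, and they are not reproduced here.

```python
import re

# All values below are illustrative assumptions, not the lists shipped with cTAKES.
UNWANTED_SUFFIXES = ("(finding)", "(disorder)", "(procedure)")
MAX_WORDS = 6
DOSE_OR_FORM = re.compile(
    r"\b\d+(\.\d+)?\s*(mg|ml|mcg)\b|\b(tablet|capsule)\b", re.IGNORECASE
)


def filter_term(term):
    """Return the cleaned term, or None if the term should be excluded."""
    cleaned = term.strip()
    lowered = cleaned.lower()
    # Strip one trailing semantic tag, e.g. "Wart (Finding)" -> "Wart".
    for suffix in UNWANTED_SUFFIXES:
        if lowered.endswith(suffix):
            cleaned = cleaned[: len(cleaned) - len(suffix)].strip()
            break
    if not cleaned:
        return None
    # Exclude really long terms.
    if len(cleaned.split()) > MAX_WORDS:
        return None
    # Ignore terms that mention a dose or a drug form.
    if DOSE_OR_FORM.search(cleaned):
        return None
    return cleaned
```

With these assumed settings, `filter_term("Wart (Finding)")` returns `"Wart"`, and the long walking-assistance example from the thread is dropped.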

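The adjunct-script approach discussed in the thread (keeping local additions in a separate script under SCM and re-applying them to each new dictionary) can be sketched as a small generator of `INSERT` rows. The column meanings used here are inferred only from the three rows quoted above: position of the lookup word, token count, term text, lookup word. Treat that reading as an assumption. The helper name is hypothetical, whitespace tokenization is a simplification, and the sketch does not reproduce how the dictionary creator chooses the lookup word, so the caller supplies it.

```python
def cui_terms_insert(cui, text, lookup_word):
    """Format one CUI_TERMS row in the style of the rows quoted in this thread."""
    tokens = text.lower().split()  # assumption: plain whitespace tokens
    word = lookup_word.lower()
    if word not in tokens:
        raise ValueError(f"{lookup_word!r} is not a token of {text!r}")
    index = tokens.index(word)  # position of the lookup word within the term
    count = len(tokens)         # number of tokens in the term
    term_sql = " ".join(tokens).replace("'", "''")  # escape single quotes
    word_sql = word.replace("'", "''")
    return (
        f"INSERT INTO CUI_TERMS VALUES({cui},{index},{count},"
        f"'{term_sql}','{word_sql}')"
    )


# Local additions kept as data so they can be re-applied to a rebuilt dictionary.
ADJUNCT_TERMS = [
    (11849, "dm", "dm"),
    (11849, "diabetes", "diabetes"),
]

adjunct_script = "\n".join(cui_terms_insert(*row) for row in ADJUNCT_TERMS)
```

The generated lines match the two rows reported missing in the original question, so the same list can be replayed after each rebuild.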