ctakes-dev mailing list archives

From andy mcmurry <mcmurry.a...@gmail.com>
Subject Recommendation for ctakes default (UMLS) dictionaries
Date Tue, 09 Sep 2014 21:31:56 GMT
Greetings ctakes-dev:

*UMLS license restrictions have become more lax over the years*: much
of the UMLS can be downloaded directly from the official NCBI FTP site.

In fact, the NIH (and implicitly the NLM) *has already made the standard
terms public for some medical specialties*.

For example, here is the UMLS subset specific to Medical Genetics (MedGen)
and Genetic Testing (GTR), complete with SNOMED-CT concept CUI(s), names,
etc.:

[  ftp://ftp.ncbi.nlm.nih.gov/pub/medgen/README.html  ]
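To make the idea of a default dictionary concrete, here is a rough sketch of
turning a pipe-delimited term file into a CUI-to-names lookup. The column
layout (CUI|name|source|), the sample rows, and the CUIs in them are made up
for illustration; the actual MedGen file names and columns are described in
the README above and may well differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data. The layout CUI|name|source| is an assumption,
# and the CUIs below are invented placeholders, not real concepts.
SAMPLE = """\
#CUI|name|source|
C0000001|Example disorder|SNOMEDCT_US|
C0000001|Example disorder, alternate name|GTR|
C0000002|Another example condition|SNOMEDCT_US|
"""


def load_dictionary(handle, cui_col=0, name_col=1):
    """Map each CUI to the list of distinct names seen for it."""
    terms = defaultdict(list)
    reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines and header/comment lines starting with '#'.
        if not row or row[0].startswith("#"):
            continue
        # Skip rows that are too short to contain both columns.
        if len(row) <= max(cui_col, name_col):
            continue
        cui = row[cui_col].strip()
        name = row[name_col].strip()
        if cui and name and name not in terms[cui]:
            terms[cui].append(name)
    return dict(terms)


dictionary = load_dictionary(io.StringIO(SAMPLE))
for cui, names in dictionary.items():
    print(cui, names)
```

With a real download, the io.StringIO(SAMPLE) handle would be replaced by an
open file, and cui_col / name_col adjusted to match the documented columns.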

My team has developed a JVM-based wrapper (written in Clojure) for MetaMap
2013AB, which I intend to open source soon. It includes REST support for
invoking MetaMap with any or all of the command-line arguments.
We do not integrate with UIMA; we are basically a wrapper around the binary
installation of MetaMap. The emphasis is on publication text, not clinical
text. Still, some services are common to both (such as LVG).

Strangely, the NLM still requires a UMLS license to download the MetaMap
execution binaries. The MetaMap binary install is better, but customizing
dictionaries (DataFileBuilder) is not as easy as it is in cTAKES with YTEX:

[ https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation ]

*Hence, there is a real opportunity here to enable Apache cTAKES to have
a stronger default dictionary.*

Imagine if we could
$ apt-get install apache-ctakes

and instantly have a working package for SOME problem domain.
In my case (Medical Genetics), the UMLS definitions are already available,
and the UMLS license problem becomes a non-issue, at least for many
first-time users.

Your thoughts?
