ctakes-dev mailing list archives

From "Masanz, James J." <Masanz.Ja...@mayo.edu>
Subject RE: Recommendation for ctakes default (UMLS) dictionaries
Date Wed, 10 Sep 2014 19:13:25 GMT

I would love to see installation be as simple as an apt-get install that ends up with a working
dictionary with more than a handful of entries to get new users started.

James Masanz

-----Original Message-----
From: andy mcmurry [mailto:mcmurry.andy@gmail.com] 
Sent: Tuesday, September 09, 2014 4:32 PM
To: ctakes-dev@incubator.apache.org
Subject: Recommendation for ctakes default (UMLS) dictionaries

Greetings ctakes-dev:

*UMLS license restrictions have been getting more lax over the years*: much of the UMLS
can be downloaded directly from the official NCBI FTP site.

In fact, the NIH (and, by extension, the NLM) *has already made the standard terms public for
some medical specialties*.

For example, here is the UMLS subset specific to Medical Genetics (MedGen) and the Genetic Testing
Registry (GTR), complete with SNOMED CT concept CUIs, names, etc.:

[  ftp://ftp.ncbi.nlm.nih.gov/pub/medgen/README.html  ]
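To make the idea concrete, here is a minimal Python sketch of turning a UMLS-style concept file into a simple CUI-to-terms dictionary that a default lookup could start from. It assumes the standard pipe-delimited MRCONSO.RRF column layout of the full UMLS release; whether the MedGen FTP files use the same column order is an assumption to check against the README above. The function name `build_term_dictionary` is hypothetical and not part of cTAKES.

```python
from collections import defaultdict

# Column positions in the UMLS MRCONSO.RRF layout (18 pipe-delimited fields).
# ASSUMPTION: input rows follow this layout; MedGen's own files may order
# their columns differently, in which case these indices must be adjusted.
CUI, LAT, SAB, STR, SUPPRESS = 0, 1, 11, 14, 16


def build_term_dictionary(lines, sources=None, language="ENG"):
    """Map each CUI to the set of its lower-cased term strings.

    `sources` optionally restricts rows to a set of source vocabularies
    (the SAB column), e.g. {"SNOMEDCT_US"}.
    """
    terms = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 17:
            continue  # skip malformed or truncated rows
        if fields[LAT] != language:
            continue  # keep only the requested language
        if sources is not None and fields[SAB] not in sources:
            continue  # keep only the requested source vocabularies
        if fields[SUPPRESS] != "N":
            continue  # drop suppressible or obsolete names
        terms[fields[CUI]].add(fields[STR].lower())
    return dict(terms)
```

Reading the file lazily (e.g. `with open(path, encoding="utf-8") as f: build_term_dictionary(f)`) keeps memory proportional to the resulting dictionary rather than the raw file, which matters for the full UMLS but much less for a specialty subset like MedGen.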

My team has developed a JVM-based wrapper (written in Clojure) for MetaMap 2013AB, which I intend
to open source soon. It includes REST support for invoking MetaMap with any or all of the
command-line arguments.
We do not integrate with UIMA; we are basically a wrapper around the binary installation of
MetaMap. The emphasis is on publication text rather than clinical text, but some services are
common to both (such as LVG).
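The general pattern of a REST wrapper around a command-line binary can be sketched as follows. This is not the Clojure wrapper described above; it is a minimal Python illustration in which the binary path, the query-string-to-flag convention, and the function names `query_to_argv` and `run_wrapped` are all hypothetical, and no actual MetaMap options are assumed.

```python
import subprocess
from urllib.parse import parse_qsl

# HYPOTHETICAL install location; not a real MetaMap default.
BINARY = "/opt/metamap/bin/metamap"


def query_to_argv(query, binary=BINARY):
    """Turn a query string like 'a=1&flag=' into [binary, '--a', '1', '--flag'].

    Hypothetical convention: each key becomes '--key', and a non-empty value
    is passed as the following argument. Arguments stay separate list items
    and are never interpolated into a shell string.
    """
    argv = [binary]
    for key, value in parse_qsl(query, keep_blank_values=True):
        argv.append("--" + key)
        if value:
            argv.append(value)
    return argv


def run_wrapped(query, text, binary=BINARY, timeout=60):
    """Run the wrapped binary with arguments from `query`, feeding `text` on stdin."""
    result = subprocess.run(
        query_to_argv(query, binary),
        input=text,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

An HTTP handler (for example one built on Python's `http.server`) would call `run_wrapped` with the request's query string and body. Passing arguments as a list rather than a shell command avoids shell injection, though a real service would still want an allow-list of the options callers may pass.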

Strangely, the NLM still requires a UMLS license to download the MetaMap binaries. The
MetaMap binary install is better, but customizing dictionaries (with DataFileBuilder) is not
as easy as it is in cTAKES with YTEX:

[ https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation ]

*Hence, there is a real opportunity here to give Apache cTAKES a stronger default
dictionary.*

Imagine if we could just run

$ apt-get install apache-ctakes

and instantly have a working package for SOME problem domain.
In my case (Medical Genetics), the UMLS definitions are already available, so the UMLS license
problem becomes a non-issue, at least for many first-time users.

Your thoughts?