ctakes-dev mailing list archives

From "Chen, Pei" <Pei.C...@childrens.harvard.edu>
Subject RE: Information Regarding Apache cTAKES-3.0
Date Tue, 03 Sep 2013 19:38:21 GMT
Hi Arohi,
I'm glad that you have it working.
I think a good place to get started would be to take a look at the current
type system[1], which outlines the output[2] that cTAKES currently supports.
As you already found, the IdentifiedAnnotation (and its subclasses such as xMention, and
the UmlsConcept codes)
Unfortunately, there isn't much more documentation than what's in the current guides[3] at
this point in time.  However, the mailing lists are a great place to look for answers to
questions you may have.  To learn more about the flow of control of the code, you may want
to check out the UIMA[4] framework, which cTAKES is built on top of.
[1] http://ctakes.apache.org/user-faqs.html#what-are-the-available-attributes-types-in-ctakes
[2] http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-type-system/src/main/resources/org/apache/ctakes/typesystem/types/TypeSystem.xml
[3] https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.1+Component+Use+Guide
[4] http://uima.apache.org
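
As a rough illustration of the flow of control mentioned above, here is a toy model of an annotator pipeline: each step receives a shared document object and adds typed spans to it, and later steps build on what earlier steps added. This is only a sketch under stated assumptions. ToyPipeline, Doc, Span, Annotator, Tokenizer, and DictionaryLookup are hypothetical names made up for this example and are not UIMA or cTAKES classes, and the term list is invented.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these types are UIMA or cTAKES classes.
final class ToyPipeline {

    /** A labeled span over the document text (stand-in for a typed annotation). */
    static final class Span {
        final String type;
        final int begin;
        final int end;
        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Shared container handed to every step (stand-in for the shared analysis structure). */
    static final class Doc {
        final String text;
        final List<Span> spans = new ArrayList<>();
        Doc(String text) { this.text = text; }
        String covered(Span s) { return text.substring(s.begin, s.end); }
    }

    /** One processing step: reads the Doc and adds spans to it. */
    interface Annotator {
        void process(Doc doc);
    }

    /** Adds a "Token" span for each whitespace-delimited token. */
    static final class Tokenizer implements Annotator {
        public void process(Doc doc) {
            int i = 0;
            int n = doc.text.length();
            while (i < n) {
                while (i < n && Character.isWhitespace(doc.text.charAt(i))) i++;
                int start = i;
                while (i < n && !Character.isWhitespace(doc.text.charAt(i))) i++;
                if (i > start) doc.spans.add(new Span("Token", start, i));
            }
        }
    }

    /** Adds a "Mention" span for each token found in a small, made-up term list. */
    static final class DictionaryLookup implements Annotator {
        private final List<String> terms;
        DictionaryLookup(List<String> terms) { this.terms = terms; }
        public void process(Doc doc) {
            // Iterate over a copy because new spans are added during the loop.
            for (Span s : new ArrayList<>(doc.spans)) {
                if (s.type.equals("Token") && terms.contains(doc.covered(s).toLowerCase())) {
                    doc.spans.add(new Span("Mention", s.begin, s.end));
                }
            }
        }
    }

    /** Runs the steps in order over a single document and returns the result. */
    static Doc run(String text, List<Annotator> steps) {
        Doc doc = new Doc(text);
        for (Annotator step : steps) step.process(doc);
        return doc;
    }
}
```

Calling ToyPipeline.run with a Tokenizer followed by a DictionaryLookup yields a Doc whose spans list holds the tokens plus any dictionary matches. In UIMA itself the shared container is the CAS and the steps are analysis engines; the cTAKES types such as IdentifiedAnnotation are defined in the type system[2].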

I hope that helps.

From: Arohi Kumar [mailto:arohi@mobipulse.in] 
Sent: Tuesday, September 03, 2013 2:53 PM
To: Chen, Pei
Subject: Re: Information Regarding Apache cTAKES-3.0

Hi Pei,
Thanks for your suggestion. That worked like a charm. I also made it work using Lucene 3.6
to write a new index which was subsequently readable by the Lucene 4.0 jars present in the
project. Just out of curiosity, I have found (by experimenting with Lucene versions) that the
original OrangeBook index was written by a Lucene version preceding 1.9. Hope that I am
Now that I am obtaining the output, I want to be able to understand what I am getting.
I have looked at the output and things
like the LookupWindowAnnotation, SignSymptomMention, Concept, UmlsConcept jump out as being
really useful. I want to understand the other outputs as well as how the code gave them to
me. I have looked at the Component Use Guide, which gives me an overall idea of the cTAKES
pipeline. I am looking for a more detailed explanation. 

I understand that ultimately I will have to get my hands dirty and delve into the code. Are
there any other resources to help me get started, such as an explanation of the output and
the flow of control of the code?
Thank you
Arohi Kumar
Ex-CSE, IIT Kharagpur

On Tue, Sep 3, 2013 at 7:07 PM, Chen, Pei <Pei.Chen@childrens.harvard.edu> wrote:
Hi Arohi,
OrangeBook is included in cTAKES' ctakes-dictionary-lookup-res project now:
Feel free to let us know if that works for you.
From: Arohi Kumar [mailto:arohi@mobipulse.in] 
Sent: Tuesday, September 03, 2013 6:29 AM
To: Chen, Pei
Subject: Re: Information Regarding Apache cTAKES-3.0
I'm sorry, the link is 
On Tue, Sep 3, 2013 at 3:58 PM, Arohi Kumar <arohi@mobipulse.in> wrote:
Hi Pei,
I am a newbie and learning Apache cTAKES-3.0 for a project. 
I was facing an error caused when lucene-4.0 (included in Apache cTAKES) tries to
read the OrangeBook index.
I went through the mail archives and found that clearing up and replacing the OrangeBook index

will solve the problem. The above link seems to be broken. I will be grateful if you could
send me an updated link if one exists.
Some alternative ways of solving the problem:
1. Since the OrangeBook index has a size of only 19,000 (approx.), I think that we can also write
a new index using lucene-3.0 (because 4.0 is able to read indexes written by 3.0 and later).
2. Change the lucene-4.0 jars in maven dependency to lucene-3.0 jars, but that would lead
to dependencies being broken and so, I don't want to get into that.
Your suggestions are most welcome.
Arohi Kumar
Ex- CSE, IIT Kharagpur 
