tika-dev mailing list archives

From "Chris A. Mattmann (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (TIKA-1642) Integrate cTAKES into Tika
Date Sat, 06 Jun 2015 23:37:01 GMT

     [ https://issues.apache.org/jira/browse/TIKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann resolved TIKA-1642.
-------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.9
         Assignee: Chris A. Mattmann  (was: Giuseppe Totaro)

- fixed!

{noformat}
bash-3.2$ svn commit -m "Fix for TIKA-1645 & TIKA-1642: Extraction of biomedical information
using CTAKESParser contributed by Selina Chu, Giuseppe Totaro and mattmann."
Sending        CHANGES.txt
Sending        tika-bundle/pom.xml
Sending        tika-parsers/pom.xml
Adding         tika-parsers/src/main/java/org/apache/tika/parser/ctakes
Adding         tika-parsers/src/main/java/org/apache/tika/parser/ctakes/CTAKESAnnotationProperty.java
Adding         tika-parsers/src/main/java/org/apache/tika/parser/ctakes/CTAKESConfig.java
Adding         tika-parsers/src/main/java/org/apache/tika/parser/ctakes/CTAKESContentHandler.java
Adding         tika-parsers/src/main/java/org/apache/tika/parser/ctakes/CTAKESParser.java
Adding         tika-parsers/src/main/java/org/apache/tika/parser/ctakes/CTAKESSerializer.java
Adding         tika-parsers/src/main/java/org/apache/tika/parser/ctakes/CTAKESUtils.java
Sending        tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser
Transmitting file data ..........
Committed revision 1683968.
{noformat}


> Integrate cTAKES into Tika
> --------------------------
>
>                 Key: TIKA-1642
>                 URL: https://issues.apache.org/jira/browse/TIKA-1642
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Selina Chu
>            Assignee: Chris A. Mattmann
>             Fix For: 1.9
>
>
> [~gostep] has written a preliminary version of [CTAKESContentHandler|https://github.com/giuseppetotaro/CTAKESContentHadler] to integrate [Apache cTAKES|http://ctakes.apache.org/] into Tika.
> The CTAKESContentHandler allows the following steps to be performed in Tika:
> * create an AnalysisEngine based on a given XML descriptor;
> * create a CAS (Common Analysis System) appropriate for this AnalysisEngine;
> * populate the CAS with the text extracted by Tika;
> * run the AnalysisEngine against the plain text added to the CAS;
> * write out the results in the given format (XML, XCAS, XMI, etc.).
> It would be a great improvement if we could parse the output of cTAKES and create a list of metadata describing the terms found in the annotation index and their corresponding tokens. For instance, using the AggregatePlaintextFastUMLSProcessor analysis engine, we can use the UMLS database to obtain the annotations of type DiseaseDisorderMention, and I would like to be able to produce the list of words in the input text that are annotated as DiseaseDisorderMention.
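
The metadata aggregation requested in the issue description can be illustrated with a minimal sketch in plain Java, with no UIMA or cTAKES dependency. It is not the code from the commit: `Span` and `AnnotationMetadata` are hypothetical names, and a `Span` only stands in for an annotation (type name plus character offsets) that would, in the real parser, presumably be read from the CAS annotation index.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a cTAKES/UIMA annotation: a type name plus
// character offsets into the document text. Not part of any real API.
final class Span {
    final String type;
    final int begin;
    final int end;

    Span(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }
}

// Hypothetical helper that turns annotations into metadata-like entries.
final class AnnotationMetadata {
    private AnnotationMetadata() {}

    // Groups the text covered by each span under its annotation type,
    // preserving the order in which the spans are supplied.
    static Map<String, List<String>> group(String text, List<Span> spans) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        for (Span s : spans) {
            // Skip spans whose offsets fall outside the document.
            if (s.begin < 0 || s.end > text.length() || s.begin > s.end) {
                continue;
            }
            metadata.computeIfAbsent(s.type, k -> new ArrayList<>())
                    .add(text.substring(s.begin, s.end));
        }
        return metadata;
    }
}
```

Given the text "Patient has diabetes and hypertension." and two `DiseaseDisorderMention` spans covering the two disease names, `group` would return a single entry mapping `DiseaseDisorderMention` to those two words. Each map entry could then be copied into a Tika `Metadata` object as a multi-valued field, which is an assumption about how the result might be used and not something the message states.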



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
