ctakes-dev mailing list archives

From William Karl Thompson <...@northwestern.edu>
Subject RE: resources for training modules
Date Thu, 05 Sep 2013 18:09:48 GMT
Steve and Dima,

Thanks very much for these replies; this is very helpful for getting started. I'll give this a
try over the next couple of weeks and will let you know how it works.



-----Original Message-----
From: Steven Bethard [mailto:stevenbethard@apache.org] 
Sent: Friday, August 30, 2013 9:37 AM
To: dev@ctakes.apache.org
Subject: Re: resources for training modules

On Fri, Aug 30, 2013 at 8:46 AM, Dmitriy Dligach <dmitriy.dligach@childrens.harvard.edu> wrote:
> Retraining the relation extractor should be fairly easy. The 
> instructions I am about to give you apply if you are using cTAKES 3.0. 
> However, if you are planning to use the trunk version, my instructions 
> may no longer be accurate. Relation extraction has undergone some 
> changes recently in connection with the cTAKES-190 issue and I don't fully 
> understand these most recent changes yet (but I am working on it).

With the trunk version, there's no need to run PreprocessAndWriteXmi. Just run
RelationExtractorEvaluation or RelationExtractorTrain directly; the XMIs will be written
automatically to target/xmi. I believe the only required argument is --batches-dir, which
names the directory containing the directories that contain the Knowtator_XML directories.
The other (optional) arguments should be similar to what Dima described. You can see what
they are by looking at the static Options classes (and their superclasses) in
RelationExtractorEvaluation and RelationExtractorTrain.
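
Before pointing --batches-dir at a directory, it may help to check that it has the nesting
described above. The following is only a rough sketch and is not part of cTAKES: the class
BatchLayoutCheck and its method are hypothetical names, and it assumes each Knowtator_XML
directory sits directly inside a subdirectory of the batches directory.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper (not part of cTAKES) for checking a --batches-dir candidate. */
class BatchLayoutCheck {

    /**
     * Returns, in sorted order, the immediate subdirectories of batchesDir
     * that contain a directory named Knowtator_XML.
     * Assumption: Knowtator_XML sits directly under each batch directory.
     */
    static List<Path> batchesWithKnowtatorXml(Path batchesDir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> children = Files.newDirectoryStream(batchesDir)) {
            for (Path child : children) {
                // For plain files, child/Knowtator_XML cannot exist, so they are skipped.
                if (Files.isDirectory(child.resolve("Knowtator_XML"))) {
                    result.add(child);
                }
            }
        }
        Collections.sort(result);
        return result;
    }

    /** Prints each usable batch directory found under the path given as the first argument. */
    public static void main(String[] args) throws IOException {
        for (Path batch : batchesWithKnowtatorXml(Paths.get(args[0]))) {
            System.out.println("usable batch: " + batch);
        }
    }
}
```

If this prints nothing for your directory, the layout probably differs from what is assumed
here, and it would be worth comparing against the SHARP data before running the trainer.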

> If you are planning to annotate your data, it might be easier to use 
> Knowtator since we already have a gold standard reader for Knowtator. 
> If you want to use a different annotation tool, you just have to make 
> sure you add the manual annotations to the gold view of the XMI files.

In the trunk version, most of the SHARP-specific stuff is handled by the SHARPXMI class, so
if you need to customize things away from what was done for SHARP, that's probably where
you'll need to go. At the moment, RelationExtractorTrain and RelationExtractorEvaluation call
static methods on SHARPXMI, which can't be overridden, so it's not very extensible. We could
conceivably change these to non-static methods, and then extensions of relation-extractor
could provide their own implementation. We're certainly open to modifying the infrastructure
here, so if you have any suggestions please do pass them on.
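
To make the static versus non-static point concrete, here is a minimal sketch of the kind of
change being discussed. None of these names come from cTAKES: CorpusSupport stands in for a
class like SHARPXMI, TrainerSketch stands in for a training entry point, and the method
annotationDirectoryNames is invented purely for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Stand-in for a class like SHARPXMI; the method name is hypothetical. */
class CorpusSupport {
    /** Non-static, so a subclass can override it (a static method could not be). */
    List<String> annotationDirectoryNames() {
        return Arrays.asList("Knowtator_XML");
    }
}

/** Hypothetical extension for a corpus that is laid out differently. */
class CustomCorpusSupport extends CorpusSupport {
    @Override
    List<String> annotationDirectoryNames() {
        return Arrays.asList("my_annotations");
    }
}

/** Stand-in for a trainer that is handed a support object instead of calling statics. */
class TrainerSketch {
    private final CorpusSupport corpus;

    TrainerSketch(CorpusSupport corpus) {
        this.corpus = corpus;
    }

    /** Dispatches on the runtime type of the support object it was given. */
    String describe() {
        return "reading gold annotations from " + corpus.annotationDirectoryNames();
    }
}
```

With this shape, new TrainerSketch(new CustomCorpusSupport()) picks up the custom behavior
without any change to the trainer itself, which is the sort of extension point the static
calls currently rule out.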
