ctakes-dev mailing list archives

From Steven Bethard <steven.beth...@Colorado.EDU>
Subject Re: training-data.libsvm Vs model.libsvm (RelationExtractor)
Date Tue, 21 May 2013 13:44:12 GMT
On May 20, 2013, at 9:34 PM, giri vara prasad nambari <girinambari@gmail.com> wrote:
> Here is the link where I found these files:
> https://svn.apache.org/repos/asf/ctakes/tags/ctakes-3.0.0-incubating/ctakes-relation-extractor/resources/models/modifier_extractor/

I see. You're not working from the current repository, you're working from 3.0.0-incubating.
Those files were erroneously included in that release.

> "The training-data.libsvm file will not be generated during classification.
> It's only generated during training": I understood this part.
> "LIBSVM-formatted features and labels": this part I am not clear on. How
> are these generated from the training data?

They're generated from the features created by the RelationFeaturesExtractors I pointed you
to in RelationExtractorAnnotator. The conversion from Feature objects to LIBSVM-formatted
feature strings is performed by ClearTK, specifically by the LIBSVMStringOutcomeDataWriter.
(You can see that the models are trained with that data writer class in RelationExtractorTrain.)

> Is it based on SVM algorithm?

The LIBSVM format for providing features and labels is defined by LIBSVM (http://www.csie.ntu.edu.tw/~cjlin/libsvm/).
Each line is basically <label> <index>:<value> <index>:<value> ..., where each index is an
integer that identifies one feature.
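
To make the format concrete, here is a rough sketch in Python of how named features and a string label could be turned into one such line. This is only an illustration, not ClearTK's actual code: the function name, the feature names, and the label names are all made up, and the real data writer may assign indices differently.

```python
def to_libsvm_line(label, features, feature_index, label_index):
    """Format one training instance as a single LIBSVM-style line.

    feature_index and label_index are plain dicts that map names to
    integer ids; new names get the next free id on first sight.
    (Hypothetical helper, not part of ClearTK or cTAKES.)
    """
    # Map the string outcome to an integer label.
    y = label_index.setdefault(label, len(label_index) + 1)

    # Map each feature name to an integer index.
    pairs = {}
    for name, value in features.items():
        idx = feature_index.setdefault(name, len(feature_index) + 1)
        pairs[idx] = value

    # LIBSVM expects indices in ascending order; zero values can be omitted.
    body = " ".join(f"{i}:{v:g}" for i, v in sorted(pairs.items()) if v != 0)
    return f"{y} {body}".rstrip()


feature_index, label_index = {}, {}

# Hypothetical features for one candidate relation.
first = to_libsvm_line(
    "location_of", {"word_between=in": 1.0, "distance": 3.0},
    feature_index, label_index)

# A second instance reuses the index already assigned to "distance".
second = to_libsvm_line(
    "-NONE-", {"distance": 5.0}, feature_index, label_index)

print(first)
print(second)
```

The two dicts have to be kept alongside the trained model, since the same name-to-index mapping is needed again at classification time.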


> On Mon, May 20, 2013 at 5:54 PM, Steven Bethard <steven.bethard@colorado.edu> wrote:
>> On May 17, 2013, at 2:29 PM, giri vara prasad nambari <girinambari@gmail.com> wrote:
>>> Can someone please clarify the difference between training-data.libsvm and
>>> model.libsvm in ctakes-relation-extractor module?
>> Where are you seeing these? Neither should be in the repository.
>> That said, training-data.libsvm is the LIBSVM-formatted features and
>> labels, and model.libsvm is the LIBSVM model file.
>>> If so,
>>> could someone provide any references/sample on how this file will be
>>> generated for a sample annotated sentence?
>> The training-data.libsvm file will not be generated during classification.
>> It's only generated during training. If you want to see what features are
>> generated during classification, take a look at RelationExtractorAnnotator,
>> which defines a List<RelationFeaturesExtractor> getFeatureExtractors(),
>> which defines the various feature extractors used by the relation
>> extractors.
>> Not sure if I answered your question. Please feel free to follow up.
>> Steve