mahout-user mailing list archives

From Matteo Moci
Subject how to describe custom features
Date Fri, 05 Aug 2011 09:14:57 GMT
Hello everyone,
this is my first post to the list. I am new to Mahout but have some
background in machine learning.
I am trying to understand whether Mahout is a good fit for my use case,
and I'll describe it here in the hope of getting some advice or insights.

Basically, I'd like to train a classifier that assigns labels to
sentences in documents.

I can already identify, in both the training documents and the ones to
classify, the sentences that are candidates for classification:
let's say every sentence that contains the string "red" is used as
training input and later labeled at test time.

The catch is that the model the classifier learns
should depend on a set of features that are not just "internal" to the
sentence (like the words it contains):
the features should include the sentence's position inside the document
(e.g. start, middle, end),
some words from the enclosing section's title,
and also some of the words inside the sentence itself.
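
To make this concrete, here is a minimal sketch of the kind of feature
vector I have in mind. It is plain Java and uses no Mahout classes; the
names (SentenceFeatures, Position, DIM, index, encode) are hypothetical and
only illustrate the idea of hashing namespaced features into one vector.
I am not claiming this is how Mahout does it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names come from Mahout.
final class SentenceFeatures {
    // Coarse position of the sentence inside its document.
    enum Position { START, MIDDLE, END }

    // Size of the hashed feature space (an arbitrary choice for this sketch).
    static final int DIM = 1024;

    // Map a namespaced feature such as "title:colors" to an index in [0, DIM).
    static int index(String namespace, String value) {
        return Math.floorMod((namespace + ":" + value).hashCode(), DIM);
    }

    // Build one vector from the position, the section-title words and the sentence words.
    // Colliding features simply add up in the same slot.
    static double[] encode(Position position, List<String> titleWords, List<String> sentenceWords) {
        double[] v = new double[DIM];
        v[index("pos", position.name())] += 1.0;
        for (String w : titleWords) {
            v[index("title", w.toLowerCase())] += 1.0;
        }
        for (String w : sentenceWords) {
            v[index("word", w.toLowerCase())] += 1.0;
        }
        return v;
    }

    public static void main(String[] args) {
        double[] v = encode(Position.START,
                Arrays.asList("Colors"),
                Arrays.asList("the", "red", "car"));
        // Print the total of all feature counts in the vector.
        System.out.println(Arrays.stream(v).sum());
    }
}
```

The namespace prefix keeps a word in the section title distinct from the
same word inside the sentence, which is the distinction I care about.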

Is it possible with Mahout (plus some custom classes) to have this flexibility
and describe such types of features?
The specific algorithm is not really important at this point; I am
only concerned with the feature description above.

Any pointers that could help me?


Matteo Moci
