mahout-user mailing list archives

From drahman <drahman1...@googlemail.com>
Subject text classification using mahout and lucene index
Date Tue, 11 Oct 2011 10:38:02 GMT
Hi everyone,

I want to use Mahout for text classification. Right now I'm reading through
some chapters of the book "Mahout in Action", but some of the code examples
aren't working for me yet. So I thought I would ask my question right away: how
can I use Mahout for text classification?

My problem is about categorizing text. I have a list of documents
(text+abstract) and for each document I have a list of keywords (multi-label
problem):

doc1:
title
abstract
keyword1
keyword2
keyword3
...
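
To make the multi-label setup concrete, here is a minimal sketch of one possible way to represent such documents and split the problem into one yes/no task per keyword (often called binary relevance). This is plain Python for illustration only, not Mahout code; the names `Document` and `to_binary_problems` are hypothetical, and joining title and abstract into a single text field is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Document:
    """One labelled document. Field names are illustrative, not a Mahout type."""
    doc_id: str
    title: str
    abstract: str
    keywords: List[str] = field(default_factory=list)

    def text(self) -> str:
        # Assumption: the classifier sees title and abstract as one string.
        return f"{self.title} {self.abstract}"


def to_binary_problems(docs: List[Document]) -> Dict[str, List[Tuple[str, bool]]]:
    """Binary relevance: build one (text, is_relevant) training set per keyword."""
    labels = sorted({kw for d in docs for kw in d.keywords})
    return {
        label: [(d.text(), label in d.keywords) for d in docs]
        for label in labels
    }
```

Each resulting per-keyword training set could then be handed to any binary classifier, which also makes it easier to compare different algorithms on the same splits.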

I want to train a classifier using this information to build a recommender.
The data is available as XML and as a Lucene index. I'm hoping that I can use
the existing Lucene data; if so, how?

Also, I want to use different algorithms or combinations of algorithms (e.g.
SVM + naive Bayes), so that I can compare the results.

What I need is direction, i.e. which parts of Mahout are relevant for my
problem?

Thanks in advance!

PS: I got a failure notice when I tried to subscribe to the mailing list...

--
View this message in context: http://lucene.472066.n3.nabble.com/text-classification-using-mahout-and-lucene-index-tp3412202p3412202.html
Sent from the Mahout User List mailing list archive at Nabble.com.
