mahout-user mailing list archives

From Jan-Hendrik Lendholt <jan.lendh...@gmail.com>
Subject Using multiple predictors for Mahout Classification
Date Thu, 05 Sep 2013 09:53:41 GMT
Hi there,

I am relatively new to Hadoop and a greenhorn concerning Mahout :)

Basically, I am playing around with Mahout classification. We want to classify user ratings
for our product based not only on the stars but also on the text.
Okay, I guess it is sentiment analysis :)
The overall process is quite clear, so I do not have any problems with the algorithms themselves.

So, we have the target variable "x", which a very patient person has already labelled
as -1/0/+1.

And we have several predictors, e.g. category, country, language, ratingText, ratingTitle and
a few more.
Country, language and category are categorical; ratingText and ratingTitle are text predictors.
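One common way to feed mixed predictors to a text classifier such as Mahout's naive Bayes is to fold the categorical values into the document as prefixed pseudo-tokens (e.g. `country_de`), so that everything ends up in a single text value. The sketch below is only an illustration of that idea, not Mahout's own API: the class `PredictorEncoder` and its methods are hypothetical, and it assumes the downstream tokenizer keeps underscore-joined words together as one token.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical helper (not part of Mahout): merges categorical and text
 * predictors into one document string so a text classifier sees them all as tokens.
 */
public class PredictorEncoder {

    /** Turns a categorical value into a single token such as "country_de". */
    static String categoricalToken(String name, String value) {
        // Lower-case and join inner whitespace with "_" so multi-word values stay one token
        String cleaned = value.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", "_");
        return name + "_" + cleaned;
    }

    /**
     * Builds one document: categorical tokens first, then title and text.
     * Prefixing with the field name keeps "de" as a country distinct from "de" as a language.
     */
    static String toDocument(Map<String, String> categorical, String ratingTitle, String ratingText) {
        StringJoiner doc = new StringJoiner(" ");
        for (Map.Entry<String, String> e : categorical.entrySet()) {
            doc.add(categoricalToken(e.getKey(), e.getValue()));
        }
        // Skip empty text fields so the document has no stray double spaces
        if (!ratingTitle.trim().isEmpty()) {
            doc.add(ratingTitle.trim());
        }
        if (!ratingText.trim().isEmpty()) {
            doc.add(ratingText.trim());
        }
        return doc.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the insertion order of the predictors
        Map<String, String> cat = new LinkedHashMap<>();
        cat.put("category", "Home Audio");
        cat.put("country", "DE");
        cat.put("language", "de");
        System.out.println(toDocument(cat, "Great speaker", "Sounds fine, shipping was slow."));
    }
}
```

Running `main` prints `category_home_audio country_de language_de Great speaker Sounds fine, shipping was slow.`. An alternative design worth comparing is Mahout's SGD classifiers, which encode categorical and text features separately instead of merging them into one string.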


I am doing an SQL SELECT that returns all these values, and now I want to write them line by line
into a Hadoop sequence file so that Mahout can read the data.

How do I arrange multiple values under the same key, e.g. /category/DATABASEID?
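With the `/category/DATABASEID` layout mentioned above, each record gets exactly one key, and all predictors go together into that record's single value rather than being stored as separate entries under the same key. The sketch below shows only the key side, assuming the first path segment should hold the target label (derived from -1/0/+1) rather than the product-category predictor. `RatingKey` and the label names `negative`/`neutral`/`positive` are hypothetical choices. The actual write would go through Hadoop's `SequenceFile.Writer` with `Text` keys and values, as in the linked post, which is left out here so that the example stays self-contained.

```java
/**
 * Hypothetical helper: builds "/label/id" keys from the hand-labelled target
 * and the database id, and reads the label back out of a key.
 */
public class RatingKey {

    /** Maps the target (-1/0/+1) to a label name; the names are illustrative. */
    static String labelFor(int target) {
        switch (target) {
            case -1: return "negative";
            case 0:  return "neutral";
            case 1:  return "positive";
            default: throw new IllegalArgumentException("target must be -1, 0 or +1: " + target);
        }
    }

    /** Builds e.g. "/positive/12345" from the target and the database id. */
    static String keyFor(int target, long databaseId) {
        return "/" + labelFor(target) + "/" + databaseId;
    }

    /** Recovers the label: splitting "/label/id" on "/" gives ["", label, id]. */
    static String labelOf(String key) {
        String[] parts = key.split("/");
        return parts[1];
    }

    public static void main(String[] args) {
        String key = keyFor(1, 12345L);
        System.out.println(key);          // /positive/12345
        System.out.println(labelOf(key)); // positive
    }
}
```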

I tried to adapt the SequenceFile writer from here https://chimpler.wordpress.com/2013/03/13/using-the-mahout-naive-bayes-classifier-to-automatically-classify-twitter-messages/
but I don't know how to store the individual predictors.

I appreciate any hint on how to solve this problem!

Many thanks,

Jan

