mahout-user mailing list archives

From Paritosh Ranjan <>
Subject Re: Using mahout for classifying tweets
Date Sat, 01 Sep 2012 11:03:02 GMT
I am also a novice at Mahout classification, but I will give it a shot 
in the hope that someone will correct me or improve the answer.

First, the text data (tweets) needs to be converted into Vectors. In 
Mahout terms, this is known as vector encoding. It can be done in three 
ways: use one Vector cell per word, category, or continuous value; 
represent Vectors implicitly as bags of words; or use feature hashing.

Look at the ContinuousValueEncoder, AdaptiveWordValueEncoder, 
StaticWordValueEncoder and FeatureVectorEncoder classes, or at the 
seqdirectory and seq2encoded commands.
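
To make the feature hashing idea concrete, here is a minimal sketch in 
plain Java. It does not use Mahout's encoder classes; the class name 
HashedTweetEncoder, the vector size of 64, the tokenizing rule and the use 
of String.hashCode() are all assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.Locale;

// Illustrative only: a hand-rolled hashing trick, not Mahout's encoders.
class HashedTweetEncoder {
    // Number of cells in the hashed vector (arbitrary choice for this sketch).
    static final int NUM_FEATURES = 64;

    // Map a token to a cell index. String.hashCode() stands in for the
    // stronger hash function a real encoder would normally use.
    static int indexFor(String token) {
        return Math.floorMod(token.hashCode(), NUM_FEATURES);
    }

    // Lower-case the tweet, split on anything that is not a letter, digit,
    // '#' or '@', and add 1.0 to the cell that each token hashes to.
    static double[] encode(String tweet) {
        double[] vector = new double[NUM_FEATURES];
        String[] tokens = tweet.toLowerCase(Locale.ROOT).split("[^a-z0-9#@]+");
        for (String token : tokens) {
            if (!token.isEmpty()) {
                vector[indexFor(token)] += 1.0;
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        double[] v = encode("Mahout makes classifying tweets fun #mahout");
        System.out.println(Arrays.toString(v));
    }
}
```

The vector has a fixed size no matter how many distinct words appear, so 
no dictionary has to be built first. The price is that two different 
words can collide in the same cell.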

Then you can use the OnlineLogisticRegression, CrossFoldLearner and 
AdaptiveLogisticRegression classes, or the trainnb, testnb, 
trainlogistic, runlogistic, trainAdaptiveLogistic, 
validateAdaptiveLogistic and runAdaptiveLogistic commands, to configure, 
train and run the classification algorithms.
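
To show roughly what an online (one example at a time) logistic 
regression learner does with such vectors, here is a small plain-Java 
sketch of multi-class logistic regression trained by stochastic gradient 
descent. It is not Mahout's implementation and does not reproduce its API; 
the class OnlineSoftmaxSketch, its method names, the learning rate and the 
toy data are all hypothetical.

```java
// Illustrative only: multinomial logistic regression with plain SGD.
class OnlineSoftmaxSketch {
    final int numCategories;
    final int numFeatures;
    final double learningRate;
    final double[][] weights; // one row of weights per category

    OnlineSoftmaxSketch(int numCategories, int numFeatures, double learningRate) {
        this.numCategories = numCategories;
        this.numFeatures = numFeatures;
        this.learningRate = learningRate;
        this.weights = new double[numCategories][numFeatures];
    }

    // Return one probability per category (softmax over linear scores).
    double[] classify(double[] x) {
        double[] scores = new double[numCategories];
        double max = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < numCategories; c++) {
            for (int j = 0; j < numFeatures; j++) {
                scores[c] += weights[c][j] * x[j];
            }
            max = Math.max(max, scores[c]);
        }
        double total = 0.0;
        for (int c = 0; c < numCategories; c++) {
            scores[c] = Math.exp(scores[c] - max); // subtract max for stability
            total += scores[c];
        }
        for (int c = 0; c < numCategories; c++) {
            scores[c] /= total;
        }
        return scores;
    }

    // One gradient step on a single labelled example.
    void train(int actual, double[] x) {
        double[] p = classify(x);
        for (int c = 0; c < numCategories; c++) {
            double error = (c == actual ? 1.0 : 0.0) - p[c];
            for (int j = 0; j < numFeatures; j++) {
                weights[c][j] += learningRate * error * x[j];
            }
        }
    }

    // Index of the most probable category.
    int predict(double[] x) {
        double[] p = classify(x);
        int best = 0;
        for (int c = 1; c < numCategories; c++) {
            if (p[c] > p[best]) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy data: three categories, each signalled by exactly one feature.
        double[][] examples = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}};
        OnlineSoftmaxSketch model = new OnlineSoftmaxSketch(3, 3, 0.5);
        for (int epoch = 0; epoch < 50; epoch++) {
            for (int label = 0; label < examples.length; label++) {
                model.train(label, examples[label]);
            }
        }
        for (double[] x : examples) {
            System.out.println("predicted category: " + model.predict(x));
        }
    }
}
```

For the tweets case, the input to train and classify would be the encoded 
tweet vectors and numCategories would be 5. The output of classify is one 
score per class, and the class with the highest score is the prediction.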


On 01-09-2012 15:24, Siddharth Tiwari wrote:
> Hi Users,
> I am a novice at using Mahout. Can anybody guide me on how I can use Mahout to classify
> text into different classes? In my case there are 5 classes and the text is tweets. I mean,
> is there any tutorial on how to create a training model for Mahout and how to use it for
> training, how to supply the dataset for classification (how to make it compatible with
> Mahout), and, after the classification, how to interpret the output?
> I am sorry if my questions seem dumb, but it is only because I have very little knowledge
> of Mahout and I am trying to get a grip on it. Thank you so much.
> *------------------------*
> Cheers !!!
> Siddharth Tiwari
> Have a refreshing day !!!
> "Every duty is holy, and devotion to duty is the highest form of worship of God."
> "Maybe other people will try to limit me but I don't limit myself"
