mahout-user mailing list archives

From Bhaskar Ghosh
Subject What are the ways to train and run classifiers on text?
Date Sun, 26 Sep 2010 16:37:48 GMT
Dear All,

I need to classify a set of text files, that is, to determine which class each 
of these texts falls into.

I have gone through the 20Newsgroups example, and I see that the input text 
files need to have a particular format:

<class-label> <tab> <unique features (words) associated with the class-label>

But the real question is: how do I get such a pre-processed input file? Do I need 
to process the raw input text files myself to get them into the required format? 
That would require extracting the unique words/features from the raw text, in 
addition to assigning the class labels.


Is there a classifier class that can take raw input files? My input would be 
something like:

<class-label1> <file1-text>
<class-label2> <file3-text>
<class-label1> <file2-text>
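
To make the question concrete, here is a rough sketch in Python of the kind of 
conversion I mean. It is not Mahout code and uses no Mahout API; the function 
names, the lowercase alphanumeric tokenization, and the sample labels are all 
assumptions made for illustration. It only shows turning (label, raw text) pairs 
into the label, tab, unique-words layout described above.

```python
import re


def to_training_line(label, text):
    """Build one line: label, a tab, then the unique words of the text."""
    seen = {}
    # Assumed tokenization: lowercase runs of letters and digits.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        seen.setdefault(word, None)  # dict preserves first-seen order
    return label + "\t" + " ".join(seen)


def convert(records):
    """Convert an iterable of (label, raw_text) pairs into formatted lines."""
    return [to_training_line(label, text) for label, text in records]


if __name__ == "__main__":
    # Hypothetical sample data, standing in for real labelled files.
    sample = [
        ("sports", "The match was a great match"),
        ("politics", "The vote is today"),
    ]
    for line in convert(sample):
        print(line)
```

Whether the classifier actually wants de-duplicated words, or every token with 
repeats kept, is part of what I am unsure about.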

Bhaskar Ghosh
Hyderabad, India

"Ignorance is Bliss... Knowledge never brings Peace!!!"
