mahout-user mailing list archives

From Bhaskar Ghosh <bjgin...@yahoo.co.in>
Subject Re: What are the ways to train and run classifiers on text?
Date Sun, 26 Sep 2010 17:17:44 GMT
Thanks, Ted. But I am unable to find the org.apache.mahout.classifier.sgd
package. I could only locate the classifier.bayes.* packages.

 Thanks
Bhaskar Ghosh
Hyderabad, India

http://www.google.com/profiles/bjgindia

"Ignorance is Bliss... Knowledge never brings Peace!!!"




________________________________
From: Ted Dunning <ted.dunning@gmail.com>
To: user@mahout.apache.org
Sent: Sun, 26 September, 2010 9:40:17 AM
Subject: Re: What are the ways to train and run classifiers on text?

Take a look also at TrainNewsGroups in the classifier.sgd package in
examples.

That shows how to parse documents for use with an SGD classifier (different
from NaiveBayes).

There is much more format flexibility with an API-oriented approach.
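
As a rough illustration of what the API-oriented route involves, here is a minimal plain-Java sketch. It is not Mahout code and does not reproduce TrainNewsGroups: the class name HashedSgdSketch, its methods, the vector size, the learning rate, and the two toy documents are all assumptions made up for this example. It tokenizes raw text, hashes each token into a fixed-size vector, and applies SGD updates for a binary logistic regression.

```java
import java.util.Locale;

// Hypothetical illustration only; this is NOT Mahout's API.
class HashedSgdSketch {
    // Assumed vector size; large enough that collisions are unlikely for toy data.
    static final int NUM_FEATURES = 1 << 16;

    // Tokenize raw text and hash each token into a fixed-size count vector.
    static double[] encode(String text) {
        double[] v = new double[NUM_FEATURES];
        for (String tok : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (tok.isEmpty()) {
                continue; // split() can yield a leading empty token
            }
            int idx = Math.floorMod(tok.hashCode(), NUM_FEATURES);
            v[idx] += 1.0;
        }
        return v;
    }

    // Logistic model: probability that x belongs to class 1.
    static double predict(double[] w, double[] x) {
        double z = 0.0;
        for (int i = 0; i < w.length; i++) {
            z += w[i] * x[i];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // One stochastic gradient step on the logistic loss; label is 0 or 1.
    static void train(double[] w, double[] x, int label, double rate) {
        double err = label - predict(w, x);
        for (int i = 0; i < w.length; i++) {
            w[i] += rate * err * x[i];
        }
    }

    public static void main(String[] args) {
        double[] w = new double[NUM_FEATURES];
        String[] docs = {"the team won the hockey game", "new graphics card driver released"};
        int[] labels = {1, 0};
        for (int epoch = 0; epoch < 50; epoch++) {
            for (int i = 0; i < docs.length; i++) {
                train(w, encode(docs[i]), labels[i], 0.1);
            }
        }
        // Probability that an unseen text belongs to class 1.
        System.out.println(predict(w, encode("hockey game tonight")));
    }
}
```

The point of the sketch is that raw text goes straight from a string to a vector in code, so no intermediate file format is needed; the classes in Mahout's sgd package have their own names and signatures.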

On Sun, Sep 26, 2010 at 9:37 AM, Bhaskar Ghosh <bjgindia@yahoo.co.in> wrote:

> Dear All,
>
> I need to classify a bunch of text files, i.e. determine which class each
> of these texts falls into.
>
>
> I have gone through the 20Newsgroups example. I see that the input text
> files need to have a particular format:
>
> <class-label> <tab> <unique features (words) associated with the class-label>
>
>
> But the real question is: how do I get such a pre-processed input file? Do I
> need to process the input text files to get them into the required format?
> That would require extracting the unique words/features from the raw text,
> in addition to assigning class labels.
>
> OR
>
> Is there some classifier class that can take raw input files? My input
> would be something like:
>
> <class-label1> <file1-text>
> <class-label2> <file3-text>
> <class-label1> <file2-text>
> etc.
>
>
> Thanks
> Bhaskar Ghosh
> Hyderabad, India
>
> http://www.google.com/profiles/bjgindia
>
> "Ignorance is Bliss... Knowledge never brings Peace!!!"
>
>
>

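The pre-processing step asked about in the quoted message (label plus raw text in, label plus unique words out) can be sketched in plain Java as follows. This is only a sketch under assumptions: the class LabeledLineSketch, the method toLine, the tokenization rule (lowercase, split on non-alphanumerics), and the space separator between words are invented for illustration and are not taken from Mahout or from the 20Newsgroups example.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; not part of Mahout.
class LabeledLineSketch {
    // Build "<class-label><tab><unique words>" from a label and raw text.
    static String toLine(String label, String rawText) {
        // LinkedHashSet drops duplicates while keeping first-seen order.
        Set<String> unique = new LinkedHashSet<>();
        for (String tok : rawText.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!tok.isEmpty()) {
                unique.add(tok);
            }
        }
        return label + "\t" + String.join(" ", unique);
    }

    public static void main(String[] args) {
        // Example label and sentence are made up for this sketch.
        System.out.println(toLine("rec.sport.hockey", "The team won; the crowd cheered."));
    }
}
```

Running one such call per input file and writing the results to a single file would give one line per document in the label, tab, words layout described above.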

