mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: What are the ways to train and run classifiers on text?
Date Sun, 26 Sep 2010 16:40:17 GMT
Also take a look at TrainNewsGroups in the classifier.sgd package in the
examples.

That shows how to parse documents for use with an SGD classifier (different
from NaiveBayes).
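
To make the general idea concrete, here is a minimal plain-Python sketch of
parsing raw text into hashed feature vectors and training a linear classifier
by SGD, one document at a time. It is not the TrainNewsGroups code and uses
none of Mahout's classes; the names (tokenize, encode, SgdLogistic), the
feature-space size, the learning rate and the four toy documents are all
assumptions made up for illustration.

```python
import math
import re
import zlib

NUM_FEATURES = 1 << 16  # size of the hashed feature space (arbitrary choice)


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def encode(text):
    """Hash every token into a sparse {index: count} feature vector."""
    vec = {}
    for tok in tokenize(text):
        idx = zlib.crc32(tok.encode("utf-8")) % NUM_FEATURES
        vec[idx] = vec.get(idx, 0.0) + 1.0
    return vec


class SgdLogistic:
    """Binary logistic regression updated one example at a time (hypothetical helper)."""

    def __init__(self, rate=0.5):
        self.w = [0.0] * NUM_FEATURES
        self.b = 0.0
        self.rate = rate

    def score(self, vec):
        """Return the estimated probability that the document is in class 1."""
        z = self.b + sum(self.w[i] * v for i, v in vec.items())
        return 1.0 / (1.0 + math.exp(-z))

    def train(self, label, vec):
        """Take one gradient step on the log-likelihood of a single example."""
        err = label - self.score(vec)
        self.b += self.rate * err
        for i, v in vec.items():
            self.w[i] += self.rate * err * v


# Toy training data: 1 = baseball, 0 = programming (invented for this sketch).
docs = [
    (1, "the pitcher threw a fast ball"),
    (0, "the compiler threw a syntax error"),
    (1, "the batter hit the ball"),
    (0, "the linker reported an error"),
]

model = SgdLogistic()
for _ in range(50):  # several passes over the tiny data set
    for label, text in docs:
        model.train(label, encode(text))
```

After training, model.score(encode("some new text")) gives a probability for
class 1. Because the tokens are hashed, no dictionary or pre-processed input
file has to be built before training starts.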

An API-oriented approach gives you much more flexibility in the input format.
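
As one illustration of that flexibility, the sketch below reads lines in the
raw "<class-label> <file-text>" layout from the question and, if a file-based
trainer is still wanted, renders them as "<class-label> <tab> <unique words>".
The function names are hypothetical, and it assumes that the label contains no
whitespace and that every line is non-empty.

```python
import re


def parse_line(line):
    """Split '<label> <text>' into (label, tokens).

    Assumes the label is the first whitespace-delimited field of the line.
    """
    parts = line.split(None, 1)
    label = parts[0]
    text = parts[1] if len(parts) > 1 else ""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return label, tokens


def to_training_line(label, tokens):
    """Render label, a tab, then the unique tokens in first-seen order."""
    unique = list(dict.fromkeys(tokens))
    return label + "\t" + " ".join(unique)


label, tokens = parse_line("sports The game, the GAME!")
line = to_training_line(label, tokens)
```

With an API you can stop after parse_line and hand the tokens straight to
whatever encoder you use; the second function is only needed when a tool
insists on the tab-separated file format.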

On Sun, Sep 26, 2010 at 9:37 AM, Bhaskar Ghosh <bjgindia@yahoo.co.in> wrote:

> Dear All,
>
> I need to classify a bunch of text files, i.e. determine which class each
> one of these texts falls into.
>
>
> I have now looked through the 20Newsgroups example. I see that the input
> text files need to have a particular format:
>
> <class-label> <tab> <unique features (words) associated with the
> class-label>
>
>
> But the real question is: how do I get such a pre-processed input file? Do I
> need to process the input text files to get them into the required format?
> That would require extracting the unique words/features from the raw text,
> in addition to assigning class labels.
>
> OR
>
> Is there some classifier class that can take raw input files? My input
> would be something like:
>
> <class-label1> <file1-text>
> <class-label2> <file3-text>
> <class-label1> <file2-text>
> etc.
>
>
> Thanks
> Bhaskar Ghosh
> Hyderabad, India
>
> http://www.google.com/profiles/bjgindia
>
> "Ignorance is Bliss... Knowledge never brings Peace!!!"
>
>
>
