mahout-user mailing list archives

Subject Re: Querry regarding use of classifier in Mahout
Date Wed, 20 Oct 2010 13:35:39 GMT
@robin and @ted

I tested it in a different way.
I wrote a program that converts input text to the Mahout training format. The
program removes all punctuation and junk characters from the text, and also
removes any numbers, such as years or dates. Then it converts the text
to lowercase. Finally, the text is written out in the Mahout training
format (label "\t" text "\n").
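
A minimal sketch of the preprocessing described above, in Python for brevity. The original program is not shown in this message, so the function name to_training_line and the ASCII-letters-only rule are assumptions made for illustration.

```python
import re

def to_training_line(label, text):
    """Return one 'label<TAB>text<NEWLINE>' training line (hypothetical helper)."""
    # Replace punctuation, digits and any other non-letter characters with spaces.
    # Assumption: only ASCII letters count as valid characters.
    text = re.sub(r"[^A-Za-z\s]", " ", text)
    # Convert to lowercase, as the description above does after cleaning.
    text = text.lower()
    # Collapse the runs of whitespace left behind by the removals.
    text = " ".join(text.split())
    return f"{label}\t{text}\n"

if __name__ == "__main__":
    print(to_training_line("sports", "The 2010 Final, won 3-1!"), end="")
```

Each input document becomes one such line, with the label and the cleaned text separated by a single tab.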

After training with CBayesClassifier I tested it.
The results are:
1) with ng=1 -a=1.0
Correctly classified instances = 52.5%
Incorrect = 47.5%
2) with ng=2 -a=1.0
Correctly classified instances = 74.5%
Incorrect = 25.5%

Now I have a question.
1) The output of PrepareTwentyNewsgroups is text from which all the
stop words have been removed, so the text is just a simple collection of
words. When we apply generateNGramsWithoutLabel() to it, will it generate
n-grams correctly (that is, how accurate are the resulting n-grams)?
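
To make the concern concrete, here is a small generic sketch in Python. It does not reproduce Mahout's generateNGramsWithoutLabel(); the ngrams helper and the tiny stop-word list are hypothetical, used only to show how bigrams change once stop words are gone.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical, deliberately tiny stop-word list for the illustration.
STOP_WORDS = {"the", "of", "is", "a", "in"}

sentence = "the state of the art is new".split()
filtered = [t for t in sentence if t not in STOP_WORDS]

if __name__ == "__main__":
    # Bigrams over the original token sequence.
    print(ngrams(sentence, 2))
    # Bigrams after stop-word removal pair words that were not adjacent.
    print(ngrams(filtered, 2))
```

In this toy case the filtered text yields bigrams such as "state art", which never occur in the original sentence. That is the kind of n-gram the question is asking about.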
