mahout-user mailing list archives

Subject Re: Text Classification using Mahout
Date Tue, 28 Sep 2010 11:25:49 GMT
On Tue, Sep 28, 2010 at 4:35 PM, Grant Ingersoll wrote:

> On Sep 27, 2010, at 1:53 PM, Neil Ghosh wrote:
> > Hi Grant,
> >
> > Thanks so much for replying to this on the mailing list. I
> > have changed my problem to a slightly more common one.
> >
> > I have already gone through the tutorial you wrote on the IBM site. It
> > was a very good starting point. Thanks.
> > To be specific, my problem is to classify a piece of text crawled from the
> > web into two classes:
> >
> > 1. It is +ve feedback.
> > 2. It is -ve feedback.
> >
> > I can use the two news group example and create a model from some text
> > (maybe a large number of documents) by feeding the trainer these two
> > labels. Should I leave everything to the trainer like this?
> >
> Yes, that should be fine.  The trainer doesn't care about the name of the
> label, it just cares that the two sets are relatively independent.  Keep in
> mind, you should set aside some of your data for testing as well.
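
As a rough illustration of the advice to set aside test data, here is a minimal Python sketch of a stratified hold-out split. It is not Mahout code; the function name `stratified_split`, the toy documents, and the 25% test fraction are all assumptions made up for this example.

```python
import random
from collections import defaultdict

def stratified_split(docs, test_fraction=0.2, seed=0):
    """Hold out a fraction of each label's documents for testing.

    docs is a list of (text, label) pairs. Hypothetical helper, not a Mahout API.
    """
    by_label = defaultdict(list)
    for text, label in docs:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, test = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        # Keep at least one test document per label; a label with a single
        # document would therefore contribute nothing to the training set.
        n_test = max(1, round(len(items) * test_fraction))
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test

# Toy data, invented for illustration only.
labeled = [
    ("great service, very happy", "positive"),
    ("quick delivery, thank you", "positive"),
    ("works exactly as described", "positive"),
    ("excellent support team", "positive"),
    ("problem with my order", "negative"),
    ("I want to file a complaint", "negative"),
    ("item arrived broken", "negative"),
    ("terrible experience overall", "negative"),
]
train, test = stratified_split(labeled, test_fraction=0.25)
```

Splitting per label keeps both classes represented in the held-out set, which makes it easier to notice a model that assigns every document to one class.
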
> > Or do I have the flexibility to give some other input specific to my
> > problem? For example, words like "Problem", "Complaint", etc. are more
> > likely to appear in a text containing a grievance.
> You can provide a Weight, usually TF-IDF, that often does a good job of
> factoring in the importance of words.  If you have certain sentiment words
> that you think influence things one way or the other, you could consider a
> weighting process that adds weight to those words, I suppose, but I would
> want to experiment with that a bit.
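
To make the weighting idea concrete, here is a small Python sketch that computes TF-IDF weights and then multiplies chosen sentiment words by an extra factor. It is only an assumption about how such a boost could look, not how Mahout computes its vectors: the function `tfidf_with_boost`, the `log(N/df) + 1` form of IDF, the whitespace tokenizer, and the boost value 2.0 are all illustrative choices.

```python
import math
from collections import Counter

def tfidf_with_boost(docs, boosts=None):
    """Return one {term: weight} dict per document.

    weight = term count * (log(N / document frequency) + 1) * optional boost.
    Hypothetical helper for illustration; not a Mahout API.
    """
    boosts = boosts or {}
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        vec = {}
        for term, count in Counter(tokens).items():
            idf = math.log(n_docs / df[term]) + 1.0
            vec[term] = count * idf * boosts.get(term, 1.0)
        vectors.append(vec)
    return vectors

# Toy documents and boost factors, invented for illustration only.
docs = [
    "there is a problem with the order",
    "the order arrived on time",
    "i have a complaint about the delivery",
]
boosts = {"problem": 2.0, "complaint": 2.0}

plain = tfidf_with_boost(docs)
boosted = tfidf_with_boost(docs, boosts)
```

Comparing `plain` and `boosted` on held-out data is one way to run the experiment suggested above before committing to any hand-picked word list.
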
> >
> > Please let me know if you have any ideas and need more info from my side.

I tried the classifier with two classes of documents, "good" and "bad". But the
system identified all Good documents as well as bad documents as "Good

