mahout-user mailing list archives

From Ted Dunning
Subject Re: The default category of a binary classifier
Date Wed, 19 Sep 2012 21:54:25 GMT
If a classifier is presented with text that has no words in common with the
training data, it will give you back the most common category in the
training data.

That said, it is likely to be quite rare for a new document to consist
*entirely* of new words.  Any overlap with the trained vocabulary is likely
to override the base frequencies of the different categories.
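
To make that concrete, here is a minimal sketch in plain Java. It is not
Mahout code and not how AdaptiveLogisticRegression is implemented; the class
name, the weights, and the 30% positive fraction are all made up for
illustration. It assumes a binary logistic model whose intercept equals the
log-odds of the positive class in the training data, so that a document with
no known words scores exactly the training prior.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model: "technology" (positive) vs. "not technology".
public class PriorFallbackSketch {
    private final double intercept;
    private final Map<String, Double> weights = new HashMap<>();

    // Assumption: the intercept is the log-odds of the positive class
    // as observed in the training data.
    public PriorFallbackSketch(double positiveFraction) {
        this.intercept = Math.log(positiveFraction / (1.0 - positiveFraction));
    }

    // Stand-in for a weight learned during training (value is invented).
    public void setWeight(String word, double weight) {
        weights.put(word, weight);
    }

    // Logistic score: words never seen in training contribute nothing,
    // so only the intercept and the known words affect the result.
    public double probabilityPositive(List<String> words) {
        double z = intercept;
        for (String word : words) {
            z += weights.getOrDefault(word, 0.0);
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    public static void main(String[] args) {
        PriorFallbackSketch model = new PriorFallbackSketch(0.3);
        model.setWeight("software", 2.5);

        // No overlap with the trained vocabulary: falls back to the prior.
        System.out.println(model.probabilityPositive(
                Arrays.asList("broadcast", "anchor")));

        // A single known word is enough to move the score past 0.5.
        System.out.println(model.probabilityPositive(
                Arrays.asList("broadcast", "software")));
    }
}
```

With 30% positive training examples, the first call lands at roughly 0.3, so
the model answers "not technology", the majority category. The second call
scores above 0.5 because the one known word outweighs the intercept.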

On Wed, Sep 19, 2012 at 1:32 AM, Salman Mahmood wrote:

> First, in Mahout, is there a special way to create a binary classifier? For
> instance, if I am creating a classifier for the 20 newsgroups data, I will
> just pass 20 as the number of categories when creating the training object:
> new AdaptiveLogisticRegression(20, FEATURES, new L1())
> Similarly, when creating a binary classifier, I will pass 2 as the number
> of categories and that's it?
> Having established that, what is the default category for a binary
> classifier? Let's say I was building a classifier to recognize the industry
> sector for a news item. I have binary models to classify things into
> technology or not technology, banking or not banking, health or not health,
> etc. I trained the technology model with technology-related news as
> positive and all the other news (banking and health) as negative. Now if
> the technology model gets a news item to classify from the media sector
> (which it was not trained on), what is the expected behavior? Is it going to
> say it's technology news or it's not technology news? Is there any default
> behavior for unseen/untrained news items?
> Hope I made the question clear.
> Thanks
