mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Getting Started with Classification
Date Fri, 24 Jul 2009 20:23:45 GMT
I did, and it performs worse, but maybe I did something wrong.

On Jul 22, 2009, at 9:50 PM, Robin Anil wrote:

> Did you try CBayes? It's supposed to negate the class imbalance effect
> to some extent.
>
>
>
> On Thu, Jul 23, 2009 at 5:02 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>> Some learning algorithms deal with this better than others. The problem is
>> particularly bad in information retrieval (negative examples include almost
>> the entire corpus, positives are a tiny fraction) and fraud (less than 1% of
>> the training data is typically fraud).
>>
>> Down-sampling the over-represented case is the simplest answer where you
>> have lots of data. It doesn't help much to have more than 3x as much data
>> for one case as for the other anyway (at least in binary decisions).
>>
>> Another aspect of this is the cost of different errors. For instance, in
>> fraud, verifying a transaction with a customer has a low (but not zero)
>> cost, while failing to detect a fraud in progress can be very, very bad.
>> False negatives are thus more of a problem than false positives, and the
>> models are tuned accordingly.
>>
>> On Wed, Jul 22, 2009 at 4:03 PM, Miles Osborne <miles@inf.ed.ac.uk> wrote:
>>
>>> this is the class imbalance problem (i.e. you have many more instances for
>>> one class than another one).
>>>
>>> in this case, you could ensure that the training set was balanced (50:50);
>>> more interestingly, you can have a prior which corrects for this. or, you
>>> could over-sample or even under-sample the training set, etc.
>>>
>>
>>
>>
>> --
>> Ted Dunning, CTO
>> DeepDyve
>>
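
For reference, the down-sampling Ted describes above can be sketched roughly as follows. This is only an illustration in plain Python, not Mahout code: the function name `downsample`, the `max_ratio` parameter, and the in-memory list representation are all assumptions made for the example. The 3:1 default follows the rule of thumb quoted above.

```python
import random

def downsample(examples, labels, max_ratio=3.0, seed=0):
    """Randomly drop examples from over-represented classes so that no class
    has more than max_ratio times as many examples as the smallest class.
    (Hypothetical helper; names and defaults are illustrative.)"""
    rng = random.Random(seed)

    # Group examples by class label.
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)

    # The smallest class sets the cap for every other class.
    smallest = min(len(xs) for xs in by_class.values())
    cap = int(max_ratio * smallest)

    sampled = []
    for y, xs in by_class.items():
        kept = xs if len(xs) <= cap else rng.sample(xs, cap)
        sampled.extend((x, y) for x in kept)

    # Shuffle so the classes are interleaved in the training set.
    rng.shuffle(sampled)
    return sampled
```

With 10 positive and 990 negative examples, this keeps all 10 positives and a random 30 of the negatives. Setting max_ratio=1.0 gives the balanced 50:50 training set Miles mentions.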

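The point above about unequal error costs can also be made concrete. The sketch below assumes the classifier outputs a reasonably calibrated probability that an example is positive (e.g. fraud), and that each kind of error has a fixed cost; the function names and cost values are hypothetical. Flagging is worthwhile when the expected cost of missing a positive, p * cost_false_negative, exceeds the expected cost of a false alarm, (1 - p) * cost_false_positive.

```python
def decision_threshold(cost_false_positive, cost_false_negative):
    """Probability above which flagging has lower expected cost than not
    flagging, assuming calibrated probabilities and fixed error costs."""
    return cost_false_positive / (cost_false_positive + cost_false_negative)

def should_flag(p_positive, cost_false_positive, cost_false_negative):
    """Return True when the expected cost of a miss exceeds that of a false alarm."""
    return p_positive > decision_threshold(cost_false_positive, cost_false_negative)
```

If a missed fraud is taken to cost 99 times as much as an unnecessary customer check, the threshold drops from 0.5 to 0.01, so a transaction with only a 5% estimated fraud probability would still be flagged.
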
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search

