spark-dev mailing list archives

From Sean Owen <>
Subject Re: OOM when making bins in BinaryClassificationMetrics ?
Date Sun, 02 Nov 2014 18:45:47 GMT
Agree, just rounding only makes sense if the values are roughly evenly
distributed; in my case they were all in [0,1]. I will put it on my
to-do list to look at, yes. Thanks for the confirmation.
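A minimal sketch of the rounding approach discussed here, in plain Python rather than Spark. The helper `round_scores` and the synthetic score lists are hypothetical, chosen only for illustration. Rounding scores in [0,1] to three decimals caps the number of distinct values at 1001, roughly the ~1000 bins mentioned in the original message below. When the scores cluster in a narrow range, though, almost all of them collapse into a handful of bins:

```python
import random

def round_scores(scores, decimals=3):
    # Hypothetical helper: bin scores by rounding. For scores in [0, 1],
    # 3 decimals leaves at most 1001 distinct values (0.000 .. 1.000).
    return [round(s, decimals) for s in scores]

rng = random.Random(0)
uniform = [rng.random() for _ in range(100_000)]                   # spread over [0, 1)
clustered = [0.99 + 0.01 * rng.random() for _ in range(100_000)]  # all in [0.99, 1.0)

print(len(set(round_scores(uniform))))    # at most 1001 bins, ~100 scores each
print(len(set(round_scores(clustered))))  # at most 11 bins, ~10,000 scores each
```

This is why plain truncation of digits is only safe when the scores are known to be spread fairly evenly.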

On Sun, Nov 2, 2014 at 7:44 PM, Xiangrui Meng <> wrote:
> Yes, if there are many distinct values, we need binning to compute the
> AUC curve. Usually the scores are not evenly distributed, so we cannot
> simply truncate the digits. Estimating the quantiles for binning is
> necessary, similar to RangePartitioner. Limiting the number of bins is
> definitely useful. Do you have time to work on it? -Xiangrui
> On Sun, Nov 2, 2014 at 9:34 AM, Sean Owen <> wrote:
>> This might be a question for Xiangrui. Recently I was using
>> BinaryClassificationMetrics to build an AUC curve for a classifier
>> over a reasonably large number of points (~12M). The scores were all
>> probabilities, so they tended to be almost entirely unique.
>> The computation does some operations by key, and this ran out of
>> memory. It's something you can solve with more than the default amount
>> of memory, but in this case it seemed pointless to create an AUC curve
>> with such fine-grained resolution.
>> I ended up just binning the scores so there were ~1000 unique values
>> and then it was fine.
>> Does that sound generally useful as some kind of parameter? Or am I
>> missing a trick here?
>> Sean
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
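A rough sketch of the quantile-based binning suggested above: take a random sample of the scores, sort it, and use evenly spaced order statistics of the sample as bin edges. This is similar in spirit to how RangePartitioner derives range bounds from a sample. It is plain Python, not Spark code. The names `quantile_bin_edges` and `bin_index`, the sample size, and the skewed test data are all hypothetical choices for illustration:

```python
import bisect
import random
from collections import Counter

def quantile_bin_edges(scores, num_bins, sample_size=10_000, seed=42):
    # Hypothetical helper: estimate num_bins - 1 interior edges from a sample
    # instead of sorting every score.
    rng = random.Random(seed)
    sample = sorted(rng.sample(scores, min(sample_size, len(scores))))
    # Evenly spaced order statistics of the sorted sample become the edges.
    edges = [sample[i * len(sample) // num_bins] for i in range(1, num_bins)]
    # Collapse duplicate edges so a heavily repeated score cannot yield empty bins.
    return sorted(set(edges))

def bin_index(score, edges):
    # Bin ids run 0..len(edges) and never decrease as the score grows, so
    # ordering by bin id preserves the score ranking an ROC/AUC pass needs.
    return bisect.bisect_right(edges, score)

rng = random.Random(1)
skewed = [0.99 + 0.01 * rng.random() ** 4 for _ in range(100_000)]  # piled up near 0.99
edges = quantile_bin_edges(skewed, num_bins=100)
counts = Counter(bin_index(s, edges) for s in skewed)
print(len(counts), min(counts.values()), max(counts.values()))
```

Unlike rounding, the bins here hold roughly equal numbers of scores even though the data is heavily skewed. Sampling keeps the cost of finding the edges independent of the total number of scores, at the price of approximate rather than exact quantiles.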
