mahout-user mailing list archives

From Andreas Bauer <>
Subject OnlineLogisticRegression: Are my settings sensible
Date Thu, 07 Nov 2013 21:48:06 GMT

I’m trying to use OnlineLogisticRegression for a two-class classification problem, but my
classification results are not very good. I would like some help checking whether my settings
are sensible and whether I’m using Mahout correctly, because if I am, then my features are
probably crap...

In total I have 12 features. All are continuous values and all are normalized/standardized
(which has no effect on the classification performance at the moment).

Training samples keep flowing in at a constant rate (i.e. incremental training), but in total
there won’t be more than a few thousand (class split positive/negative 30:70).

My performance measures stay poor, e.g. with approx. 3600 training samples I get:

f-measure(beta=0.5): 0.38
precision: 0.33
recall: 0.47
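
For reference, the F-measure can be recomputed from precision and recall. The sketch below uses the standard definition F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); the class name FBetaSketch and the method fBeta are hypothetical names chosen for illustration, and the inputs are the rounded figures quoted above.

```java
/** Sketch: F-beta from precision and recall, using the standard definition. */
final class FBetaSketch {

    // F_beta = (1 + beta^2) * P * R / (beta^2 * P + R).
    // Returns 0.0 when the denominator is zero (precision and recall both 0).
    static double fBeta(double precision, double recall, double beta) {
        double betaSquared = beta * beta;
        double denominator = betaSquared * precision + recall;
        if (denominator == 0.0) {
            return 0.0;
        }
        return (1.0 + betaSquared) * precision * recall / denominator;
    }

    public static void main(String[] args) {
        double precision = 0.33; // rounded value reported above
        double recall = 0.47;    // rounded value reported above
        System.out.println("F(beta=0.5) = " + fBeta(precision, recall, 0.5));
        System.out.println("F(beta=1.0) = " + fBeta(precision, recall, 1.0));
    }
}
```

With these rounded inputs the standard formula gives roughly 0.35 for beta=0.5 and roughly 0.39 for beta=1, so it may be worth double-checking which beta the reported 0.38 corresponds to.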

The parameters I use are



Java code snippet:

private OnlineLogisticRegression olr;
private ContinuousValueEncoder continousValueEncoder;

private static final FeatureVectorEncoder BIAS = new ConstantValueEncoder("Intercept");

public Training() {
    // L2 or ElasticBandPrior do not affect the performance
    olr = new OnlineLogisticRegression(CATEGORIES_NUMBER, FEATURE_NUMBER, new L1());
    this.continousValueEncoder = new ContinuousValueEncoder("ContinuousValueEncoder");
}

public void train(TrainingSample sample, int target) {
    DenseVector denseVector = new DenseVector(FEATURE_NUMBER);
    // sample.getFeatureValue1-15() return a double value
    this.continousValueEncoder.addToVector((byte[]) null, sample.getFeatureValue1(), denseVector);
    this.continousValueEncoder.addToVector((byte[]) null, sample.getFeatureValue15(), denseVector);
    BIAS.addToVector((byte[]) null, 1, denseVector);
    olr.train(target, denseVector);
}
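
One point worth checking in the snippet above: every feature value goes through the same encoder instance with the same name. If the encoder derives its vector position from that name (an assumption that should be verified against the Mahout source), the values would be added into the same slot instead of being kept apart. With only a handful of dense, continuous features, a possible alternative is to give each feature a fixed index. The sketch below is plain Java without Mahout; DenseFeatureLayoutSketch and toDenseRow are hypothetical names, and the 12 features plus one intercept slot follow the description in this message.

```java
/** Sketch: one fixed slot per continuous feature plus an intercept slot. */
final class DenseFeatureLayoutSketch {

    static final int FEATURE_COUNT = 12;              // number of features stated in the message
    static final int VECTOR_SIZE = FEATURE_COUNT + 1; // extra slot for the intercept term

    // Copies each feature value to its own index, so no two features share a slot.
    // The last position holds the constant 1.0 that acts as the intercept.
    static double[] toDenseRow(double[] features) {
        if (features.length != FEATURE_COUNT) {
            throw new IllegalArgumentException(
                "expected " + FEATURE_COUNT + " features, got " + features.length);
        }
        double[] row = new double[VECTOR_SIZE];
        System.arraycopy(features, 0, row, 0, FEATURE_COUNT);
        row[FEATURE_COUNT] = 1.0;
        return row;
    }

    public static void main(String[] args) {
        double[] features = new double[FEATURE_COUNT];
        for (int i = 0; i < FEATURE_COUNT; i++) {
            features[i] = 0.1 * i; // made-up values for illustration only
        }
        System.out.println(java.util.Arrays.toString(toDenseRow(features)));
    }
}
```

The resulting array could then be wrapped in a DenseVector (which accepts a double[]) before calling train, with the model's feature count set to the array length.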

It is also interesting to notice that when I use the model, both test and classification always
yield probabilities of 1.0 or 0.99xxx for either class.

result = this.olr.classifyFull(input);
LOGGER.debug("TrainingSink test: classify real category:"
+ realCategory + " olr classifier result: "
+ result.maxValueIndex() + " prob: " + result.maxValue());
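
As background for the near-1.0 outputs: a logistic model turns a linear score z into a probability via 1 / (1 + e^-z), and that function saturates quickly. The sketch below is plain Java rather than Mahout code, SigmoidSaturationSketch is a hypothetical name, and whether large scores are what actually happens in this model is only an assumption.

```java
/** Sketch: how quickly the logistic function saturates toward 0 or 1. */
final class SigmoidSaturationSketch {

    // Standard logistic function: maps any real score to the interval (0, 1).
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    public static void main(String[] args) {
        double[] scores = {0.0, 1.0, 2.0, 5.0, 10.0};
        for (double z : scores) {
            System.out.println("z = " + z + " -> p = " + sigmoid(z));
        }
    }
}
```

A score of 5 already maps to about 0.993, so weights or feature sums of moderate magnitude are enough to produce probabilities that look like 0.99xxx.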

It would be great if you could give me some advice.

Many thanks,

