mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: SGD: Logistic regression package in Mahout
Date Mon, 15 Oct 2012 22:27:17 GMT
I would love to help and will before long.  Just can't do it in the first
part of this week.

On Mon, Oct 15, 2012 at 6:28 AM, Rajesh Nikam <rajeshnikam@gmail.com> wrote:

> Hello,
>
> I asked the question below, about a problem with using SGD, on the Mahout forum.
>
> A similar issue with SGD is reported at
>
> http://stackoverflow.com/questions/11221436/using-sgd-classifier-in-mahout
>
> The link below shows similar output:
>
> AUC = 0.57
> confusion: [[27.0, 13.0], [0.0, 0.0]]
> entropy: [[-0.4, -0.3], [-1.2, -0.7]]
>
>
> http://sujitpal.blogspot.in/2012/09/learning-mahout-classification.html
>
> I am still confused about how this model works and how so many people use it.
> I have not found any pointers on how to use SGD so that it produces an
> effective model.
>
> Could someone point out what is missing in the input file or in the
> parameters provided?
>
> I appreciate your help.
>
> Below is description of steps that I followed.
>
> Please find attached the input files used for the experiment.
>
> I am using the Iris Plants Database from Michael Marshall (iris.arff, attached).
> I converted it to a CSV file just by updating the header: iris-3-classes.csv
>
> mahout org.apache.mahout.classifier.sgd.TrainLogistic --input
> /usr/local/mahout/trunk/iris-3-classes.csv --features 4 --output
> /usr/local/mahout/trunk/iris-3-classes.model --target class --categories 3
> --predictors sepallength sepalwidth petallength petalwidth --types n
>
> >> It gave the following error:
> Exception in thread "main" java.lang.IllegalArgumentException: Can only
> call classifyScalar with two categories
>
> I then created a CSV with only 2 classes (iris-2-classes.csv, attached).
>
> >> Trained on iris-2-classes.csv with SGD:
>
> mahout org.apache.mahout.classifier.sgd.TrainLogistic --input
> /usr/local/mahout/trunk/iris-2-classes.csv --features 4 --output
> /usr/local/mahout/trunk/iris-2-classes.model --target class --categories 2
> --predictors sepallength sepalwidth petallength petalwidth --types n
>
> mahout runlogistic --input /usr/local/mahout/trunk/iris-2-classes.csv
> --model /usr/local/mahout/trunk/iris-2-classes.model --auc --confusion
>
> AUC = 0.14
> confusion: [[50.0, 50.0], [0.0, 0.0]]
> entropy: [[-0.6, -0.3], [-0.8, -0.4]]
>
> >> The AUC seems too poor. I then changed --predictors:
>
> mahout org.apache.mahout.classifier.sgd.TrainLogistic --input
> /usr/local/mahout/trunk/iris-2-classes.csv --features 4 --output
> /usr/local/mahout/trunk/iris-2-classes.model --target class --categories 2
> --predictors sepalwidth petallength --types n
>
> mahout runlogistic --input /usr/local/mahout/trunk/iris-2-classes.csv
> --model /usr/local/mahout/trunk/iris-2-classes.model --auc --confusion
> --scores
>
> AUC = 0.80
> confusion: [[50.0, 50.0], [0.0, 0.0]]
> entropy: [[-0.7, -0.3], [-0.7, -0.4]]
>
> This model classifies everything as category 1, which is of no use.
>
> Thanks
> Rajesh
>
>
>
>
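
For anyone reading the numbers quoted above, here is a minimal sketch in plain Python (not Mahout code) of two sanity checks. It assumes AUC is the usual pairwise ranking statistic and makes no assumption about whether the rows of the confusion matrix are predicted or actual classes. The helper names `auc` and `empty_rows` and the toy scores are hypothetical, made up for illustration.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs where the positive example
    gets the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def empty_rows(confusion):
    """Indices of confusion-matrix rows that received no examples."""
    return [i for i, row in enumerate(confusion) if sum(row) == 0]


# Hypothetical toy scores, not taken from the Iris runs above.
labels = [0, 0, 0, 1, 1, 1]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]

low = auc(scores, labels)                          # well below 0.5
flipped = auc([1.0 - s for s in scores], labels)   # same scores, reversed order

# Confusion matrix copied from the quoted runlogistic output.
degenerate = empty_rows([[50.0, 50.0], [0.0, 0.0]])
```

An AUC well below 0.5, like the 0.14 quoted above, means the scores rank the two classes in the reverse order, so reversing the scores gives 1 minus that value. An all-zero row means one class index was never used on that axis (never predicted, or never seen as a label, depending on the layout), which is a different problem from a poor ranking.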
