mahout-user mailing list archives

From Rajesh Nikam <rajeshni...@gmail.com>
Subject Re: ** Problem using SGD and iris arff as test set **
Date Fri, 12 Oct 2012 17:01:37 GMT
Hi Ted,

Something seems to be wrong with either the input file or the parameters passed to the package.
Could you point out what is missing?

Thanks
Rajesh
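
One way to narrow down an input-file problem like the one asked about here is to check the CSV against the names passed to --predictors and --target. The following is only a minimal sketch under stated assumptions: the column names are taken from the TrainLogistic commands quoted below, `check_csv` is a hypothetical helper (not part of Mahout), and the two data rows are illustrative stand-ins rather than the contents of the attached files.

```python
import csv
import io

# Column names taken from the TrainLogistic commands quoted in this thread.
PREDICTORS = ["sepallength", "sepalwidth", "petallength", "petalwidth"]
TARGET = "class"


def check_csv(handle, predictors=PREDICTORS, target=TARGET):
    """Hypothetical helper: return (problems, label_counts) for a CSV with a header row."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    # Every predictor and the target must appear verbatim in the header.
    problems = [f"missing column: {name}"
                for name in predictors + [target] if name not in header]
    label_counts = {}
    if problems:
        return problems, label_counts
    for line_no, row in enumerate(reader, start=2):
        for name in predictors:
            try:
                float(row[name])  # numeric predictors must parse as numbers
            except (TypeError, ValueError):
                problems.append(
                    f"line {line_no}: {name}={row[name]!r} is not numeric")
        label = row[target]
        label_counts[label] = label_counts.get(label, 0) + 1
    return problems, label_counts


# Illustrative rows only; they stand in for a file such as iris-2-classes.csv.
sample = io.StringIO(
    "sepallength,sepalwidth,petallength,petalwidth,class\n"
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
)
problems, label_counts = check_csv(sample)
print(problems, label_counts)
```

An empty problem list and a label count matching --categories would at least rule out a header mismatch or non-numeric predictor values; it says nothing about how the trainer itself encodes the data.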

On Thu, Oct 11, 2012 at 10:18 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> Not sure just offhand.  Need to look in more detail in a debugger.  Need
> to find time to do that.
>
> On Thu, Oct 11, 2012 at 1:58 AM, Rajesh Nikam <rajeshnikam@gmail.com>
> wrote:
>
> > What could be the problem with the data formatting?
> > Could you please give an update on this?
> >
> > On Thu, Oct 11, 2012 at 11:31 AM, Ted Dunning <ted.dunning@gmail.com>
> > wrote:
> >
> > > My first thought was that we needed several passes, but that is clearly
> > > wrong.
> > >
> > > I think that the problem is in the data formatting and conversion
> > > somehow.  Haven't had time to dope this out just yet.  The iris data
> > > should converge trivially.
> > >
> > > On Wed, Oct 10, 2012 at 9:58 PM, Rajesh Nikam <rajeshnikam@gmail.com>
> > > wrote:
> > >
> > > > Thanks for looking into it.
> > > >
> > > > Actually, I first tried it with big data. Below is the model info for it.
> > > >
> > > > AUC = 0.50
> > > > confusion: [[1252978.0, 23003.0], [0.0, 0.0]]
> > > > entropy: [[-0.0, -0.0], [-46.1, -0.8]]
> > > >
> > > > Looking forward to your comments.
> > > >
> > > > Thanks
> > > > Rajesh
> > > >
> > > >
> > > > On Wed, Oct 10, 2012 at 8:08 PM, Ted Dunning <ted.dunning@gmail.com>
> > > > wrote:
> > > >
> > > > > SGD is more suitable for large data.  I will take a look later today.
> > > > >
> > > > > Sent from my iPhone
> > > > >
> > > > > On Oct 9, 2012, at 11:29 PM, Rajesh Nikam <rajeshnikam@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Ted,
> > > > > >
> > > > > > Here is a specific question, with data, about a problem I am seeing with SGD.
> > > > > >
> > > > > > I am using the Iris Plants Database from Michael Marshall. Please find iris.arff attached.
> > > > > >
> > > > > > I converted this to a CSV file just by updating the header: iris-3-classes.csv
> > > > > >
> > > > > > mahout org.apache.mahout.classifier.sgd.TrainLogistic \
> > > > > >   --input /usr/local/mahout/trunk/iris-3-classes.csv --features 4 \
> > > > > >   --output /usr/local/mahout/trunk/iris-3-classes.model \
> > > > > >   --target class --categories 3 \
> > > > > >   --predictors sepallength sepalwidth petallength petalwidth --types n n
> > > > > >
> > > > > > >> It gave the following error:
> > > > > > Exception in thread "main" java.lang.IllegalArgumentException: Can only call classifyScalar with two categories
> > > > > >
> > > > > > I then created a CSV with only 2 classes. Please find iris-2-classes.csv attached.
> > > > > >
> > > > > > >> Trained iris-2-classes.csv with SGD:
> > > > > >
> > > > > > mahout org.apache.mahout.classifier.sgd.TrainLogistic \
> > > > > >   --input /usr/local/mahout/trunk/iris-2-classes.csv --features 4 \
> > > > > >   --output /usr/local/mahout/trunk/iris-2-classes.model \
> > > > > >   --target class --categories 2 \
> > > > > >   --predictors sepallength sepalwidth petallength petalwidth --types n n
> > > > > >
> > > > > >
> > > > > > mahout runlogistic --input /usr/local/mahout/trunk/iris-2-classes.csv \
> > > > > >   --model /usr/local/mahout/trunk/iris-2-classes.model --auc --confusion
> > > > > >
> > > > > > AUC = 0.14
> > > > > > confusion: [[50.0, 50.0], [0.0, 0.0]]
> > > > > > entropy: [[-0.6, -0.3], [-0.8, -0.4]]
> > > > > >
> > > > > > >> The AUC seems too poor. I then changed --predictors:
> > > > > >
> > > > > > mahout org.apache.mahout.classifier.sgd.TrainLogistic \
> > > > > >   --input /usr/local/mahout/trunk/iris-2-classes.csv --features 4 \
> > > > > >   --output /usr/local/mahout/trunk/iris-2-classes.model \
> > > > > >   --target class --categories 2 \
> > > > > >   --predictors sepalwidth petallength --types n n
> > > > > >
> > > > > > mahout runlogistic --input /usr/local/mahout/trunk/iris-2-classes.csv \
> > > > > >   --model /usr/local/mahout/trunk/iris-2-classes.model --auc --confusion \
> > > > > >   --scores
> > > > > >
> > > > > > AUC = 0.80
> > > > > > confusion: [[50.0, 50.0], [0.0, 0.0]]
> > > > > > entropy: [[-0.7, -0.3], [-0.7, -0.4]]
> > > > > >
> > > > > > The AUC has improved; however, from the confusion matrix it seems that everything
> > > > > > is classified as class a.
> > > > > >
> > > > > > Below is the output.
> > > > > >
> > > > > > "target","model-output","log-likelihood"
> > > > > > 0,0.492,-0.677017
> > > > > > 0,0.493,-0.679192
> > > > > > 0,0.493,-0.678355
> > > > > > 0,0.493,-0.678724
> > > > > > 0,0.492,-0.676583
> > > > > > 0,0.491,-0.675182
> > > > > > 0,0.492,-0.677452
> > > > > > 0,0.492,-0.677419
> > > > > > 0,0.493,-0.679628
> > > > > > 0,0.493,-0.678724
> > > > > > 0,0.491,-0.676116
> > > > > > 0,0.492,-0.677386
> > > > > > 0,0.493,-0.679192
> > > > > > 0,0.493,-0.679291
> > > > > > 0,0.491,-0.674912
> > > > > > 0,0.490,-0.673081
> > > > > > 0,0.491,-0.675313
> > > > > > 0,0.492,-0.677017
> > > > > > 0,0.491,-0.675616
> > > > > > 0,0.491,-0.675682
> > > > > > 0,0.492,-0.677353
> > > > > > 0,0.491,-0.676116
> > > > > > 0,0.492,-0.676714
> > > > > > 0,0.492,-0.677788
> > > > > > 0,0.492,-0.677287
> > > > > > 0,0.493,-0.679126
> > > > > > 0,0.492,-0.677386
> > > > > > 0,0.492,-0.676984
> > > > > > 0,0.492,-0.677452
> > > > > > 0,0.492,-0.678256
> > > > > > 0,0.493,-0.678691
> > > > > > 0,0.492,-0.677419
> > > > > > 0,0.491,-0.674381
> > > > > > 0,0.490,-0.673980
> > > > > > 0,0.493,-0.678724
> > > > > > 0,0.493,-0.678387
> > > > > > 0,0.492,-0.677050
> > > > > > 0,0.493,-0.678724
> > > > > > 0,0.493,-0.679225
> > > > > > 0,0.492,-0.677419
> > > > > > 0,0.492,-0.677050
> > > > > > 0,0.495,-0.682279
> > > > > > 0,0.493,-0.678355
> > > > > > 0,0.492,-0.676951
> > > > > > 0,0.491,-0.675550
> > > > > > 0,0.493,-0.679192
> > > > > > 0,0.491,-0.675649
> > > > > > 0,0.493,-0.678322
> > > > > > 0,0.491,-0.676116
> > > > > > 0,0.492,-0.677887
> > > > > > 1,0.492,-0.709316
> > > > > > 1,0.492,-0.709248
> > > > > > 1,0.492,-0.708935
> > > > > > 1,0.494,-0.705048
> > > > > > 1,0.493,-0.707488
> > > > > > 1,0.493,-0.707454
> > > > > > 1,0.492,-0.709765
> > > > > > 1,0.494,-0.705258
> > > > > > 1,0.493,-0.707936
> > > > > > 1,0.493,-0.706803
> > > > > > 1,0.495,-0.703539
> > > > > > 1,0.493,-0.708249
> > > > > > 1,0.494,-0.704601
> > > > > > 1,0.493,-0.707970
> > > > > > 1,0.493,-0.707597
> > > > > > 1,0.492,-0.708765
> > > > > > 1,0.492,-0.708351
> > > > > > 1,0.493,-0.706871
> > > > > > 1,0.494,-0.704770
> > > > > > 1,0.494,-0.705908
> > > > > > 1,0.492,-0.709350
> > > > > > 1,0.493,-0.707285
> > > > > > 1,0.493,-0.706247
> > > > > > 1,0.493,-0.707522
> > > > > > 1,0.493,-0.707835
> > > > > > 1,0.492,-0.708317
> > > > > > 1,0.493,-0.707556
> > > > > > 1,0.492,-0.708520
> > > > > > 1,0.493,-0.707902
> > > > > > 1,0.494,-0.706220
> > > > > > 1,0.494,-0.705427
> > > > > > 1,0.494,-0.705393
> > > > > > 1,0.493,-0.706803
> > > > > > 1,0.493,-0.707210
> > > > > > 1,0.492,-0.708351
> > > > > > 1,0.492,-0.710146
> > > > > > 1,0.492,-0.708867
> > > > > > 1,0.494,-0.705183
> > > > > > 1,0.493,-0.708215
> > > > > > 1,0.494,-0.705942
> > > > > > 1,0.493,-0.706525
> > > > > > 1,0.492,-0.708385
> > > > > > 1,0.493,-0.706389
> > > > > > 1,0.494,-0.704811
> > > > > > 1,0.493,-0.706905
> > > > > > 1,0.493,-0.708249
> > > > > > 1,0.493,-0.707801
> > > > > > 1,0.493,-0.707835
> > > > > > 1,0.494,-0.705604
> > > > > > 1,0.493,-0.707319
> > > > > >
> > > > > > AUC = 0.80
> > > > > > confusion: [[50.0, 50.0], [0.0, 0.0]]
> > > > > > entropy: [[-0.7, -0.3], [-0.7, -0.4]]
> > > > > >
> > > > > > What kind of data is SGD suitable for?
> > > > > >
> > > > > > Thanks,
> > > > > > Rajesh
> > > > > >
> > > > > >
> > > > > > <iris-2-classes.csv>
> > > > > > <iris-3-classes.csv>
> > > > >
> > > >
> > >
> >
>
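
The --scores listing quoted above can be related to the reported AUC and confusion matrix with a few lines of arithmetic. The following is a minimal sketch, not Mahout's implementation: it assumes that model-output is the predicted probability of target 1, that the class decision uses a 0.5 threshold, and that the confusion matrix is indexed as [predicted][actual]. The eight rows are copied from the quoted listing; the function names are illustrative.

```python
import math

# (target, model-output) pairs copied from the quoted --scores listing;
# the full listing has 100 rows.
rows = [
    (0, 0.492), (0, 0.493), (0, 0.490), (0, 0.495),
    (1, 0.492), (1, 0.494), (1, 0.495), (1, 0.493),
]


def log_likelihood(target, score):
    """Log-likelihood of the true label, assuming score is P(target == 1)."""
    return math.log(score) if target == 1 else math.log(1.0 - score)


def confusion(rows, threshold=0.5):
    """2x2 counts as matrix[predicted][actual] (orientation is an assumption)."""
    matrix = [[0, 0], [0, 0]]
    for target, score in rows:
        predicted = 1 if score >= threshold else 0
        matrix[predicted][target] += 1
    return matrix


def auc(rows):
    """Chance that a random positive outscores a random negative; ties count 1/2."""
    pos = [s for t, s in rows if t == 1]
    neg = [s for t, s in rows if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(log_likelihood(1, 0.492), log_likelihood(0, 0.492))
print(confusion(rows))
print(auc(rows))
```

Under these assumptions the recomputed log-likelihoods agree with the quoted column to within rounding of the printed score, and every row lands in predicted class 0 because no score reaches 0.5, which gives the same shape as the reported [[50.0, 50.0], [0.0, 0.0]]. AUC depends only on how the scores are ranked, so it can rise to 0.80 while the thresholded predictions stay the same. Scores this close to 0.5 correspond to a linear score near zero, which fits the suspicion in the thread that the training input is not being read as intended, although the sketch cannot confirm the cause.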
