mahout-user mailing list archives

From Watson Watson <watso...@gmail.com>
Subject Fwd: rate option of trainLogistic command
Date Fri, 21 Sep 2012 14:33:25 GMT
Hi,
My question is: why does changing the rate parameter always change the
coefficients (the results of RunLogistic)?

I first ran into this puzzling effect of the rate parameter on my own data,
but since it can be reproduced with the simple example from the MIA book,
I'll use that example to formulate my doubts.
(Below are runs of RunLogistic exactly as in the book, and then with other
rate parameter values: 50, 40, 60, 500 and 50000 respectively.)
[banki@cos 1]$ mahout org.apache.mahout.classifier.sgd.TrainLogistic
--input donut.csv --output donut.model --target color --categories 2
--predictors x y a b c --types numeric --features 20 --passes 100 --rate 50
20
color ~ 7.068*Intercept Term + 0.581*a + -1.369*b + -25.059*c + 0.581*x +
2.319*y
      Intercept Term 7.06759
                   a 0.58123
                   b -1.36893
                   c -25.05945
                   x 0.58123
                   y 2.31879
    0.000000000     0.000000000     0.000000000     0.000000000
0.000000000    -1.368933989     0.000000000     0.000000000
0.000000000     0.000000000     0.581234210     0.000000000
0.000000000     7.067587159     0.000000000     0.000000000
0.000000000     2.318786209     0.000000000   -25.059452292
12/09/19 11:00:17 INFO driver.MahoutDriver: Program took 2262 ms (Minutes:
0.0377)
[banki@cos 1]$ mahout org.apache.mahout.classifier.sgd.TrainLogistic
--input donut.csv --output donut.model --target color --categories 2
--predictors x y a b c --types numeric --features 20 --passes 100 --rate 40
20
color ~ 5.882*Intercept Term + 0.445*a + -1.107*b + -20.912*c + 0.445*x +
1.855*y
      Intercept Term 5.88183
                   a 0.44521
                   b -1.10685
                   c -20.91159
                   x 0.44521
                   y 1.85450
    0.000000000     0.000000000     0.000000000     0.000000000
0.000000000    -1.106846635     0.000000000     0.000000000
0.000000000     0.000000000     0.445207648     0.000000000
0.000000000     5.881825108     0.000000000     0.000000000
0.000000000     1.854504189     0.000000000   -20.911586416
12/09/19 11:00:58 INFO driver.MahoutDriver: Program took 2016 ms (Minutes:
0.0336)
[banki@cos 1]$ mahout org.apache.mahout.classifier.sgd.TrainLogistic
--input donut.csv --output donut.model --target color --categories 2
--predictors x y a b c --types numeric --features 20 --passes 100 --rate 60
20
color ~ 8.320*Intercept Term + 0.705*a + -1.669*b + -29.161*c + 0.705*x +
2.723*y
      Intercept Term 8.31993
                   a 0.70483
                   b -1.66860
                   c -29.16063
                   x 0.70483
                   y 2.72289
    0.000000000     0.000000000     0.000000000     0.000000000
0.000000000    -1.668599735     0.000000000     0.000000000
0.000000000     0.000000000     0.704831781     0.000000000
0.000000000     8.319926323     0.000000000     0.000000000
0.000000000     2.722889944     0.000000000   -29.160634416
12/09/19 11:01:16 INFO driver.MahoutDriver: Program took 2291 ms (Minutes:
0.03818333333333333)
[banki@cos 1]$ mahout org.apache.mahout.classifier.sgd.TrainLogistic
--input donut.csv --output donut.model --target color --categories 2
--predictors x y a b c --types numeric --features 20 --passes 100 --rate 500
20
color ~ 55.909*Intercept Term + 7.925*a + -10.211*b + -197.573*c + 7.925*x
+ 12.743*y
      Intercept Term 55.90868
                   a 7.92520
                   b -10.21115
                   c -197.57275
                   x 7.92520
                   y 12.74325
    0.000000000     0.000000000     0.000000000     0.000000000
0.000000000   -10.211151393     0.000000000     0.000000000
0.000000000     0.000000000     7.925202029     0.000000000
0.000000000    55.908675853     0.000000000     0.000000000
0.000000000    12.743250315     0.000000000  -197.572748252
12/09/19 11:14:23 INFO driver.MahoutDriver: Program took 1742 ms (Minutes:
0.029033333333333335)
[banki@cos 1]$ mahout org.apache.mahout.classifier.sgd.TrainLogistic
--input donut.csv --output donut.model --target color --categories 2
--predictors x y a b c --types numeric --features 20 --passes 100 --rate
50000
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /usr/bin/hadoop and
HADOOP_CONF_DIR=/usr/lib/hadoop/conf
MAHOUT-JOB: /opt/mahout/mahout-examples-0.7-job.jar
12/09/19 11:17:22 WARN driver.MahoutDriver: No
org.apache.mahout.classifier.sgd.TrainLogistic.props found on classpath,
will use command-line arguments only
20
color ~ 5588.511*Intercept Term + 240.624*a + -207.160*b + -19609.709*c +
240.624*x + 1858.155*y
      Intercept Term 5588.51071
                   a 240.62409
                   b -207.16022
                   c -19609.70869
                   x 240.62409
                   y 1858.15547
    0.000000000     0.000000000     0.000000000     0.000000000
0.000000000  -207.160221372     0.000000000     0.000000000
0.000000000     0.000000000   240.624090101     0.000000000
0.000000000  5588.510709572     0.000000000     0.000000000
0.000000000  1858.155468135     0.000000000 -19609.708690329
12/09/19 11:17:24 INFO driver.MahoutDriver: Program took 2135 ms (Minutes:
0.035583333333333335)
So the coefficients change by almost the same multiplier that I apply to
the learning rate.
How can that be, when the coefficients found by the model must provide the
extremum of the likelihood function?
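
To make "almost the same multiplier" concrete, here is a small Python sketch that divides each run's coefficients by those of the --rate 50 run. The numbers are copied from the output quoted above; the dictionary layout and the helper name ratios_to_baseline are only illustrative and are not part of Mahout.

```python
# Coefficients copied from the TrainLogistic output quoted above, keyed by --rate.
NAMES = ["Intercept Term", "a", "b", "c", "x", "y"]
RUNS = {
    40: [5.88183, 0.44521, -1.10685, -20.91159, 0.44521, 1.85450],
    50: [7.06759, 0.58123, -1.36893, -25.05945, 0.58123, 2.31879],
    60: [8.31993, 0.70483, -1.66860, -29.16063, 0.70483, 2.72289],
    500: [55.90868, 7.92520, -10.21115, -197.57275, 7.92520, 12.74325],
    50000: [5588.51071, 240.62409, -207.16022, -19609.70869, 240.62409, 1858.15547],
}
BASELINE_RATE = 50  # the rate used in the book's example


def ratios_to_baseline(rate):
    """Return each coefficient of the given run divided by the baseline run's value."""
    baseline = RUNS[BASELINE_RATE]
    return {name: value / base
            for name, value, base in zip(NAMES, RUNS[rate], baseline)}


for rate in sorted(RUNS):
    ratios = ratios_to_baseline(rate)
    formatted = ", ".join(f"{name}={ratio:.2f}" for name, ratio in ratios.items())
    # Print the rate multiplier next to the per-coefficient multipliers.
    print(f"rate x{rate / BASELINE_RATE:g}: {formatted}")
```

Each printed line puts the rate multiplier next to the per-coefficient multipliers, so it is easy to see how closely the coefficients track the rate for each run.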

On another dataset that I am using to understand the impact of the rate
parameter I see EXACT multiplication, i.e. when I decrease the rate
parameter by a factor of 10, ALL coefficients decrease by exactly a factor
of 10. What does this mean? Which coefficients can be taken as
maximizing the likelihood function? Why does the algorithm show no sign
of "stability of solution"?
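
One way to probe this outside Mahout is a toy experiment. The sketch below is plain fixed-rate SGD for logistic regression written from scratch; it is NOT Mahout's implementation, and the data, the function train_sgd and the rates are all hypothetical. It only checks whether, after a single pass with a very small rate (i.e. far from convergence), the weights scale with the rate.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_sgd(rows, rate, passes):
    """Plain fixed-rate SGD for logistic regression (not Mahout's algorithm).

    rows is a list of (features, label) pairs with label 0 or 1.
    """
    weights = [0.0] * len(rows[0][0])
    for _ in range(passes):
        for features, label in rows:
            score = sum(w * f for w, f in zip(weights, features))
            error = label - sigmoid(score)
            # Gradient step on the log-likelihood of this one example.
            weights = [w + rate * error * f for w, f in zip(weights, features)]
    return weights


# Hypothetical toy data; the first feature is a constant intercept term.
ROWS = [
    ([1.0, 0.2], 0),
    ([1.0, 0.5], 1),
    ([1.0, 0.7], 1),
    ([1.0, 0.9], 1),
]

# Two rates that differ by a factor of 10, one pass each.
larger = train_sgd(ROWS, rate=0.001, passes=1)
smaller = train_sgd(ROWS, rate=0.0001, passes=1)
ratios = [a / b for a, b in zip(larger, smaller)]
print(ratios)
```

In this toy setting the weights stay so close to zero that every update is nearly proportional to the rate, so the two weight vectors differ by almost exactly the factor between the rates. Whether the same thing is happening inside TrainLogistic is an assumption that would need to be checked against Mahout's own code and options.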

I would greatly appreciate any explanation of how the CLI command
org.apache.mahout.classifier.sgd.RunLogistic uses the learning rate
parameter.

Kind regards,
Nikita Kuznetsov
