mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Logistic Regression in Mahout
Date Thu, 31 Jan 2013 17:11:09 GMT
Here are a few notes:

- TrainLogistic uses OnlineLogisticRegression, which uses L1 regularization.
 You don't say what you are using in R, but I would assume
glm(family="binomial") or equivalent.

Is this correct?

- I don't think that there is a log issue here.

- Can you share the data off-list so that I can debug this?
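To illustrate the first note: an L1-penalized fit is not expected to reproduce the coefficients of an unpenalized maximum-likelihood fit such as glm(family="binomial"), because the penalty pulls every non-intercept coefficient toward zero. The sketch below is only an assumption-laden illustration, not Mahout's OnlineLogisticRegression. It uses plain stochastic gradient descent with a soft-threshold (proximal) step for the L1 penalty, and the run with lam=0 roughly stands in for the unpenalized glm fit. The helpers make_data and fit_sgd, the synthetic data, and the values of lam, lr and epochs are all hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_data(n, coefs, intercept, seed=0):
    # Hypothetical synthetic data: standard-normal predictors and a 0/1
    # target drawn from a logistic model with the given coefficients.
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in coefs]
        z = intercept + sum(b * xi for b, xi in zip(coefs, x))
        y = 1 if rng.random() < sigmoid(z) else 0
        rows.append((x, y))
    return rows


def fit_sgd(rows, lam, lr=0.01, epochs=20, seed=1):
    # Plain SGD on the average log-loss plus lam * (L1 norm of the
    # non-intercept weights). lam=0 gives an approximately unpenalized fit.
    n_feat = len(rows[0][0])
    w = [0.0] * (n_feat + 1)  # w[0] is the intercept and is not penalized
    rng = random.Random(seed)
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = rows[i]
            xb = [1.0] + x
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xb))) - y
            for j in range(len(w)):
                w[j] -= lr * err * xb[j]
            # Soft-threshold (proximal) step for the L1 penalty.
            for j in range(1, len(w)):
                shrunk = abs(w[j]) - lr * lam
                w[j] = math.copysign(max(0.0, shrunk), w[j])
    return w


rows = make_data(2000, coefs=[1.0, 0.5], intercept=-0.5)
w_plain = fit_sgd(rows, lam=0.0)
w_l1 = fit_sgd(rows, lam=0.1)
print("unpenalized:", [round(v, 3) for v in w_plain])
print("L1, lam=0.1:", [round(v, 3) for v in w_l1])
```

With a strong enough penalty the smaller coefficient can be driven to (or near) zero while the larger one is merely shrunk. Shrinkage on its own moves coefficients toward zero rather than flipping their signs, so this sketch shows why the two tools need not agree, not what caused the all-negative coefficients in this thread.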

On Wed, Jan 30, 2013 at 7:15 PM, Prabhu <prabhu@mediaiqdigital.com> wrote:

> Thanks, I thought of that, but that doesn't seem to be the right
> explanation either.
> For one, in the output I see an equation like
> TargetVariable ~ -0.001*InterceptTerm + -0.0006*predictor1 +
> -0.0004*predictor2 ...
>
> Also, if I look at predictor1, say, the coefficient in R is 1.02 and for
> predictor2 it is 0.48, whereas in Mahout I get -0.00063 for predictor1 and
> -0.00042 for predictor2. Now if these values are logs of what I am looking
> for, e^-0.00063 is 0.99937 and e^-0.00042 is 0.99958, so the difference
> is marginal, whereas the R coefficients indicate that predictor1 carries much
> more weight than predictor2, which is what I would expect.
>
> Any other thoughts or ideas?
>
> Thanks
> Prabhu
>
> -----Original Message-----
> From: Jake Mannix [mailto:jake.mannix@gmail.com]
> Sent: 31 January 2013 04:54
> To: user@mahout.apache.org
> Subject: Re: Logistic Regression in Mahout
>
> Looks like you're looking at weights which are logs of the weights you
> think you want.
>
>
> On Wed, Jan 30, 2013 at 4:12 AM, Prabhu <prabhu@mediaiqdigital.com> wrote:
>
> > Hi all,
> >
> >     I am trying to use Mahout to run a logistic regression analysis on
> > some data. The data is about 7 million rows, with about 20 predictor
> > variables (all of them numeric). The target variable is Boolean (0 or 1).
> >
> > I run a logistic regression on this data in R and I get sensible
> > coefficients. But when I run a logistic regression on the exact same
> > data using Mahout, I get coefficients that don't make sense. For a
> > start, all of the coefficients are negative. Interestingly, the most
> > important variable according to R (the one with the highest
> > coefficient) has the least negative coefficient in Mahout. Can someone
> > please help me understand what is causing this?
> >
> >
> >
> > Thanks
> >
> > Prabhu
> >
> >
> >
> >
>
>
> --
>
>   -jake
>
>
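Jake's suggestion and Prabhu's check above both rest on the fact that a logistic-regression coefficient is a log odds ratio. Exponentiating it gives the multiplicative change in the odds of target=1 per unit increase in that predictor. The sketch below applies this to the numbers quoted in the thread. The dictionaries simply hold those quoted values, and odds_ratio is a hypothetical helper. Because exp is strictly increasing and exp(b) > 1 exactly when b > 0, exponentiating cannot turn negative coefficients into positive effects or reorder them. That is consistent with Ted's view that a log mix-up does not explain the discrepancy.

```python
import math

# Coefficients as quoted in the thread (R glm vs Mahout TrainLogistic).
r_coefs = {"predictor1": 1.02, "predictor2": 0.48}
mahout_coefs = {"predictor1": -0.00063, "predictor2": -0.00042}


def odds_ratio(beta):
    """Multiplicative change in the odds of target=1 per unit increase
    in a predictor, given its logistic-regression coefficient beta."""
    return math.exp(beta)


for name in r_coefs:
    print(f"{name}: R exp(beta) = {odds_ratio(r_coefs[name]):.4f}, "
          f"Mahout exp(beta) = {odds_ratio(mahout_coefs[name]):.5f}")

# exp() is monotonic and maps negative coefficients to odds ratios below 1,
# so a log/exp confusion alone cannot reconcile the two sets of coefficients.
```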
