spark-user mailing list archives

From "Franco Barrientos"
Subject RE: Effects problems in logistic regression
Date Thu, 18 Dec 2014 18:50:10 GMT
Yes, without the “amounts” variables the results are similar. When I use other variables,
it's fine.


From: Sean Owen
Sent: Thursday, December 18, 2014 14:22
To: Franco Barrientos
Subject: Re: Effects problems in logistic regression


Are you sure this is an apples-to-apples comparison? For example, does your SAS process normalize
or otherwise transform the data first?
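
To make the normalization question concrete, here is a minimal sketch in plain Python (no Spark) of standardizing a wide-range feature before training and then mapping the fitted coefficients back to the raw scale so they can be compared with another tool's output. The sample amounts are hypothetical apart from the 0 and 59102 endpoints mentioned in the thread, the standardized coefficients are made up, and `standardize` and `to_raw_scale` are illustrative helper names, not part of any library. An unscaled feature with this range is one plausible reason for a gradient-descent fit to behave badly, though the thread does not confirm that it is the cause here.

```python
import statistics

def standardize(values):
    """Return z-scored values plus the mean and std used to scale them."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values], mu, sigma

def to_raw_scale(intercept_std, weight_std, mu, sigma):
    """Convert coefficients fitted on z-scores back to raw-feature units.

    b0 + w * (x - mu) / sigma  ==  (b0 - w * mu / sigma) + (w / sigma) * x
    """
    weight_raw = weight_std / sigma
    intercept_raw = intercept_std - weight_std * mu / sigma
    return intercept_raw, weight_raw

# Hypothetical amounts; only the 0 and 59102 endpoints come from the thread.
amounts = [0.0, 120.0, 950.0, 3400.0, 59102.0]
scaled, mu, sigma = standardize(amounts)

# Made-up coefficients standing in for a model trained on the scaled feature.
intercept_raw, weight_raw = to_raw_scale(-2.0, 0.5, mu, sigma)
print(intercept_raw, weight_raw)
```

Coefficients converted this way are in the same units as a model fitted directly on the raw amounts, which is what an apples-to-apples comparison needs.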


Is the optimization configured similarly in both cases -- same regularization, etc.?


Are you sure you are pulling out the intercept correctly? It is a separate value from the
logistic regression model in Spark.


On Thu, Dec 18, 2014 at 4:34 PM, Franco Barrientos wrote:

Hi all,


I have a problem with LogisticRegressionWithSGD. When I train on a data set with one variable
(which is the amount of an item) and an intercept, I get weights of

(-0.4021,-207.1749) for both features, respectively. This doesn't make sense to me, because
I ran a logistic regression on the same data in SAS and got these weights: (-2.6604,0.000245).


The range of this variable is from 0 to 59102, with a mean of 1158.


The problem arises when I want to calculate the probabilities for each user in the data set: the
probability is zero or near zero in many cases, because when Spark calculates exp(-1*(-0.4021+(-207.1749)*amount))
the result is a very large number, in fact infinity for Spark.
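
The arithmetic above can be checked with a short sketch in plain Python (no Spark). It evaluates the logistic function at the reported mean amount of 1158 with both pairs of weights, treating the first value of each pair as the intercept, as the formula in this message does. The `sigmoid` and `predict` helpers are illustrative names, not MLlib functions, and the branch on the sign of `z` is a common numerical-stability trick rather than anything the thread says Spark does.

```python
import math

def sigmoid(z):
    """Logistic function written so that exp() is never called on a large positive value."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # z < 0: underflows to 0.0 instead of overflowing
    return ez / (1.0 + ez)

def predict(intercept, weight, amount):
    """Probability for a single-feature logistic model."""
    return sigmoid(intercept + weight * amount)

mean_amount = 1158.0  # mean reported in the thread

# Weights reported from LogisticRegressionWithSGD: the exponent is about 239910.
p_spark = predict(-0.4021, -207.1749, mean_amount)

# Weights reported from SAS: the exponent is about 2.38.
p_sas = predict(-2.6604, 0.000245, mean_amount)

print(p_spark, p_sas)
```

With the SGD weights the linear term is so negative that the probability collapses to zero for any realistic amount, whereas the SAS weights give a probability between 0.08 and 0.09 at the mean.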


How can I treat this variable? Or why did this happen?


Thanks,


Franco Barrientos
Data Scientist

Málaga #115, Of. 1003, Las Condes.
Santiago, Chile.
(+562)-29699649
(+569)-76347893


