spark-user mailing list archives

From "Franco Barrientos" <>
Subject Effects problems in logistic regression
Date Thu, 18 Dec 2014 16:34:57 GMT
Hi all,


I have a problem with LogisticRegressionWithSGD. When I train a data set
with one variable (which is the amount of an item) plus an intercept, I get
the weights (-0.4021, -207.1749) for the two features, respectively. This
doesn't make sense to me, because when I run a logistic regression on the
same data in SAS I get these weights: (-2.6604, 0.000245).


The range of this variable is from 0 to 59102, with a mean of 1158.


The problem is that when I calculate the probability for each user in the
data set, in many cases it is zero or near zero, because when Spark
evaluates exp(-1*(-0.4021+(-207.1749)*amount)) the argument is huge, and
the result overflows to infinity.
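To illustrate what I mean, here is a small plain-Python sketch (not Spark code, just the same arithmetic) showing how these weights saturate the sigmoid for any realistic amount:

```python
import math

# Weights returned by LogisticRegressionWithSGD: (intercept, amount)
w0, w1 = -0.4021, -207.1749

def prob(amount):
    """P(y=1) = 1 / (1 + exp(-(w0 + w1*amount))), guarded against overflow."""
    z = w0 + w1 * amount
    if z < -700:
        # exp(-z) would overflow a double (exp(709.78...) is about DBL_MAX),
        # so the probability underflows to exactly 0
        return 0.0
    return 1.0 / (1.0 + math.exp(-z))

print(prob(1158))  # mean amount -> probability collapses to 0.0
print(prob(0.0))   # amount 0 -> 1/(1+exp(0.4021)), roughly 0.40
```

So with these weights, any user with a non-trivial amount gets probability zero.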


How should I treat this variable, and why does this happen?


Thanks,


Franco Barrientos
Data Scientist

Málaga #115, Of. 1003, Las Condes.
Santiago, Chile.

