Yes, without the “amounts” variables the results are similar. When I use other variables,
it's fine.
From: Sean Owen [mailto:sowen@cloudera.com]
Sent: Thursday, December 18, 2014 14:22
To: Franco Barrientos
CC: user@spark.apache.org
Subject: Re: Effects problems in logistic regression
Are you sure this is an apples-to-apples comparison? For example, does your SAS process normalize
or otherwise transform the data first?
Is the optimization configured similarly in both cases (same regularization, etc.)?
Are you sure you are pulling out the intercept correctly? In Spark's logistic regression
model, it is a value separate from the weights.
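To make the normalization question concrete, here is a minimal sketch in plain Python (not Spark code) of standardizing a single feature and then mapping coefficients fitted on the standardized feature back to the original units, so they can be compared with a tool that reports coefficients on the raw scale. The function names `standardize` and `to_original_scale` are hypothetical, and whether SAS or Spark scales the data in this particular setup is an assumption to verify, not something this thread establishes.

```python
import statistics


def standardize(values):
    """Return (scaled_values, mean, std) using the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mean) / std for v in values], mean, std


def to_original_scale(intercept_std, weight_std, mean, std):
    """Convert coefficients fitted on z = (x - mean) / std to coefficients on x.

    intercept_std + weight_std * z
        == (intercept_std - weight_std * mean / std) + (weight_std / std) * x
    """
    weight = weight_std / std
    intercept = intercept_std - weight_std * mean / std
    return intercept, weight
```

With a feature that ranges into the tens of thousands, the weight on the original scale is the standardized weight divided by the standard deviation, so a very small raw-scale coefficient is what one would expect to see.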
On Thu, Dec 18, 2014 at 4:34 PM, Franco Barrientos <franco.barrientos@exalitica.com> wrote:
Hi all!
I have a problem with LogisticRegressionWithSGD. When I train on a data set with one variable
(which is the amount of an item) and an intercept, I get weights of
(0.4021,207.1749) for the intercept and the variable, respectively. This doesn't make sense to me, because
when I run a logistic regression on the same data in SAS I get these weights: (2.6604,0.000245).
The range of this variable is from 0 to 59102, with a mean of 1158.
The problem comes when I want to calculate the probabilities for each user in the data set: the
probability is zero or near zero in many cases, because when Spark calculates exp(1*(0.4021+(207.1749)*amount))
the result is a big number, in fact infinity for Spark.
How can I treat this variable? Or why did this happen?
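As a side note on the overflow itself: the logistic function can be evaluated without ever exponentiating a large positive number. The following is a minimal sketch in plain Python, not Spark's implementation; `stable_sigmoid` and `predict_probability` are hypothetical names. It only avoids the infinity in the intermediate result and does not address why the fitted weight is so large.

```python
import math


def stable_sigmoid(margin):
    """Compute 1 / (1 + exp(-margin)) without overflowing for large |margin|."""
    if margin >= 0:
        # exp(-margin) is at most 1 here, so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-margin))
    # For negative margins, exp(margin) is below 1 and underflows safely to 0.0.
    e = math.exp(margin)
    return e / (1.0 + e)


def predict_probability(intercept, weight, amount):
    """Probability from a one-variable logistic model (hypothetical helper)."""
    return stable_sigmoid(intercept + weight * amount)
```

Both branches compute the same function; the branch is chosen so that the argument passed to `math.exp` is never positive.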
Thanks,
Franco Barrientos
Data Scientist
Málaga #115, Of. 1003, Las Condes.
Santiago, Chile.
(+562)29699649
(+569)76347893
franco.barrientos@exalitica.com
www.exalitica.com
