spark-dev mailing list archives

From Yanbo Liang
Subject Re: [MLLib] Logistic Regression and standardization
Date Fri, 13 Apr 2018 18:21:39 GMT
Hi Filipp,

MLlib’s LR implementation handles standardization the same way R’s glmnet does.
You don’t actually need to care about this implementation detail: the coefficients are
always returned on the original scale, so it should return the same result as other popular
ML libraries.
Could you point me to where glmnet doesn’t scale features?
I suspect other issues caused your prediction quality to drop. If you can share the code and
data, I can help check it.
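
To illustrate the point about coefficients being returned on the original scale, here is a minimal sketch in plain Python (not MLlib code). It assumes features are scaled by dividing by a per-feature standard deviation without centering; the names scale_features and to_original_scale and all the numbers are made up for illustration. It shows only that, for a given set of coefficients, mapping them back to the original scale leaves predictions unchanged.

```python
import math

def scale_features(row, sigmas):
    # Assumed scaling: divide each feature by its standard deviation, no centering.
    return [x / s for x, s in zip(row, sigmas)]

def to_original_scale(w_scaled, sigmas):
    # Coefficients learned on scaled features map back by dividing by sigma.
    return [w / s for w, s in zip(w_scaled, sigmas)]

def predict(weights, intercept, row):
    # Standard logistic model: sigmoid of the linear margin.
    margin = intercept + sum(w * x for w, x in zip(weights, row))
    return 1.0 / (1.0 + math.exp(-margin))

# Hypothetical values, chosen only for illustration.
sigmas = [2.0, 0.5, 10.0]
w_scaled = [0.8, -1.2, 0.3]
intercept = -0.4
row = [3.0, 1.0, 25.0]

# Prediction in the scaled space vs. prediction with back-transformed coefficients.
p_scaled = predict(w_scaled, intercept, scale_features(row, sigmas))
p_original = predict(to_original_scale(w_scaled, sigmas), intercept, row)
print(p_scaled, p_original)
```

Note that this equivalence says nothing about which coefficients the optimizer finds: when a regularization penalty is present, the fitted coefficients can depend on the scale in which the penalty is applied.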


> On Apr 8, 2018, at 1:09 PM, Filipp Zhinkin wrote:
> Hi all,
> While migrating from a custom LR implementation to MLlib's LR implementation, my colleagues
noticed that prediction quality dropped (according to different business metrics).
> It turned out that this issue is caused by the feature standardization performed by MLlib's
LR: regardless of the 'standardization' option's value, all features are scaled during loss and
gradient computation (as well as in a few other places).
> According to comments in the code, standardization should be implemented the same way
it is implemented in R's glmnet package. I've looked through the corresponding Fortran code,
and it seems that glmnet doesn't scale features when you disable standardization (but MLlib
still does).
> Our models contain multiple one-hot encoded features, and scaling them is a pretty bad idea.
> Why does MLlib's LR always scale all features? From my POV it's a bug.
> Thanks in advance,
> Filipp.
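
For context on the concern about one-hot encoded features, the following sketch (plain Python, not MLlib or glmnet code) shows what dividing a 0/1 column by its standard deviation does to the value of an active indicator. The columns and the helper population_std are hypothetical, and the population standard deviation is used for simplicity, which is an assumption; a library may use the sample estimate instead.

```python
import math

def population_std(column):
    # Population standard deviation (divides by n); assumed here for simplicity.
    n = len(column)
    mean = sum(column) / n
    return math.sqrt(sum((x - mean) ** 2 for x in column) / n)

# Hypothetical one-hot columns: one active in 50% of rows, one in 1% of rows.
common = [1.0] * 50 + [0.0] * 50
rare = [1.0] * 1 + [0.0] * 99

# Value that an active indicator (1.0) takes after dividing by the column's std.
scaled_one = {
    name: 1.0 / population_std(col)
    for name, col in (("common", common), ("rare", rare))
}
print(scaled_one)
```

The rarer the category, the smaller the column's standard deviation and the larger the scaled indicator becomes, so the two columns no longer share a common 0/1 scale after this kind of scaling.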
