spark-dev mailing list archives

From Valeriy Avanesov
Subject Re: [MLLib] Logistic Regression and standardization
Date Fri, 20 Apr 2018 17:06:52 GMT
Hi all.

Filipp, do you use L1/L2/elastic-net penalization? I believe that in this case
standardization matters.
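
To illustrate the point, here is a minimal sketch in plain Python. It is a toy one-feature Newton solver, not MLlib's or glmnet's code; the function name fit_logistic, the data, and the penalty value are all made up for illustration. Without a penalty, fitting on a scaled feature and dividing the coefficient by the scale recovers the raw-feature fit. With an L2 penalty applied in the scaled space, it no longer does.

```python
import math
from statistics import pstdev


def fit_logistic(xs, ys, l2=0.0, iters=25):
    """Newton's method for one-feature logistic regression (toy example).

    Minimizes mean log-loss + (l2 / 2) * w**2; the intercept b is not penalized.
    """
    b, w = 0.0, 0.0
    n = len(xs)
    for _ in range(iters):
        g_b = g_w = h_bb = h_bw = h_ww = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b + w * x)))
            g_b += (p - y) / n                # gradient w.r.t. intercept
            g_w += (p - y) * x / n            # gradient w.r.t. coefficient
            h_bb += p * (1 - p) / n           # Hessian entries
            h_bw += p * (1 - p) * x / n
            h_ww += p * (1 - p) * x * x / n
        g_w += l2 * w                         # L2 penalty on the coefficient only
        h_ww += l2
        det = h_bb * h_ww - h_bw * h_bw
        b -= (h_ww * g_b - h_bw * g_w) / det  # 2x2 Newton step
        w -= (h_bb * g_w - h_bw * g_b) / det
    return b, w


# Hypothetical, non-separable toy data.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 1, 0, 1, 1]
s = pstdev(xs)                                # scale by the standard deviation
xs_scaled = [x / s for x in xs]

# No penalty: scale the coefficient back and compare with the raw fit.
_, w_raw = fit_logistic(xs, ys)
_, w_std = fit_logistic(xs_scaled, ys)
w_back = w_std / s

# Same comparison with an L2 penalty applied in whichever space we train in.
_, w_raw_l2 = fit_logistic(xs, ys, l2=0.1)
_, w_std_l2 = fit_logistic(xs_scaled, ys, l2=0.1)
w_back_l2 = w_std_l2 / s

print("no penalty:", w_raw, w_back)
print("L2 penalty:", w_raw_l2, w_back_l2)
```

In the penalized case the two fits differ because penalizing the scaled coefficient is equivalent to penalizing the raw one with a different strength. So whether the library compensates for that when standardization is off is what decides whether the results match.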



On 04/17/2018 11:40 AM, Weichen Xu wrote:
> Not a bug.
> When standardization is disabled, MLlib's LR still standardizes the
> features internally, but it scales the coefficients back at the end
> (after training has finished), so it gets the same result as training
> without standardization. The purpose is to improve the rate of
> convergence. So the result should always be exactly the same as
> R's glmnet, whether standardization is enabled or disabled.
> Thanks!
> On Sat, Apr 14, 2018 at 2:21 AM, Yanbo Liang wrote:
>     Hi Filipp,
>     MLlib’s LR implementation handles standardization the same way as
>     R’s glmnet.
>     Actually, you don’t need to care about the implementation detail,
>     as the coefficients are always returned on the original scale, so
>     it should return the same result as other popular ML libraries.
>     Could you point me to where glmnet doesn’t scale features?
>     I suspect other issues caused your prediction quality to drop. If
>     you can share the code and data, I can help check it.
>     Thanks
>     Yanbo
>>     On Apr 8, 2018, at 1:09 PM, Filipp Zhinkin wrote:
>>     Hi all,
>>     While migrating from a custom LR implementation to MLlib's LR
>>     implementation, my colleagues noticed that prediction quality
>>     dropped (according to several business metrics).
>>     It turned out that this issue is caused by the feature
>>     standardization performed by MLlib's LR: regardless of the
>>     'standardization' option's value, all features are scaled during
>>     loss and gradient computation (as well as in a few other places).
>>     According to comments in the code, standardization should be
>>     implemented the same way it is implemented in R's glmnet
>>     package. I've looked through the corresponding Fortran code, and it
>>     seems that glmnet doesn't scale features when you disable
>>     standardization (but MLlib still does).
>>     Our models contain multiple one-hot encoded features, and scaling
>>     them is a pretty bad idea.
>>     Why does MLlib's LR always scale all features? From my POV it's a bug.
>>     Thanks in advance,
>>     Filipp.
