Your understanding is correct: when used without centering (withMean = false), the two implementations differ:
* R: normalize by the root-mean-square (RMS)
* MLlib: normalize by the standard deviation
With centering, they are the same.
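To make the difference concrete, here is a small numpy sketch (the column values are hypothetical; R's `scale` uses RMS defined as sqrt(sum(x^2)/(n-1)) when center = FALSE):

```python
import numpy as np

# A toy data column (hypothetical values, just for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0])

# Without centering:
# R's scale(x, center = FALSE) divides by the root-mean-square,
# defined in R as sqrt(sum(x^2) / (n - 1)), while MLlib's
# StandardScaler(withMean = false) divides by the sample stddev.
rms = np.sqrt(np.sum(x ** 2) / (len(x) - 1))
stddev = np.std(x, ddof=1)  # sample standard deviation

print(x / rms)      # R-style scaling
print(x / stddev)   # MLlib-style scaling; differs when the column mean is nonzero

# With centering, the RMS of the centered column equals its sample stddev,
# so the two conventions coincide.
xc = x - x.mean()
assert np.allclose(np.sqrt(np.sum(xc ** 2) / (len(xc) - 1)), np.std(xc, ddof=1))
```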

It's hard to say a priori which one is better, but my guess is that most R users center their data.  (Centering is nice to do, except on big data, where it makes sparse vectors dense.)  Note that R does allow you to normalize by stddev without centering:
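(The R call itself was not quoted in the archive, but in R this can be done by passing explicit per-column standard deviations via `scale`'s `scale` argument. A numpy sketch of the same operation on a hypothetical 2-column matrix:)

```python
import numpy as np

# Hypothetical matrix standing in for an R matrix / MLlib dataset.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 40.0]])

# Divide each column by its sample standard deviation, without centering —
# the analogue of R's scale(X, center = FALSE, scale = apply(X, 2, sd)),
# and of MLlib StandardScaler with withMean = false, withStd = true.
stddev = X.std(axis=0, ddof=1)
scaled = X / stddev

print(scaled)
```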


On Tue, Jun 2, 2015 at 1:25 AM, RoyGaoVLIS <> wrote:
        While trying to add a test case for ML's StandardScaler, I found that MLlib's
StandardScaler's output differs from R's with params (withMean = false,
withStd = true).
        This is because R's scale function divides columns by the root-mean-square
rather than the standard deviation.
        I'm confused about Spark MLlib's implementation.
        Can anybody give me a hand? Thx

Sent from the Apache Spark Developers List mailing list archive at
