spark-user mailing list archives

From Filipp Zhinkin <>
Subject [ML] LogisticRegression and dataset's standardization before training
Date Wed, 06 Dec 2017 10:13:22 GMT

LogisticAggregator [1] scales every sample on every iteration. Without
that per-sample scaling, binaryUpdateInPlace could be rewritten in a way
that would significantly improve performance.
However, there is a comment [2] saying that standardization and
caching of the dataset before training would "create a lot of overhead".

What kind of overhead is meant here, and what is the rationale for
avoiding standardizing the dataset prior to training?
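To make the question concrete, here is a minimal NumPy sketch (not Spark's actual implementation; function names are mine) contrasting the two approaches: computing the logistic-loss gradient while dividing each sample by the feature standard deviations inside the loop on every iteration, versus standardizing the dataset once up front and reusing it. Both yield the same gradient; the second avoids the repeated per-sample scaling work.

```python
import numpy as np

def grad_scaling_inside(X, y, w, std):
    """Gradient with per-sample scaling done on every pass,
    analogous to what the aggregator does each iteration."""
    g = np.zeros_like(w)
    for xi, yi in zip(X, y):
        xs = xi / std                          # scaling repeated every iteration
        margin = xs @ w
        mult = 1.0 / (1.0 + np.exp(-margin)) - yi
        g += mult * xs
    return g / len(y)

def grad_prescaled(Xs, y, w):
    """Same gradient on a dataset standardized once and cached."""
    margins = Xs @ w
    mult = 1.0 / (1.0 + np.exp(-margins)) - y
    return Xs.T @ mult / len(y)
```

With `Xs = X / std` computed once before the optimizer loop, `grad_prescaled(Xs, y, w)` matches `grad_scaling_inside(X, y, w, std)`; the trade-off in question is the cost of materializing and caching that scaled copy of the dataset.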


