spark-user mailing list archives

From Nick Pentreath <nick.pentre...@gmail.com>
Subject Re: Reverse MinMaxScaler in SparkML
Date Mon, 29 Jan 2018 14:33:34 GMT
This would be interesting and a good addition I think.

It bears some thought about the API though. One approach is to have an
"inverseTransform" method similar to sklearn.
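
A minimal sketch of the first approach, in plain Python rather than Spark. The class name MinMaxModelSketch is hypothetical, and inverse_transform only mirrors the sklearn naming; neither is Spark's API. The sketch assumes the model keeps the fitted per-feature minimum and maximum (Spark's MinMaxScalerModel exposes these as originalMin/originalMax) plus the target range. It then undoes the affine rescaling one feature at a time.

```python
from typing import List, Sequence


class MinMaxModelSketch:
    """Hypothetical fitted min-max scaler (not Spark's API) with an sklearn-style inverse."""

    def __init__(self, original_min: Sequence[float], original_max: Sequence[float],
                 min_: float = 0.0, max_: float = 1.0) -> None:
        # Per-feature statistics from fitting, like Spark's originalMin/originalMax.
        self.original_min = list(original_min)
        self.original_max = list(original_max)
        # Target range of the rescaled values.
        self.min = min_
        self.max = max_

    def transform(self, row: Sequence[float]) -> List[float]:
        out = []
        for x, lo, hi in zip(row, self.original_min, self.original_max):
            if hi == lo:
                # Constant feature: map it to the midpoint of the target range.
                out.append(0.5 * (self.max + self.min))
            else:
                out.append((x - lo) / (hi - lo) * (self.max - self.min) + self.min)
        return out

    def inverse_transform(self, row: Sequence[float]) -> List[float]:
        # Undo the affine map feature by feature; a constant feature comes back as lo.
        return [(y - self.min) / (self.max - self.min) * (hi - lo) + lo
                for y, lo, hi in zip(row, self.original_min, self.original_max)]
```

For example, MinMaxModelSketch([0.0, 10.0], [4.0, 10.0]).inverse_transform([0.25, 0.5]) returns [1.0, 10.0]. This is only a method on the model, so a pipeline built from fit/transform stages would never call it.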

The other approach is to "formalize" something like StringIndexerModel ->
IndexToString. Here, the inverse transformer is a standalone transformer.
It could be returned from a "getInverseTransformer" method, for example.
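
The second approach can be sketched the same way, again in plain Python. All names here are hypothetical: MinMaxScalerModelSketch, get_inverse_transformer, InverseMinMaxSketch and run_stages. The fitted model hands out a separate object that carries the fitted statistics and exposes only transform, so it can be chained like any other stage. run_stages is a toy stand-in for a pipeline, not Spark's Pipeline class.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class InverseMinMaxSketch:
    """Hypothetical standalone transformer that undoes a min-max rescaling."""
    original_min: List[float]
    original_max: List[float]
    target_min: float = 0.0
    target_max: float = 1.0

    def transform(self, row: Sequence[float]) -> List[float]:
        span = self.target_max - self.target_min
        return [(y - self.target_min) / span * (hi - lo) + lo
                for y, lo, hi in zip(row, self.original_min, self.original_max)]


@dataclass(frozen=True)
class MinMaxScalerModelSketch:
    """Hypothetical fitted scaler that hands out its inverse as a separate stage."""
    original_min: List[float]
    original_max: List[float]
    target_min: float = 0.0
    target_max: float = 1.0

    def transform(self, row: Sequence[float]) -> List[float]:
        out = []
        for x, lo, hi in zip(row, self.original_min, self.original_max):
            if hi == lo:
                # Constant feature: map it to the midpoint of the target range.
                out.append(0.5 * (self.target_max + self.target_min))
            else:
                scaled = (x - lo) / (hi - lo)
                out.append(scaled * (self.target_max - self.target_min) + self.target_min)
        return out

    def get_inverse_transformer(self) -> InverseMinMaxSketch:
        # The inverse carries the fitted statistics, so it needs no fit of its own.
        return InverseMinMaxSketch(self.original_min, self.original_max,
                                   self.target_min, self.target_max)


def run_stages(stages, row: Sequence[float]) -> List[float]:
    """Toy pipeline: feed the row through each stage's transform in order."""
    out = list(row)
    for stage in stages:
        out = stage.transform(out)
    return out
```

For example, run_stages([model, model.get_inverse_transformer()], row) hands back the original row. The inverse is just another transformer, which is what lets it sit at the end of a stage list.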

The former approach is simpler, but cannot be used in pipelines (which work
on "fit" / "transform"). The latter approach is more cumbersome, but fits
better into pipelines.

So it depends on the use case, i.e. how common it is to use the inverse
transform function within a pipeline (for StringIndexer <-> IndexToString
it is quite common to get back the labels, while for other transformers it
may or may not be).

On Mon, 8 Jan 2018 at 11:10 Tomasz Dudek <megatrontomaszdudek@gmail.com>
wrote:

> Hello,
>
> since a similar question on StackOverflow remains unanswered
> (https://stackoverflow.com/questions/46092114/is-there-no-inverse-transform-method-for-a-scaler-like-minmaxscaler-in-spark)
> and there may be a solution I am not aware of, I'll ask:
>
> After training MinMaxScaler (or a similar scaler), is there any built-in way
> to revert the process? What I mean is transforming the scaled data back to
> its original form. scikit-learn has a dedicated inverse_transform method
> that does exactly that.
>
> I can, of course, get the originalMin/originalMax Vectors from the
> MinMaxScalerModel and then map the values myself but it would be nice to
> have it built-in.
>
> Yours,
> Tomasz
>
>
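
The manual mapping described above follows from the rescaling formula in Spark's MinMaxScaler documentation. In that formula, E_min and E_max are the per-feature originalMin/originalMax values, and min/max are the target range (0 and 1 by default). Solving for e_i gives the inverse. A constant feature (E_max = E_min) is always mapped to the midpoint of the target range, and the inverse then simply returns E_min.

```latex
\begin{aligned}
\operatorname{Rescaled}(e_i) &= \frac{e_i - E_{\min}}{E_{\max} - E_{\min}}\,(\max - \min) + \min,
  && E_{\max} \neq E_{\min} \\
\operatorname{Rescaled}(e_i) &= 0.5\,(\max + \min), && E_{\max} = E_{\min} \\
e_i &= \frac{\operatorname{Rescaled}(e_i) - \min}{\max - \min}\,(E_{\max} - E_{\min}) + E_{\min}
\end{aligned}
```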
