spark-user mailing list archives

From Filipp Zhinkin <filipp.zhin...@gmail.com>
Subject Re: ML Transformer: create feature that uses multiple columns
Date Sat, 09 Dec 2017 11:48:27 GMT
Hi,

you can combine multiple columns using
org.apache.spark.sql.functions.struct and invoke UDF on resulting
column.
In that case, your UDF has to accept a Row as its argument.

See VectorAssembler's sources for an example:
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala#L109
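
A minimal sketch of the struct-plus-UDF approach described above, assuming a local SparkSession and a toy DataFrame. The column names "a" and "b", the output column "a_times_b", and the object name MultiColumnUdfSketch are made up for illustration and are not taken from VectorAssembler:

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.{col, struct, udf}

// Hypothetical example object; names are illustrative only.
object MultiColumnUdfSketch {

  // The UDF receives the struct column as a Row and reads fields by name.
  val interaction = udf { (r: Row) =>
    r.getAs[Double]("a") * r.getAs[Double]("b")
  }

  // Packs columns "a" and "b" into one struct column and applies the UDF to it.
  def addInteraction(df: DataFrame): DataFrame =
    df.withColumn("a_times_b", interaction(struct(col("a"), col("b"))))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("multi-column-udf-sketch")
      .getOrCreate()
    import spark.implicits._

    // Toy input with two Double columns.
    val df = Seq((1.0, 2.0), (3.0, 4.0)).toDF("a", "b")

    addInteraction(df).show()
    spark.stop()
  }
}
```

Inside a custom Transformer, the body of addInteraction would typically go into transform(). For a small, fixed number of columns, the udf function also has overloads that take several arguments directly, so the struct is mainly useful when the number of input columns varies.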

Regards,
Filipp.

On Sat, Dec 9, 2017 at 2:41 PM, davideanastasia
<davide.anastasia@gmail.com> wrote:
> Hi,
> I am trying to write a custom ml.Transformer. It's a very simple row-by-row
> transformation, but it takes into account multiple columns of the DataFrame
> (and sometimes, interaction between columns).
>
> I was wondering what the best way to achieve this is. I have used a udf in
> the Transformer before, but that only allows me to use one column (am I
> right?). How can I use multiple columns?
>
> Thanks,
> D.
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
