spark-user mailing list archives

From Nisha Muktewar <ni...@cloudera.com>
Subject Re: Spark ML : One hot Encoding for multiple columns
Date Wed, 17 Aug 2016 18:15:51 GMT
The OneHotEncoder does *not* accept multiple columns.

You can use Michal's suggestion, which uses a Pipeline to set up the stages
and then runs them.
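
A minimal sketch of the Pipeline approach, in Scala to match the spark-shell
session quoted below. It is not Michal's original code. It assumes the Spark ML
Scala API (`StringIndexer`, `OneHotEncoder`, `Pipeline`), and the object name
`MultiColumnOneHot` and the `<col>_index` / `<col>_vec` output column names are
hypothetical. Each column gets its own StringIndexer and OneHotEncoder stage,
so the single-column limitation never comes into play.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}
import org.apache.spark.sql.{DataFrame, SparkSession}

object MultiColumnOneHot {
  // Build one StringIndexer + OneHotEncoder pair per input column and chain
  // all of them in a single Pipeline, so one fit/transform encodes every column.
  // The "<col>_index" / "<col>_vec" output names are hypothetical choices.
  def encodeAll(df: DataFrame, cols: Seq[String]): DataFrame = {
    val stages: Array[PipelineStage] = cols.flatMap { c =>
      Seq[PipelineStage](
        // Map each distinct string value to a numeric label index.
        new StringIndexer().setInputCol(c).setOutputCol(s"${c}_index"),
        // Expand that label index into a sparse one-hot vector.
        new OneHotEncoder().setInputCol(s"${c}_index").setOutputCol(s"${c}_vec")
      )
    }.toArray
    new Pipeline().setStages(stages).fit(df).transform(df)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("multi-col-ohe").getOrCreate()
    import spark.implicits._
    // Toy data using the column names from this thread.
    val df1 = Seq(("a", "x"), ("b", "y"), ("a", "y")).toDF("category", "newone")
    encodeAll(df1, Seq("category", "newone")).show(false)
    spark.stop()
  }
}
```

Wrapping the stages in a Pipeline also means the same code should work whether
OneHotEncoder is a Transformer (Spark 2.x) or an Estimator (Spark 3.x), since
Pipeline.fit handles both kinds of stage.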

The other option is to write a function that one-hot encodes a single
column and returns a DataFrame with the encoded column added, and then call
it once for each of the remaining columns.
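
A minimal sketch of that second option, assuming the same Spark ML Scala API.
The object `PerColumnOneHot`, the helper `oneHot`, and the output column names
are hypothetical illustrations, not an existing Spark API. The helper encodes
one column, and `foldLeft` calls it once per column, passing each returned
DataFrame into the next call.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}
import org.apache.spark.sql.DataFrame

object PerColumnOneHot {
  // Hypothetical helper: one-hot encodes a single string column and returns
  // the input DataFrame with "<col>_index" and "<col>_vec" appended.
  // The two stages sit in a small Pipeline so the code works whether
  // OneHotEncoder is a Transformer (Spark 2.x) or an Estimator (Spark 3.x).
  def oneHot(df: DataFrame, col: String): DataFrame = {
    val stages = Array[PipelineStage](
      new StringIndexer().setInputCol(col).setOutputCol(s"${col}_index"),
      new OneHotEncoder().setInputCol(s"${col}_index").setOutputCol(s"${col}_vec")
    )
    new Pipeline().setStages(stages).fit(df).transform(df)
  }

  // Call the single-column helper once per column, threading the DataFrame
  // returned by each call into the next one.
  def encodeAll(df: DataFrame, cols: Seq[String]): DataFrame =
    cols.foldLeft(df)(oneHot)
}
```

Usage would be, for example, `PerColumnOneHot.encodeAll(df1, Seq("category", "newone"))`.
One trade-off: this helper refits the indexer on every call, so applying it
separately to training and test data could assign different indices, whereas
the single-Pipeline approach lets you keep the fitted PipelineModel and reuse it.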

On Wed, Aug 17, 2016 at 10:59 AM, janardhan shetty <janardhanp22@gmail.com>
wrote:

> I had already tried it this way:
>
> scala> val featureCols = Array("category","newone")
> featureCols: Array[String] = Array(category, newone)
>
> scala>  val indexer = new StringIndexer().setInputCol(
> featureCols).setOutputCol("categoryIndex").fit(df1)
> <console>:29: error: type mismatch;
>  found   : Array[String]
>  required: String
>         val indexer = new StringIndexer().setInputCol(
> featureCols).setOutputCol("categoryIndex").fit(df1)
>
>
> On Wed, Aug 17, 2016 at 10:56 AM, Nisha Muktewar <nisha@cloudera.com>
> wrote:
>
>> I don't think it does. From the documentation at
>> https://spark.apache.org/docs/2.0.0-preview/ml-features.html#onehotencoder
>> I see that it still accepts one column at a time.
>>
>> On Wed, Aug 17, 2016 at 10:18 AM, janardhan shetty <
>> janardhanp22@gmail.com> wrote:
>>
>>> 2.0:
>>>
>>> One-hot encoding currently accepts a single input column. Is there a way to
>>> include multiple columns?
>>>
>>
>>
>
