spark-user mailing list archives

From Nicholas Sharkey <nicholasshar...@gmail.com>
Subject Re: Spark ML : One hot Encoding for multiple columns
Date Mon, 14 Nov 2016 01:00:08 GMT
Amen 

> On Nov 13, 2016, at 7:55 PM, janardhan shetty <janardhanp22@gmail.com> wrote:
> 
> These JIRAs are still unresolved:
> https://issues.apache.org/jira/browse/SPARK-11215
> 
> Also there is https://issues.apache.org/jira/browse/SPARK-8418
> 
>> On Wed, Aug 17, 2016 at 11:15 AM, Nisha Muktewar <nisha@cloudera.com> wrote:
>> 
>> The OneHotEncoder does not accept multiple columns.
>> 
>> You can use Michal's suggestion, where he uses a Pipeline to set the stages and then executes them.
>> 
>> The other option is to write a function that performs one-hot encoding on a single column and returns a DataFrame with the encoded column, and then call it once for each of the remaining columns.
>> 
>> 
>> 
>> 
>>> On Wed, Aug 17, 2016 at 10:59 AM, janardhan shetty <janardhanp22@gmail.com> wrote:
>>> I had already tried this way :
>>> 
>>> scala> val featureCols = Array("category","newone")
>>> featureCols: Array[String] = Array(category, newone)
>>> 
>>> scala>  val indexer = new StringIndexer().setInputCol(featureCols).setOutputCol("categoryIndex").fit(df1)
>>> <console>:29: error: type mismatch;
>>>  found   : Array[String]
>>>  required: String
>>>         val indexer = new StringIndexer().setInputCol(featureCols).setOutputCol("categoryIndex").fit(df1)
>>> 
>>> 
>>>> On Wed, Aug 17, 2016 at 10:56 AM, Nisha Muktewar <nisha@cloudera.com> wrote:
>>>> I don't think it does. According to the documentation at https://spark.apache.org/docs/2.0.0-preview/ml-features.html#onehotencoder it still accepts one column at a time.
>>>> 
>>>>> On Wed, Aug 17, 2016 at 10:18 AM, janardhan shetty <janardhanp22@gmail.com> wrote:
>>>>> 2.0:
>>>>> 
>>>>> One-hot encoding currently accepts a single input column. Is there a way to include multiple columns?
>>>> 
>>> 
>> 
> 
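
The two workarounds suggested in the thread (a Pipeline of per-column stages, and a helper that handles the per-column work) can be combined in a short sketch. This is only an illustration under stated assumptions, not code posted by anyone in the thread: the helper name oneHotEncodeAll and the "Index" / "Vec" output column suffixes are hypothetical, and it assumes each input column is a string column that StringIndexer can index.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}
import org.apache.spark.sql.DataFrame

// Hypothetical helper: builds one StringIndexer + OneHotEncoder pair per
// column and runs them all as a single Pipeline.
def oneHotEncodeAll(df: DataFrame, cols: Seq[String]): DataFrame = {
  val stages: Array[PipelineStage] = cols.flatMap { c =>
    // Map the string values of column c to numeric category indices.
    val indexer = new StringIndexer()
      .setInputCol(c)
      .setOutputCol(s"${c}Index") // assumed naming convention
    // Turn the category indices into a sparse one-hot vector column.
    val encoder = new OneHotEncoder()
      .setInputCol(s"${c}Index")
      .setOutputCol(s"${c}Vec") // assumed naming convention
    Seq[PipelineStage](indexer, encoder)
  }.toArray

  // Fit every stage in order and return df with the added columns.
  new Pipeline().setStages(stages).fit(df).transform(df)
}
```

With the DataFrame and column names from the thread, the call would look like oneHotEncodeAll(df1, Seq("category", "newone")), which should add categoryVec and newoneVec columns alongside the intermediate index columns.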
