spark-user mailing list archives

From Yanbo Liang <yblia...@gmail.com>
Subject Re: How to handle categorical variables in Spark MLlib?
Date Fri, 25 Dec 2015 08:15:46 GMT
Hi Hokam,

You can use OneHotEncoder to encode categorical variables into a feature
vector; Spark ML provides this transformer.
There is no existing method for weighting an individual category, but you
can implement a UDF that multiplies a specified element of the vector by
a factor.
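
To make the two steps above concrete, here is a minimal plain-Python sketch
of what the encoding and the weighting UDF would compute. It is not Spark
API: the helper names (build_index, one_hot, scale_slot) and the factor 2.0
are made up for illustration. Spark's own transformers may also order and
size the vector differently (for example, OneHotEncoder drops the last
category by default), so check the slot index before scaling.

```python
# Plain-Python illustration only; none of these helpers are Spark functions.

def build_index(values):
    """Map each distinct category string to a position, in sorted order."""
    return {category: i for i, category in enumerate(sorted(set(values)))}


def one_hot(value, index):
    """Return a dense one-hot vector (list of floats) for `value`."""
    vec = [0.0] * len(index)
    vec[index[value]] = 1.0
    return vec


def scale_slot(vec, slot, factor):
    """Multiply one position of the vector by `factor`.

    This is the per-row logic a weighting UDF would apply.
    """
    out = list(vec)
    out[slot] *= factor
    return out


genders = ["Male", "Female", "Female", "Male"]
index = build_index(genders)
encoded = [one_hot(g, index) for g in genders]

# Hypothetical weight: double the contribution of the "Female" slot.
weighted = [scale_slot(v, index["Female"], 2.0) for v in encoded]

for g, v in zip(genders, weighted):
    print(g, v)
```

In Spark the same per-row logic would live inside a UDF applied to the
encoded vector column.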

Yanbo

2015-12-23 1:12 GMT+08:00 hokam chauhan <hokam.1988@gmail.com>:

> Hi,
>
> We have one use case in which we need to handle the categorical variables
> in SVM, Regression and Logistic regression models (MLlib not ML) for
> scoring.
>
> We are getting the possible category values against each category variable.
>
> So how can the string value of a categorical variable be converted into a
> double value to form the feature vector?
>
> Also, how can the weight for individual categories be calculated for the
> models? For example, we have Gender as a variable with categories Male and
> Female, and we want to give more weight to the Female category. How can
> this be accomplished?
>
> Also, is there a way to convert string values from raw text into a
> feature vector (apart from the HashingTF-IDF transformation)?
>
> --
>
> Thanks and Regards,
> Hokam Singh Chauhan
>
> Mobile : 09407125190
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-handle-categorical-variables-in-Spark-MLlib-tp25767.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
