spark-user mailing list archives

From: hokam chauhan <hokam.1...@gmail.com>
Subject: How to handle categorical variables in Spark MLlib?
Date: Tue, 22 Dec 2015 17:12:11 GMT

Hi,

We have a use case in which we need to handle categorical variables in
SVM, regression, and logistic regression models (MLlib, not ML) for scoring.

We receive the possible category values for each categorical variable.

How can the string values of a categorical variable be converted into double
values to form the feature vector?
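
To make the question concrete, here is a minimal sketch of the kind of
conversion we have in mind, assuming a one-hot encoding over the known
category values. It is plain Python rather than any MLlib API, and the
helper names build_index and one_hot are hypothetical.

```python
# Sketch only: plain Python, no Spark dependency.
# build_index and one_hot are hypothetical helper names, not MLlib functions.

def build_index(categories):
    """Map each known category string to a stable integer position."""
    return {value: i for i, value in enumerate(sorted(set(categories)))}


def one_hot(value, index):
    """Return a list of doubles with 1.0 at the category's position."""
    if value not in index:
        raise ValueError("unknown category: %r" % (value,))
    vec = [0.0] * len(index)
    vec[index[value]] = 1.0
    return vec


# The possible values are known up front, as in our use case.
gender_index = build_index(["Male", "Female"])
female_vec = one_hot("Female", gender_index)
male_vec = one_hot("Male", gender_index)
```

Presumably a list like this could be concatenated with the numeric features
and passed to Vectors.dense when building a LabeledPoint, but we are not sure
whether that is the recommended approach.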

Also, how can the weights for individual categories be calculated for these
models? For example, we have Gender as a variable with the categories Male
and Female, and we want to give more weight to the Female category. How can
this be accomplished?

Also, is there a way to convert string values from raw text into a feature
vector (apart from the HashingTF-IDF transformation)?

-- 

Thanks and Regards,
Hokam Singh Chauhan

Mobile : 09407125190



