spark-user mailing list archives

From unk1102 <>
Subject Spark MLlib: Ideal way to convert categorical features into LabeledPoint RDD?
Date Mon, 01 Feb 2016 17:21:38 GMT
Hi, I have a dataset that is completely categorical; it does not contain
even one numerical column. I want to apply classification using Naive
Bayes to predict whether a given alert is actionable or not (YES/NO).
Here is a sample of my dataset:

0,Network1,App1,Router1,Not reachable,YES
0,Network1,App2,Router5,Not reachable,NO

I am using Spark 1.6, and I see there is a StringIndexer class, which is
used in the OneHotEncoding example given here, but I have almost 10000
unique words/features to map to continuous numeric values. How do I create
such a huge map? My dataset is in a CSV file. Please guide me on how to
convert all the categorical features in the CSV file and use them in a
Naive Bayes model.
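
To make the question concrete, here is a minimal sketch in plain Python of
what I mean by the map: it is built by scanning the data rather than written
by hand. The helper names (build_index, encode) are hypothetical, and I am
assuming the last column is the label and that every other column is a
categorical feature. This is not the StringIndexer API, only an illustration
of the mapping I am after.

```python
import csv
import io

def build_index(rows, n_feature_cols):
    """Assign a distinct integer to every (column, value) pair seen in the data."""
    index = {}
    for row in rows:
        for col in range(n_feature_cols):
            key = (col, row[col])
            if key not in index:
                index[key] = len(index)
    return index

def encode(row, index, n_feature_cols):
    """Return (label, sorted positions of the one-hot features that are 1)."""
    # Assumption: the label is the last column and is either YES or NO.
    label = 1.0 if row[-1] == "YES" else 0.0
    active = sorted(index[(col, row[col])] for col in range(n_feature_cols))
    return label, active

# The two sample rows from this message, read as CSV.
sample = (
    "0,Network1,App1,Router1,Not reachable,YES\n"
    "0,Network1,App2,Router5,Not reachable,NO\n"
)
rows = list(csv.reader(io.StringIO(sample)))
n_feature_cols = len(rows[0]) - 1

index = build_index(rows, n_feature_cols)
encoded = [encode(row, index, n_feature_cols) for row in rows]

# In Spark, each (label, active) pair could presumably become
# LabeledPoint(label, Vectors.sparse(len(index), active, [1.0] * len(active))).
for label, active in encoded:
    print(label, active)
```

With about 10000 distinct values, each row would then be a sparse vector of
size len(index) with one non-zero entry per column, which seems to be the
kind of non-negative input that Naive Bayes expects.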
