spark-user mailing list archives

From Adamantios Corais <>
Subject How to binarize data in spark
Date Thu, 06 Aug 2015 15:19:14 GMT
I have a data set from which I want to build a classification model.
Each row has the following form:

> user1,class1,product2
> user1,class1,product5
> user2,class1,product2
> user2,class1,product5
> user3,class2,product1
> etc

There are about 1M users, 2 classes, and 1M products. Next, I would like
to create sparse vectors (which MLlib already supports), but in order to
apply that function I first have to create the dense vectors (with the
0s). In other words, I have to binarize my data. What's the easiest (or
most elegant) way of doing that?
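
To make the desired result concrete, here is a minimal sketch in plain
Python (not Spark) of one way the rows above could be binarized without
ever materializing the dense vectors. The names `rows`, `binarize`, and
`sparse_rows` are hypothetical, and the choice of sorting product names
to assign column indices is an assumption made only for determinism.

```python
from collections import defaultdict

# Sample rows copied from the message above.
rows = [
    "user1,class1,product2",
    "user1,class1,product5",
    "user2,class1,product2",
    "user2,class1,product5",
    "user3,class2,product1",
]

def binarize(lines):
    """Return {(user, class): (size, indices, values)} without dense vectors."""
    parsed = [tuple(line.split(",")) for line in lines]
    # Give every distinct product a column index (sorted: an assumption).
    products = sorted({product for _, _, product in parsed})
    index_of = {product: i for i, product in enumerate(products)}
    # Collect the set of column indices seen for each (user, class) pair.
    seen = defaultdict(set)
    for user, label, product in parsed:
        seen[(user, label)].add(index_of[product])
    size = len(products)
    # Only the non-zero positions are stored; every stored value is 1.0.
    return {
        key: (size, sorted(indices), [1.0] * len(indices))
        for key, indices in seen.items()
    }

sparse_rows = binarize(rows)
for (user, label), (size, indices, values) in sorted(sparse_rows.items()):
    print(user, label, size, indices, values)
```

Each `(size, indices, values)` triple has the same shape as the
arguments of MLlib's `Vectors.sparse`, so in a Spark job the same
grouping could presumably be done per user and the triple passed
straight to it, skipping the dense step.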

*// Adamantios*
