spark-user mailing list archives

From Rohit Chaddha <>
Subject Machine learning question (using Spark) - removing redundant factors while doing clustering
Date Mon, 08 Aug 2016 11:42:22 GMT
I have a data-set where each data-point has 112 factors.

I want to remove the factors that are not relevant, say reduce from these 112
down to 20 factors, and then cluster the data-points using those 20 factors.

How do I do this, and how do I figure out which of the 20 retained factors
are useful for analysis?

I see SVD and PCA implementations, but I am not sure whether they tell you
which of the original factors are removed and which remain.

Can someone please help me understand what to do here?
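(For anyone landing here with the same question: PCA and SVD do not select a subset of the original factors; they build new components that are linear combinations of all of them. To see which original factors matter, you can inspect the loading matrix: factors with large absolute loadings on the top components dominate the reduced representation. The sketch below uses plain numpy on a made-up toy matrix, purely to illustrate that idea; in Spark you would use the equivalent `PCA` transformer from `spark.ml.feature` on your real data.)

```python
import numpy as np

# Toy stand-in for the 112-factor data-set: 100 points, 6 factors,
# where only the first two columns carry real structure. This is an
# illustrative assumption, not the poster's actual data.
rng = np.random.default_rng(0)
signal = rng.normal(size=(100, 2)) * [10.0, 5.0]
noise = rng.normal(size=(100, 4)) * 0.1
X = np.hstack([signal, noise])

# PCA via SVD of the mean-centred matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2
components = Vt[:k]            # k x 6 loading matrix (rows = components)
X_reduced = Xc @ components.T  # projected data, 100 x k, used for clustering

# Fraction of total variance the k components retain.
var_ratio = (s[:k] ** 2).sum() / (s ** 2).sum()

# The loadings show which ORIGINAL factors dominate each component:
# a large |loading| means that factor contributes strongly.
top_factor_per_component = np.abs(components).argmax(axis=1)
print(top_factor_per_component)  # indices of the dominant original factors
print(round(var_ratio, 3))
```

The same inspection works on Spark's PCA output: its principal-components matrix plays the role of `components` here, so ranking factors by their absolute loadings tells you which inputs drive the reduced space, and the reduced vectors can then be fed to KMeans.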

