spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Reth RM <reth.ik...@gmail.com>
Subject KMean clustering resulting Skewed Issue
Date Sat, 25 Mar 2017 01:37:47 GMT
Hi,

  I am using spark k mean for clustering records that consist of news
documents, vectors are created by applying tf-idf. Dataset that I am using
for testing right now is the gold-truth classified
http://qwone.com/~jason/20Newsgroups/

Issue is all the documents are getting assigned to same cluster and others
just have the vector(doc) picked as cluster center(skewed clustering). What
could be the possible reasons for the issue, any suggestions? Should I be
retuning the epsilon?

Mime
View raw message