spark-user mailing list archives

From Attila Tóth <atez...@gmail.com>
Subject Re: Multidimensional K-Means
Date Sun, 15 Feb 2015 17:53:36 GMT
Hi Sean,

Thanks for the quick answer. I had not realized that I could make an
RDD[Vector] with, e.g.:

import org.apache.spark.mllib.linalg.Vectors

val dataSet = sparkContext.makeRDD(List(Vectors.dense(10.0, 20.0),
  Vectors.dense(20.0, 30.0)))

Using this, KMeans.train works as expected.
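For readers arriving with the original question (a list of Array[Double] points), here is a minimal sketch, not taken from the thread, of the full round trip through the RDD-based spark.mllib API: wrap each array with Vectors.dense, call KMeans.train, and read clusterCenters back as plain arrays. The object name MultiDimKMeans, the helper clusterCentres, the sample points, k = 2 and maxIterations = 20 are illustrative assumptions; it assumes spark-core and spark-mllib are on the classpath and uses a local master.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

// Hypothetical helper object, for illustration only.
object MultiDimKMeans {
  // Clusters n-dimensional points and returns each centre as a plain Array[Double].
  def clusterCentres(sc: SparkContext, points: Seq[Array[Double]], k: Int,
                     maxIterations: Int = 20): Array[Array[Double]] = {
    // Each Array[Double] becomes ONE n-dimensional Vector; the RDD holds many of them.
    val data: RDD[Vector] = sc.parallelize(points.map(p => Vectors.dense(p))).cache()
    val model: KMeansModel = KMeans.train(data, k, maxIterations)
    // clusterCenters holds one Vector per cluster, with the same dimension as the input.
    model.clusterCenters.map(_.toArray)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MultiDimKMeans").setMaster("local[*]"))
    try {
      // Assumed sample data: two well-separated groups of 3-dimensional points.
      val points = Seq(
        Array(1.0, 1.0, 1.0), Array(1.5, 2.0, 1.0), Array(1.0, 1.5, 2.0),
        Array(9.0, 8.0, 9.0), Array(8.5, 9.0, 9.5), Array(9.0, 9.5, 8.0))
      clusterCentres(sc, points, k = 2).foreach(c => println(c.mkString("[", ", ", "]")))
    } finally sc.stop()
  }
}
```

Each returned centre has the same length as the input arrays (three here), which is exactly the "list of n-dimensional coordinates" asked about below.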

So my bad. Thanks again!

Attila

2015-02-15 17:29 GMT+01:00 Sean Owen <sowen@cloudera.com>:

> Clustering operates on a large number of n-dimensional vectors. That
> seems to be what you are describing, and that is what the MLlib API
> accepts. What are you expecting that you don't find?
>
> Did you have a look at the KMeansModel that this method returns? It
> has a "clusterCenters" method that gives you what you're looking for.
> Explore the API a bit more first.
>
> On Sun, Feb 15, 2015 at 4:26 PM, Attila Tóth <atezs82@gmail.com> wrote:
> > Dear Spark User List,
> >
> > I'm fairly new to Spark and am trying to use it for multi-dimensional
> > clustering (using the k-means clustering from MLlib). However, based on
> > the examples, the clustering seems to work only for a single dimension
> > (KMeans.train() accepts an RDD[Vector], which is a vector of doubles; I
> > have a list of arrays of doubles, e.g. a list of n-dimensional
> > coordinates).
> >
> > Is there any way, given a list of arrays (or vectors) of doubles, to
> > get the list of cluster centres (as a list of n-dimensional
> > coordinates) in Spark?
> >
> > I'm using Scala.
> >
> > Thanks in advance,
> > Attila
>
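To make Sean's first point concrete (k-means treats each point as a single n-dimensional vector, and each centre is itself an n-dimensional point), here is a dependency-free Scala sketch of plain Lloyd iterations. It is an illustration only, not MLlib's implementation, which among other things uses k-means|| initialization and runs distributed. The names LloydSketch, sqDist, step and run are hypothetical, and the caller supplies the initial centres.

```scala
// Hypothetical illustration of Lloyd's k-means iteration on n-dimensional points.
object LloydSketch {
  type Point = Array[Double]

  // Squared Euclidean distance over all n coordinates.
  def sqDist(a: Point, b: Point): Double =
    a.indices.map(i => (a(i) - b(i)) * (a(i) - b(i))).sum

  // One iteration: assign each point to its nearest centre, then move every
  // centre to the coordinate-wise mean of the points assigned to it.
  def step(points: Seq[Point], centres: Seq[Point]): Seq[Point] = {
    val assigned = points.groupBy(p => centres.indices.minBy(j => sqDist(p, centres(j))))
    centres.indices.map { j =>
      assigned.get(j) match {
        case Some(ps) => ps.head.indices.map(d => ps.map(_(d)).sum / ps.size).toArray
        case None     => centres(j) // an empty cluster keeps its previous centre
      }
    }
  }

  // Runs a fixed number of iterations starting from caller-supplied centres.
  def run(points: Seq[Point], initial: Seq[Point], iterations: Int): Seq[Point] =
    (1 to iterations).foldLeft(initial)((cs, _) => step(points, cs))
}
```

With the points (0, 0), (0, 2), (10, 10), (10, 12) and initial centres (0, 0) and (10, 10), the centres move to (0, 1) and (10, 11) after the first step and stay there.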
