mahout-user mailing list archives

From arindam chakraborty <>
Subject Can clustering answer these questions
Date Sun, 12 Aug 2012 17:28:48 GMT
I am considering clustering (Canopy or k-means) to build a recommender, but
I have the following uncertainties. If someone could clarify them, it would
be really helpful.

My vectors will be points in 8 dimensions. I expect the clustering phase to
group nearby points into the same cluster. The output is where I am stuck:
I do not know how to interpret it.

   1. Since the main aim is to recommend similar objects, the assumption is
   that points in the same cluster will be similar. Is there a Recommender
   based on the clustering output, or would I have to build that logic
   manually?
   2. Since the output will be a list of vectors per cluster (and they will
   not be unique), how do I identify them? That is, which resulting point
   corresponds to which object, so that I can tell whether objects A, B and C
   are in the same cluster?
   3. For a new object P, is there a way to find out its cluster, or will I
   have to rebuild the clusters all over again?
   4. Within a cluster, say I do identify an object P somehow. How can I find
   the n points closest to it? Is there a built-in method, or would I have to
   write my own implementation?
   5. Can I provide a data source such as a DB to the clustering job, so that
   it can pick up the changed rows and fit them into their respective
   clusters? Or would I have to rebuild the clusters?
   6. Can an object O be added to a cluster in real time? Can I find its
   closest points within the cluster in real time? [Similar to points 3 and 4.]
   7. Do the clusters need to be rebuilt on every addition to my source
   data, or can the delta be identified and the clusters readjusted? Is there
   a refresh() method, as there is for Recommenders?
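
To make questions 2 to 4 concrete, here is a minimal sketch of the behaviour
I am after. It is plain Python, not Mahout's API: the function names
nearest_cluster and closest_in_cluster, the object ids and the toy
8-dimensional vectors are all made up for illustration, and it assumes
Euclidean distance and centroids that have already been computed.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_cluster(point, centroids):
    """Question 3: id of the centroid closest to `point`, without re-clustering."""
    return min(centroids, key=lambda cid: distance(point, centroids[cid]))


def closest_in_cluster(object_id, points, assignments, n):
    """Question 4: ids of the n objects nearest to `object_id` in its own cluster."""
    cluster = assignments[object_id]
    others = [oid for oid, cid in assignments.items()
              if cid == cluster and oid != object_id]
    others.sort(key=lambda oid: distance(points[object_id], points[oid]))
    return others[:n]


# Question 2: keep each vector keyed by its object id so results stay traceable.
# Hypothetical toy data: four objects, each an 8-dimensional point.
points = {
    "A": (0.0,) * 8,
    "B": (0.1,) * 8,
    "C": (0.4,) * 8,
    "D": (5.0,) * 8,
}

# Hypothetical centroids, standing in for the output of a clustering run.
centroids = {0: (0.1,) * 8, 1: (5.0,) * 8}

# Object id -> cluster id, so "are A, B and C together?" is a dictionary lookup.
assignments = {oid: nearest_cluster(vec, centroids) for oid, vec in points.items()}

# A new object P is assigned to the nearest existing centroid.
points["P"] = (0.15,) * 8
assignments["P"] = nearest_cluster(points["P"], centroids)

neighbours_of_p = closest_in_cluster("P", points, assignments, 2)
```

What I do not know is whether something equivalent already exists in Mahout,
or whether the centroids would drift enough after many such additions that a
full rebuild becomes necessary anyway.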

If you can answer one or more questions, it would be very useful.
