mahout-user mailing list archives

From Paritosh Ranjan
Subject Re: Can clustering answer these questions
Date Mon, 13 Aug 2012 02:27:05 GMT
I can try to answer a few:

1) I don't know.

2) Wrap each input vector in an org.apache.mahout.math.NamedVector. The 
name stays attached to the vector, so you can tell which object each 
clustered point is.

3) Yes, new points can be identified without clustering all over again. See

4) I don't think there is any built in implementation for this.
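
As a rough illustration of what "writing your own" could look like, here is a minimal brute-force sketch in plain Java. It does not use Mahout at all: the class NearestInCluster and its methods are hypothetical names, the cluster is assumed to be already loaded into memory as a map from object name to coordinates, and Euclidean distance is assumed. The sample points are 2-dimensional for brevity, but the code works for any dimension.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mahout: brute-force search for the
// n nearest named points inside a single cluster held in memory.
final class NearestInCluster {

    // Squared Euclidean distance; sorting by it gives the same order
    // as sorting by the true distance, without the square root.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Returns the names of up to n points closest to the query point,
    // nearest first. The query point itself is excluded from the result.
    static List<String> nearest(Map<String, double[]> cluster, String queryName, int n) {
        double[] query = cluster.get(queryName);
        if (query == null) {
            throw new IllegalArgumentException("unknown point: " + queryName);
        }
        List<String> names = new ArrayList<>(cluster.keySet());
        names.remove(queryName);
        names.sort(Comparator.comparingDouble(
                (String name) -> squaredDistance(cluster.get(name), query)));
        return new ArrayList<>(names.subList(0, Math.min(n, names.size())));
    }

    public static void main(String[] args) {
        // Made-up sample data for one cluster.
        Map<String, double[]> cluster = new LinkedHashMap<>();
        cluster.put("A", new double[] {0.0, 0.0});
        cluster.put("B", new double[] {1.0, 0.0});
        cluster.put("C", new double[] {5.0, 5.0});
        cluster.put("D", new double[] {0.0, 2.0});
        System.out.println(nearest(cluster, "A", 2)); // prints [B, D]
    }
}
```

This scans every point in the cluster, which is fine for small clusters. For large ones you would probably want an index structure instead.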

5) AFAIK, the clustering algorithms take sequence files as input; there 
is no support for a DB.

6) Yes, it is possible, though you will have to write some code. See the 
answer to question 3.
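
The kind of code involved might look like the sketch below: given the centroids from an earlier clustering run, a new point is placed in the cluster whose centroid is nearest, with no re-clustering. This is plain Java, not Mahout API. The class NearestCentroid and its methods are hypothetical names, the centroids are assumed to have been read out of the clustering output into arrays beforehand, and Euclidean distance is assumed.

```java
// Hypothetical helper, not part of Mahout: assigns a new point to the
// nearest of a fixed set of centroids from a previous clustering run.
final class NearestCentroid {

    // Squared Euclidean distance between two points of equal dimension.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Returns the index of the centroid closest to the point.
    // Ties go to the centroid with the lower index.
    static int assign(double[][] centroids, double[] point) {
        if (centroids.length == 0) {
            throw new IllegalArgumentException("no centroids given");
        }
        int best = 0;
        double bestDistance = squaredDistance(centroids[0], point);
        for (int i = 1; i < centroids.length; i++) {
            double distance = squaredDistance(centroids[i], point);
            if (distance < bestDistance) {
                best = i;
                bestDistance = distance;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up centroids for two clusters and one new point.
        double[][] centroids = {{0.0, 0.0}, {10.0, 10.0}};
        double[] newPoint = {8.0, 9.0};
        System.out.println(assign(centroids, newPoint)); // prints 1
    }
}
```

Note that this only looks up the cluster for the new point. The centroids themselves are left unchanged, so they drift out of date as more points are added this way.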

7) No, there is nothing like a refresh method.


On 12-08-2012 22:58, arindam chakraborty wrote:
> I am considering clustering (Canopy or k-means) to build a recommender but
> I have following uncertainties. If someone can please clarify them, it will
> be really helpful.
> My vector will be points of 8-dimensions. I will expect the clustering
> phase to group close points in respective clusters. The output is where I
> am stuck, as to how I can interpret them
>     1. Since main aim is to recommend similar objects, assumption is that
>     points in the same cluster will be similar. So Is there a RECOMMENDER based
>     on the clustering output, or I would have to build that logic manually
>     2. Since output will have a list of vectors in one cluster (and they
>     will not be unique) how do I identify them. i.e., which resulting point
>     means which object, so that I know Object A, B, C are in the same cluster
>     or not.
>     3. For a new object P, is there a way to find out its cluster, or I will
>     have to re-build the clusters all over again
>     4. In a cluster, say I do identify an object P somehow, how can I figure
>     out the closest n points to it. Is there any built-in method or I would
>     have to write my own implementation
>     5. Can I provide a data source like a DB to the cluster, so that it can
>     work on the changed rows to fit them in their respective clusters. Or I
>     would have to rebuild the clusters
>     6. Can an object O be added to a cluster in real time? Can I find out
>     its closest points from the cluster in real time. [SIMILAR TO POINT 3 & 4 ]
>     7. Does the cluster need to be rebuilt on every addition to my source
>     data? Or it can identify the delta, and readjust it. Is there a refresh()
>     method as there are for Recommenders?
> If you can answer one or more questions, it would be very useful.
