mahout-user mailing list archives

From Julian Ortega <jorte...@gmail.com>
Subject Re: Can clustering answer these questions
Date Mon, 13 Aug 2012 09:04:30 GMT
For questions 1 and 2, you might want to look at
https://cwiki.apache.org/MAHOUT/quick-tour-of-text-analysis-using-the-mahout-command-line.html,
specifically the rowid and rowsimilarity jobs.
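
To make the idea behind those two jobs concrete, here is a minimal sketch in plain Python. It does not use Mahout at all: the helper names (assign_row_ids, most_similar), the sample objects and the choice of cosine similarity are assumptions made for illustration. It only shows the general pattern of giving every object an integer row id while keeping a lookup back to its name (question 2), then ranking the other rows by similarity (question 1).

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_row_ids(named_vectors):
    """Give each object an integer row id and keep an id -> name lookup."""
    rows, index = {}, {}
    for row_id, name in enumerate(sorted(named_vectors)):
        rows[row_id] = named_vectors[name]
        index[row_id] = name
    return rows, index


def most_similar(rows, index, top_n):
    """For every row, list the names of its top_n most similar other rows."""
    result = {}
    for i, vec_i in rows.items():
        scored = [(cosine(vec_i, vec_j), index[j])
                  for j, vec_j in rows.items() if j != i]
        # Highest similarity first; ties broken by name for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        result[index[i]] = [name for _, name in scored[:top_n]]
    return result


# Hypothetical 8-dimensional objects, invented for this example.
objects = {
    "A": [1.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 3.0],
    "B": [1.1, 0.0, 1.9, 0.0, 1.0, 0.1, 0.0, 3.0],
    "C": [0.0, 4.0, 0.0, 1.0, 0.0, 2.0, 5.0, 0.0],
}

rows, index = assign_row_ids(objects)
similar = most_similar(rows, index, top_n=1)
```

The point of the id-to-name lookup is that the similarity output refers only to row ids, so it has to be joined back to that lookup to know which objects are being recommended.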

On Sun, Aug 12, 2012 at 7:28 PM, arindam chakraborty
<arismart99@gmail.com> wrote:

> I am considering clustering (Canopy or k-means) to build a recommender, but
> I have the following uncertainties. If someone could clarify them, it would
> be really helpful.
>
> My vectors will be 8-dimensional points. I expect the clustering phase to
> group nearby points into the same cluster. The output is where I am stuck:
> I do not know how to interpret it.
>
>
>    1. Since the main aim is to recommend similar objects, the assumption is
>    that points in the same cluster will be similar. Is there a recommender
>    based on the clustering output, or would I have to build that logic
>    manually?
>    2. Since the output will be a list of vectors per cluster (and they will
>    not be unique), how do I identify them? That is, which resulting point
>    corresponds to which object, so that I know whether objects A, B and C
>    are in the same cluster.
>    3. For a new object P, is there a way to find its cluster, or will I
>    have to rebuild the clusters all over again?
>    4. Within a cluster, say I do identify an object P somehow. How can I
>    find the closest n points to it? Is there a built-in method, or would I
>    have to write my own implementation?
>    5. Can I provide a data source such as a DB to the clustering, so that it
>    can work on the changed rows and fit them into their respective
>    clusters? Or would I have to rebuild the clusters?
>    6. Can an object O be added to a cluster in real time? Can I find its
>    closest points in the cluster in real time? [Similar to points 3 and 4.]
>    7. Do the clusters need to be rebuilt on every addition to my source
>    data, or can the delta be identified and the clusters readjusted? Is
>    there a refresh() method as there is for Recommenders?
>
>
> If you can answer one or more questions, it would be very useful.
>
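
For questions 3, 4 and 6, one common approach (sketched here in plain Python, not with any Mahout API) is to keep the centroids from the last clustering run fixed, assign a new point to whichever centroid is nearest, and then rank the members of that cluster by distance. The function names, the Euclidean distance measure and the sample data below are all assumptions made for illustration. Because the centroids do not move, this is only an approximation, and the clusters would presumably still need to be rebuilt from time to time as new data accumulates.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_cluster(point, centroids):
    """Return the id of the centroid closest to point (centroids stay fixed)."""
    return min(centroids, key=lambda cid: distance(point, centroids[cid]))


def closest_points(name, members, n):
    """Return the names of the n members closest to the member called name."""
    target = members[name]
    scored = [(distance(target, vec), other)
              for other, vec in members.items() if other != name]
    scored.sort()  # nearest first; ties broken by name
    return [other for _, other in scored[:n]]


# Hypothetical output of an earlier clustering run over 8-dimensional points.
centroids = {0: [0.0] * 8, 1: [10.0] * 8}
clusters = {
    0: {"A": [0.0] * 8, "B": [1.0] * 8},
    1: {"C": [10.0] * 8, "D": [12.0] * 8, "E": [9.5] * 8},
}

# A new object P arrives: assign it without re-running the clustering.
p = [9.0] * 8
cluster_id = nearest_cluster(p, centroids)
clusters[cluster_id]["P"] = p

# Then look up its nearest neighbours inside that cluster only.
neighbours = closest_points("P", clusters[cluster_id], n=2)
```

Restricting the neighbour search to a single cluster keeps the lookup cheap, at the cost of missing close points that happen to sit just across a cluster boundary.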
