mahout-user mailing list archives

From Felix Filozov <>
Subject Re: Generic approach to kNN
Date Mon, 10 Oct 2011 23:54:08 GMT
I have a set of feature vectors. They're composed of integers and other
non-numerical values. This means that I would need the ability to supply my
own distance function. My data has no notion of users, just vectors.


vector 1: (1, apple, dog, 34, 8766)
vector n: (3, orange, cat, 3738, 3737)

I would like to know if Mahout can perform kNN similarity search using such
arbitrary items/vectors. As a side question, can it perform that outside
the context of a recommender? I think reducing some problems to a
recommendation may be a bit awkward.
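
To make the request concrete, here is a minimal brute-force sketch of kNN with a caller-supplied distance function, written in plain Python rather than against any Mahout API. The names `mixed_distance` and `k_nearest` are hypothetical, and the distance itself (absolute difference for numbers, 0/1 mismatch for everything else) is only an assumption about what a reasonable mixed-type metric might look like. The third sample vector is made up for illustration.

```python
import heapq
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[object]


def mixed_distance(a: Vector, b: Vector) -> float:
    """Hypothetical mixed-type distance (an assumption, not a Mahout metric).

    Numeric coordinates contribute their absolute difference;
    any other coordinate contributes 0 if equal and 1 otherwise.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    total = 0.0
    for x, y in zip(a, b):
        if isinstance(x, (int, float)) and isinstance(y, (int, float)):
            total += abs(x - y)
        else:
            total += 0.0 if x == y else 1.0
    return total


def k_nearest(
    query: Vector,
    data: Sequence[Vector],
    k: int,
    distance: Callable[[Vector, Vector], float] = mixed_distance,
) -> List[Tuple[float, int]]:
    """Return the k smallest (distance, index) pairs by scanning every vector."""
    scored = ((distance(query, v), i) for i, v in enumerate(data))
    return heapq.nsmallest(k, scored)


# The two vectors from the message plus one made-up vector for illustration.
data = [
    (1, "apple", "dog", 34, 8766),
    (3, "orange", "cat", 3738, 3737),
    (2, "apple", "dog", 30, 8700),
]
neighbours = k_nearest(data[0], data, k=2)
```

Because the numeric coordinates are not rescaled here, large-valued fields dominate the result; a real distance function would probably normalise or weight each coordinate.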

On Monday, October 10, 2011, Sean Owen <> wrote:
> I think there are a lot of answers to this, depending on what exactly
> you want. This is just one answer -- maybe you can clarify your
> requirements.
> You want to just find the k most similar items, and you want to
> construe this as a recommender problem?
> The item-based recommenders have a mostSimilarItems() method. All it
> does is find the k most similar items to the given item. It's just
> applying a given similarity metric to search all possibilities. It
> works on "items" but you can flip it around to work on users if you
> like.
> Vectors really have to take on numeric values, or else they're not
> really vectors! Are you trying to map discrete values to some numeric
> range?
> On Mon, Oct 10, 2011 at 8:26 PM, Felix Filozov <> wrote:
>> I would like to perform a kNN similarity search, where each data point is a
>> dimensional vector and each coordinate in the vector may take on any value
>> (reals or strings). It seems to me that Mahout doesn't have the ability to
>> perform a generic kNN similarity search; instead, the problem has to be
>> mapped to a recommender. Is Mahout the right tool for this task?
>> If it is, how have you dealt with the mapping, and if not, what would you
>> recommend?
>> Thanks.
>> Felix
