I have a set of feature vectors. They're composed of integers and
non-numerical values. This means that I would need the ability to supply my
own distance function. My data has no notion of users, just vectors.
Example:
vector 1: (1, apple, dog, 34, 8766)
...
vector n: (3, orange, cat, 3738, 3737)
I would like to know if Mahout can perform kNN similarity search using such
arbitrary items/vectors. As a side question, can it perform that outside
the context of a recommender? I think reducing some problems to a
recommendation may be a bit awkward.
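
To make the question concrete, here is a minimal sketch in plain Java of the
kind of search meant above: a brute-force kNN scan with a caller-supplied
distance function. It uses no Mahout APIs. The names MixedKnn, mixedDistance
and nearest are hypothetical, and the distance (absolute difference on numeric
coordinates, 0/1 mismatch on everything else) is only an assumption chosen for
illustration, not a recommended metric.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Hypothetical helper, not part of Mahout.
final class MixedKnn {

    // Assumed example distance for mixed vectors: absolute difference on
    // numeric coordinates, 0/1 mismatch on all other coordinates.
    // Numeric coordinates are not rescaled here, so large-valued ones
    // dominate the sum; a real metric would probably normalize them.
    static double mixedDistance(Object[] a, Object[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] instanceof Number && b[i] instanceof Number) {
                sum += Math.abs(((Number) a[i]).doubleValue()
                        - ((Number) b[i]).doubleValue());
            } else if (!a[i].equals(b[i])) {
                sum += 1.0;
            }
        }
        return sum;
    }

    // Brute-force search: sort a copy of the data by distance to the query
    // and return the first k entries (fewer if the data set is smaller).
    static List<Object[]> nearest(Object[] query, List<Object[]> data, int k,
                                  ToDoubleBiFunction<Object[], Object[]> distance) {
        List<Object[]> sorted = new ArrayList<>(data);
        sorted.sort(Comparator.comparingDouble(v -> distance.applyAsDouble(query, v)));
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    public static void main(String[] args) {
        // The two vectors from the example above.
        List<Object[]> data = new ArrayList<>();
        data.add(new Object[] {1, "apple", "dog", 34, 8766});
        data.add(new Object[] {3, "orange", "cat", 3738, 3737});

        // Made-up query vector, for illustration only.
        Object[] query = {1, "apple", "dog", 30, 8760};
        for (Object[] v : nearest(query, data, 1, MixedKnn::mixedDistance)) {
            System.out.println(java.util.Arrays.toString(v));
        }
    }
}
```

The scan is linear in the number of vectors per query, which is fine for
small data sets but says nothing about how this would scale.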
On Monday, October 10, 2011, Sean Owen <srowen@gmail.com> wrote:
> I think there are a lot of answers to this, depending on what exactly
> you want. This is just one answer; maybe you can clarify your
> requirements.
>
> You want to just find the k most similar items, and you want to
> construe this as a recommender problem?
> The item-based recommenders have a mostSimilarItems() method. All it
> does is find the k most similar items to the given item. It's just
> applying a given similarity metric to search all possibilities. It
> works on "items" but you can flip it around to work on users if you
> like.
>
> Vectors really have to take on numeric values, or else they're not
> really vectors! Are you trying to map discrete values to some numeric
> range?
>
>
> On Mon, Oct 10, 2011 at 8:26 PM, Felix Filozov <ffilozov@gmail.com> wrote:
>> I would like to perform a kNN similarity search, where each data point is
>> an N-dimensional vector and each coordinate in the vector may take on any
>> value (reals or strings). It seems to me that Mahout doesn't have the
>> ability to perform a generic kNN similarity search; instead the problem
>> has to be mapped to a recommender. Is Mahout the right tool for this task?
>>
>> If it is, how have you dealt with the mapping, and if not, what would you
>> recommend?
>>
>> Thanks.
>>
>> Felix
>>
>
