mahout-user mailing list archives

From Felix Filozov <ffilo...@gmail.com>
Subject Re: Generic approach to kNN
Date Thu, 13 Oct 2011 05:32:52 GMT
I decided that since I don't have as much data as I thought I would have, I
would simply choose an optimized data structure to hold my data set, which
I'd query locally.

I did start looking into distributed NN options and ways of parallelizing NN
as well.

Thanks for all the help.

On Wed, Oct 12, 2011 at 11:26 PM, Josh Patterson <josh@cloudera.com> wrote:

> Without knowing a lot about what you are doing, I'd say you could just
> do this rather simply, as Sean has said, with a basic similarity
> function.
>
> The really simple "batch" version of this might be:
>
> 1. Define a similarity (distance) function.
> 2. Take as input some sort of "base point / instance" to search against.
> 3. On the map side of the MR job, take each input vector and score it
> against the base point with the distance function.
> 4. Output using the total order partitioner, sorting on distance score.
> 5. Look at the first k entries at the front of the sorted output.
>
> A more complicated option might be something along the lines of "MD-HBase":
>
> http://www.cs.ucsb.edu/~sudipto/papers/md-hbase.pdf
>
> where they are storing a k-d tree in HBase to give relatively low
> latency kNN search queries.
>
> The batch version seems like it might be a nice place to start.
>
> Hope this helps,
>
> JP
>
>
> On Mon, Oct 10, 2011 at 3:26 PM, Felix Filozov <ffilozov@gmail.com> wrote:
> > I would like to perform a kNN similarity search, where each data point is
> > an N-dimensional vector and each coordinate in the vector may take on any
> > value (reals or strings). It seems to me that Mahout doesn't have the
> > ability to perform a generic kNN similarity search; instead, the problem
> > has to be mapped to a recommender. Is Mahout the right tool for this task?
> >
> > If it is, how have you dealt with the mapping, and if not, what would you
> > recommend?
> >
> > Thanks.
> >
> > Felix
> >
>
>
>
> --
> Twitter: @jpatanooga
> Solution Architect @ Cloudera
> hadoop: http://www.cloudera.com
>
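
For readers following the thread, the "batch" steps Josh lists can be sketched on a single machine, which is also roughly what a local query over a small data set would look like. This is only an illustration and not code from the thread or from Mahout: the names `distance` and `knn` are hypothetical, and the mixed-type distance (squared difference for numbers, 0/1 mismatch for strings) is an assumption chosen to match vectors whose coordinates are reals or strings. In a real MapReduce job the scoring would happen in the mappers and the sort would be left to the shuffle with a total order partitioner.

```python
import heapq
import math


def distance(a, b):
    """Hypothetical mixed-type distance between two equal-length vectors.

    Numeric coordinates contribute their squared difference; string
    coordinates contribute 0 if equal and 1 otherwise (an assumption).
    """
    total = 0.0
    for x, y in zip(a, b):
        if isinstance(x, str) or isinstance(y, str):
            total += 0.0 if x == y else 1.0
        else:
            total += (x - y) ** 2
    return math.sqrt(total)


def knn(query, points, k):
    """Return the k (distance, index) pairs closest to the query."""
    # Steps 2-3: score every input vector against the base point.
    scored = ((distance(query, p), i) for i, p in enumerate(points))
    # Steps 4-5: order by distance score and keep the first k entries.
    return heapq.nsmallest(k, scored)


# Small made-up data set: one real and one string coordinate per point.
points = [(1.0, "red"), (1.2, "red"), (5.0, "blue"), (1.0, "blue")]
query = (1.0, "red")
nearest = knn(query, points, 2)
```

Here `nearest` holds the two closest points as (distance, index) pairs, so the indices can be used to look the original vectors back up in `points`. A linear scan like this costs one distance computation per point per query, which is usually acceptable for a data set that fits in memory.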
