mahout-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ted Dunning <>
Subject Re: Perfect neighbours
Date Mon, 01 Jun 2009 22:22:39 GMT
That is a form of dimensionality reduction, but that is not what is normally
meant by this.  Normally, dimensionality reduction is done by transforming
the input space (dimension = number of items considered) to an abstract
representation that has far fewer variables (usually 10-300, down from 10's
of thousands to millions in the original case).

This can be done using some form of linear transformation (AKA singular
value decomposition AKA latent semantic analysis) or non-linear
transformation (AKA bottleneck network AKA autoencoder) or mixture
distribution (including, inter alia, latent Dirichlet allocation, PLSA,
radial basis functions).

Dimensionality reduction can be really, really bad for runtimes because you
suddenly have a dense matrix problem instead of a sparse matrix problem.
That means that search becomes exhaustive search because you can't use
indexes any more.

You can get around that by reconstructing an item-item mapping using your
reduced dimensional representation and truncating small results.  This gives
you a (pretty) sparse result that is amenable to indexing and such.  At this
point, you can use indexed retrieval methods such as Lucene to do
recommendation in one step.

The contrast is with nearest neighbor methods which require two retrievals
(first users, then items).  The two retrievals aren't just twice the cost
because there are often many more users than there are items.  The nearest
neighbor methods also cannot easily make use of global information exposed
by the dimensionality reduction, nor can they use external information such
as that derived by mixed task learning.

On Mon, Jun 1, 2009 at 2:31 PM, Sean Owen <> wrote:

> I think Otis is doing a sort of form of 'dimension reduction' already
> here, by removing items from consideration that don't add much
> information. That's kind of the same thing in this simplified scenario
> where we don't have rating vectors really, just seen/not seen values.
> Indeed it speeds things up a lot. I am under the impression Otis needs
> real-time recommendations in his case.

Ted Dunning, CTO

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message