mahout-user mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: SVD and Clustering
Date Tue, 06 Jul 2010 06:16:59 GMT
In my own experience, I normalize things like graphs (including bipartite
graphs like ratings matrices) both before *and* after the SVD, but for text
I don't (unit) normalize before, only after.

The reasoning I use is that normalizing the rows of a graph has
a meaning in the context of the graph: you're doing the PageRank-like
thing of normalizing outflowing probability when looking at random
walks, for example, or for ratings matrices, you're saying that
everyone gets "one vote" to distribute amongst the things they've
rated. (These interpretations apply to L_1 normalization, which isn't
always appropriate.) I'm not sure I buy a similar description of what
pre-normalizing the rows of a text corpus would mean.
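
For concreteness, here is a minimal sketch of the two recipes, assuming
NumPy and a small dense matrix. The function names, the toy `ratings`
matrix, and the choice of k are hypothetical and only for illustration;
this is not Mahout code, and a real corpus would need a sparse or
distributed SVD rather than `np.linalg.svd`.

```python
import numpy as np

def l1_normalize_rows(m):
    """Scale each row so its absolute values sum to 1 (the "one vote" reading).
    All-zero rows are left unchanged."""
    sums = np.abs(m).sum(axis=1, keepdims=True)
    return m / np.where(sums == 0, 1.0, sums)

def l2_normalize_rows(m):
    """Scale each row to unit Euclidean length. All-zero rows are left unchanged."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    return m / np.where(norms == 0, 1.0, norms)

def reduce_rows(m, k, normalize_before=False, normalize_after=True):
    """Project the rows of m onto the top-k singular directions,
    optionally normalizing before and/or after the SVD."""
    m = np.asarray(m, dtype=float)
    if normalize_before:
        m = l1_normalize_rows(m)
    u, s, _ = np.linalg.svd(m, full_matrices=False)
    reduced = u[:, :k] * s[:k]  # row coordinates in the reduced space
    if normalize_after:
        reduced = l2_normalize_rows(reduced)
    return reduced

# Hypothetical toy ratings matrix: 3 users x 4 items.
ratings = np.array([[5, 3, 0, 1],
                    [4, 0, 0, 1],
                    [1, 1, 0, 5]], dtype=float)

# Graph-like data: normalize before *and* after.
graph_style = reduce_rows(ratings, k=2, normalize_before=True, normalize_after=True)

# Text-like data: skip the pre-normalization, normalize only after.
text_style = reduce_rows(ratings, k=2, normalize_before=False, normalize_after=True)
```

The same toy matrix is fed through both paths only to show the two
settings side by side; which combination is appropriate depends on the
data, as discussed above.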

  -jake


On Tue, Jul 6, 2010 at 1:08 AM, Ted Dunning <ted.dunning@gmail.com> wrote:

> On Mon, Jul 5, 2010 at 12:34 PM, Grant Ingersoll <gsingers@apache.org> wrote:
>
> >
> > On Jul 5, 2010, at 1:17 PM, Ted Dunning wrote:
> >
> > > Yes to this.
> > >
> > > On Mon, Jul 5, 2010 at 6:43 AM, Grant Ingersoll <gsingers@apache.org> wrote:
> > >
> > >> is it just seen as a general way of doing feature reduction and
> > >> therefore it makes sense to do.
> >
> > Should I normalize my vectors before doing SVD or after or not at all?
>
>
> Yes.  :-)
>
> Any of these can help.  Normalizing before will probably not have a huge
> effect, but could be helpful if you have certain kinds of odd documents.
>  Normalizing document vectors after SVD may be critical to avoid problems
> with eigenspokes.  Avoiding normalization is important in certain other
> situations.
>
> So the answer to your two binary questions expressed as four possible
> options is "Yes".
>
> Try it and apply the laugh test to each option.
>
