mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Eigenspokes & clustering question (in continuation of mahout 12/07 meetup discussion)
Date Thu, 09 Dec 2010 23:24:29 GMT
Yes.  Projection onto the sphere helps.  Doing this to a sequence file full
of vectors should be pretty easy since you just have to do v.normalize(2).
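
For illustration only, here is a minimal plain-Java sketch of what that
projection does to a single vector. It is not Mahout code: the class name
SphereProjection and the method normalizeL2 are hypothetical, and a double[]
stands in for a Mahout Vector. Unlike a plain division by the norm, the sketch
returns an all-zero vector when the input has zero length; that choice is an
assumption, not something from this thread.

```java
/** Hypothetical helper: projects vectors onto the unit sphere (L2 norm). */
public final class SphereProjection {
    private SphereProjection() {}

    /** Returns a copy of v scaled to unit Euclidean length. */
    public static double[] normalizeL2(double[] v) {
        // Accumulate the squared Euclidean norm.
        double sumSq = 0.0;
        for (double x : v) {
            sumSq += x * x;
        }
        double norm = Math.sqrt(sumSq);

        double[] out = new double[v.length];
        if (norm == 0.0) {
            // Assumption: a zero vector has no direction, so return zeros
            // rather than dividing by zero.
            return out;
        }
        // Divide every component by the norm so the result has length 1.
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / norm;
        }
        return out;
    }
}
```

Two points on the same spoke, such as (2, 0) and (10, 0), both map to (1, 0),
which is why the projected data can look better behaved to a centroid-based
clusterer.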

But no.  The fundamental problems with eigenspokes have a lot to do with
small counts and excessive weighting of coincidence.  To fix that you really
need to go to a probabilistic projection method like LDA.

On Thu, Dec 9, 2010 at 1:54 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:

> Hi everyone.
>
> I was thinking about the eigenspokes problem. Actually, I briefly looked
> through one paper about it.
>
>
> We basically said cluster detection doesn't work well on them. But it would
> seem to me that's just a matter of geometrical convenience. If we convert the
> U output into hyperspherical vectors (and exclude the second norm from it),
> shouldn't that representation actually have very nice centroids?
>
> Or am I missing something fundamental here?
>
> But if that solves the problem, then it looks like we could have a
> preprocessor for clustering algorithms that converts SVD output into
> hyperspherical vectors. This would basically allow running clustering
> after dimensionality reduction (and there's another reason why I wanted to
> do that, but that's a subject for another discussion).
>
> Thanks.
> -Dmitriy
>
