mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Using clustering output for classification
Date Tue, 06 May 2014 13:32:58 GMT
I think Peng is right.  It might help to amplify a bit.

The idea is that, in addition to the other predictor variables that you
have, there is also one predictor variable per cluster.  For each training
example, the variable for the closest cluster is set to 1 and the variables
for all other clusters are set to 0.

On Wikipedia, the term used is "one hot" encoding.

http://en.wikipedia.org/wiki/One-hot
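
A minimal Python sketch of the encoding described above, not the R demo
mentioned in this thread. It assumes the cluster centroids are already
available from a k-means run and that "closest" means Euclidean distance.
The centroid values, the example points, and the function names
`nearest_cluster` and `add_cluster_features` are all made up for
illustration.

```python
import math


def nearest_cluster(x, centroids):
    """Return the index of the centroid closest to x (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(x, centroids[j]))


def add_cluster_features(x, centroids):
    """Append a one-hot cluster indicator to the original features of x."""
    one_hot = [0.0] * len(centroids)
    one_hot[nearest_cluster(x, centroids)] = 1.0
    return list(x) + one_hot


# Hypothetical centroids, standing in for the output of a k-means run.
centroids = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]

# Hypothetical training examples with two original features each.
training_examples = [(0.5, -0.2), (4.0, 6.0), (9.0, 1.0)]

# Each row keeps its original features and gains one 0/1 column per cluster.
augmented = [add_cluster_features(x, centroids) for x in training_examples]
for row in augmented:
    print(row)
```

The augmented rows (original features followed by one indicator column per
cluster) would then be passed to the classifier in place of the original
feature vectors.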




On Tue, May 6, 2014 at 4:02 AM, Peng Zhang <pzhang.xjtu@gmail.com> wrote:

> Angel,
>
> I think Ted means each example falls into one cluster. If you have k
> clusters, each example should have one of the encodings: 1, 2, …, k.
>
> On May 6, 2014, at 5:27 AM, Angel Luis Scull <ascullp@facinf.uho.edu.cu>
> wrote:
>
> > What do you mean by "get a 1 of n encodings..."?
> >
> > On 05/05/14 16:59, Ted Dunning wrote:
> >> In theory, what you need to do is take your training data for your
> >> classifier and run your clustering to get a 1 of n encoding of the
> >> cluster for each example in the training data.
> >>
> >> Then train the classifier using original and new features.
> >>
> >> Does that help?  I have a simple demo of the process in R that I do if
> >> that would help.
> >>
> >>
> >>
> >>
> >> On Mon, May 5, 2014 at 5:53 PM, Angel Luis Scull
> >> <ascullp@facinf.uho.edu.cu> wrote:
> >>
> >>> Hello to all
> >>>
> >>> I have a document dataset to which I applied k-means and obtained
> >>> clusters. Now I want to use the association between the vectors and
> >>> the clusters as input for a classification algorithm.
> >>>
> >>> How can I achieve that?
> >>>
> >>> thanks in advance
> >>>
> >
>
>
