mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Clustering of text data on external categories
Date Fri, 11 Oct 2013 15:13:08 GMT
Search engines do cool things.


On Fri, Oct 11, 2013 at 7:42 AM, Jens Bonerz <jbonerz@googlemail.com> wrote:

> what a nice idea :-) really like that approach
>
>
> 2013/10/11 Ted Dunning <ted.dunning@gmail.com>
>
> > You don't need Mahout for this.
> >
> > A very easy way to do this is to gather all the words for each category
> > into a document.  Thus:
> >
> > CatA:selling buying sales payment
> > CatB:gathering collecting
> > CatC:information data info
> >
> > Then put these into a text retrieval engine so that you have one document
> > per category.
> >
> > When you get a new document to categorize, just use the document as a query
> > and you will get a list of possible categories back.  Make sure you set the
> > default query mode to OR for this.
> >
> > See http://wiki.apache.org/solr/SolrQuerySyntax for more on the syntax.
> >
> >
> >
> > On Fri, Oct 11, 2013 at 5:04 AM, Kasi Subrahmanyam
> > <kasisubbu440@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I have a problem that I would like to solve with Mahout clustering.
> > >
> > > I have input text documents with data like below.
> > >
> > > Document1: This is the first document of selling information.
> > > Document2: This is the second document of gathering information.
> > >
> > > I also have a lookup file with data like below.
> > > selling:CatA
> > > gathering:CatB
> > > information:CatC
> > >
> > > Now I would like to cluster the documents, with the output generated as
> > > Document1:CatA,CatC
> > > Document2:CatB,CatC
> > >
> > > Please let me know how to achieve this.
> > >
> > > Thanks,
> > > Subbu
> > >
> >
>
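
A minimal sketch of the approach quoted above, assuming a plain in-memory
inverted index stands in for the text retrieval engine. The names
CATEGORY_DOCS, tokenize, build_index and categorize are hypothetical and not
part of Solr or Mahout. Matching any single term is enough for a category to
be returned, which mirrors the OR query mode; ranking by the number of matched
terms is a simplification of a real engine's relevance scoring.

```python
import re
from collections import Counter

# One "document" per category, taken from the example in the thread.
CATEGORY_DOCS = {
    "CatA": "selling buying sales payment",
    "CatB": "gathering collecting",
    "CatC": "information data info",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(category_docs):
    """Build an inverted index mapping each term to the categories containing it."""
    index = {}
    for category, words in category_docs.items():
        for term in tokenize(words):
            index.setdefault(term, set()).add(category)
    return index


def categorize(text, index):
    """Use the whole document as an OR query against the category index.

    Returns the matching categories, most matched terms first,
    with ties broken alphabetically.
    """
    hits = Counter()
    for term in set(tokenize(text)):
        for category in index.get(term, ()):
            hits[category] += 1
    return sorted(hits, key=lambda c: (-hits[c], c))


INDEX = build_index(CATEGORY_DOCS)

if __name__ == "__main__":
    docs = {
        "Document1": "This is the first document of selling information.",
        "Document2": "This is the second document of gathering information.",
    }
    for name, text in docs.items():
        print(name + ":" + ",".join(categorize(text, INDEX)))
```

With Solr itself, the OR behaviour can be requested per query with the q.op=OR
parameter; the index layout (one Solr document per category) stays the same.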
