mahout-user mailing list archives

From Jens Bonerz <jbon...@googlemail.com>
Subject Re: Clustering of text data on external categories
Date Fri, 11 Oct 2013 14:42:56 GMT
what a nice idea :-) really like that approach


2013/10/11 Ted Dunning <ted.dunning@gmail.com>

> You don't need Mahout for this.
>
> A very easy way to do this is to gather all the words for each category
> into a document.  Thus:
>
> CatA:selling buying sales payment
> CatB:gathering collecting
> CatC:information data info
>
> Then put these into a text retrieval engine so that you have one document
> per category.
>
> When you get a new document to categorize, just use the document as a query
> and you will get a list of possible categories back.  Make sure you set the
> default query mode to OR for this.
>
> See http://wiki.apache.org/solr/SolrQuerySyntax for more on the syntax.
>
>
>
> On Fri, Oct 11, 2013 at 5:04 AM, Kasi Subrahmanyam
> <kasisubbu440@gmail.com> wrote:
>
> > Hi,
> >
> > I have a problem that I would like to solve with Mahout clustering.
> >
> > I have input text documents with data like below.
> >
> > Document1: This is the first document of selling information.
> > Document2: This is the second document of gathering information.
> >
> > I also have another lookup file with data like below:
> > selling:CatA
> > gathering:CatB
> > information:CatC
> >
> > Now I would like to cluster the documents, with the output being generated as
> > Document1:CatA,CatC
> > Document2:CatB,CatC
> >
> > Please let me know how to achieve this.
> >
> > Thanks,
> > Subbu
> >
>
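
For anyone who wants to try the idea quoted above before setting up a text retrieval engine, here is a minimal sketch in plain Python. It is only a stand-in for a real engine such as Solr: it inverts the word:category lookup file into one bag of words per category, then treats a new document as an OR query, so a category is returned if any of its words occurs in the document. The function names build_category_docs and categorize are made up for this illustration, and the ranking (count of matching words, ties broken by name) is an assumption rather than the relevance score a search engine would compute.

```python
import re
from collections import defaultdict


def build_category_docs(lookup_lines):
    """Invert 'word:Category' lines into one set of words per category."""
    docs = defaultdict(set)
    for line in lookup_lines:
        word, _, category = line.strip().partition(":")
        if word and category:
            docs[category.strip()].add(word.strip().lower())
    return dict(docs)


def categorize(text, category_docs):
    """Use the text as an OR query against the category documents.

    A category matches if at least one of its words appears in the text.
    Matches are ordered by number of shared words, then by category name.
    """
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    hits = []
    for category, words in category_docs.items():
        overlap = len(words & tokens)
        if overlap > 0:
            hits.append((category, overlap))
    hits.sort(key=lambda item: (-item[1], item[0]))
    return [category for category, _ in hits]


# Lookup data taken from the original question.
lookup = ["selling:CatA", "gathering:CatB", "information:CatC"]
category_docs = build_category_docs(lookup)

documents = {
    "Document1": "This is the first document of selling information.",
    "Document2": "This is the second document of gathering information.",
}
for name, text in documents.items():
    print(name + ":" + ",".join(categorize(text, category_docs)))
```

With the two sample documents this prints Document1:CatA,CatC and Document2:CatB,CatC, which is the output asked for in the original question. A real retrieval engine would add stemming, stop words and proper relevance scoring on top of this.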
