mahout-user mailing list archives

From Jeff Eastman <jeast...@Narus.com>
Subject RE: LDA Mahout
Date Tue, 01 Mar 2011 22:42:39 GMT
Not with seq2sparse. We do have some Lucene support which may allow this. Grant?
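
For a custom list kept in a text file, one possible workaround is to strip the stop words yourself before the text is written to sequence files. The following is only a minimal sketch under that assumption, in plain Java with no Mahout or Lucene calls. The class name StopWordFilter, its two methods, and the one-word-per-line file format (with '#' comment lines) are all hypothetical choices made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper (not part of Mahout): removes stop words before vectorization. */
public class StopWordFilter {

    /** Reads one stop word per line; blank lines and lines starting with '#' are skipped. */
    public static Set<String> loadStopWords(Path file) throws IOException {
        Set<String> stopWords = new HashSet<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String word = line.trim().toLowerCase(Locale.ROOT);
            if (!word.isEmpty() && !word.startsWith("#")) {
                stopWords.add(word);
            }
        }
        return stopWords;
    }

    /** Lowercases the text, splits on anything that is not a letter or digit, and drops stop words. */
    public static List<String> filter(String text, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            // split() can yield an empty first token when the text starts with punctuation
            if (!token.isEmpty() && !stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

The surviving tokens can be joined back with spaces (for example with String.join(" ", tokens)) and used as the document text that goes into the sequence files, so the later vectorization step never sees the stop words.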

-----Original Message-----
From: Manoj Kumar [mailto:manoj1987@gmail.com] 
Sent: Tuesday, March 01, 2011 12:26 PM
To: user@mahout.apache.org
Subject: Re: LDA Mahout

Thanks. But is it possible to provide a customized stop-word list that is
loaded from a text file?

Thanks & Regards,
Manoj Kumar.R.K
Graduate Student, MS Computer Science
University at Buffalo
Buffalo, New York
(413) 461-8938|www.rkmanojkumar.co.nr



On Tue, Mar 1, 2011 at 11:36 AM, Jeff Eastman <jeastman@narus.com> wrote:

> Sure, seq2sparse has a -maxDFPercent option which can be used to eliminate
> high-frequency features like stop words. Check out the documentation at
> https://cwiki.apache.org/confluence/display/MAHOUT/Creating+Vectors+from+Text
> .
>
> -----Original Message-----
> From: Manoj Kumar [mailto:manoj1987@gmail.com]
> Sent: Monday, February 28, 2011 10:51 PM
> To: user@mahout.apache.org
> Subject: Re: LDA Mahout
>
> Hi Jeff Eastman,
> Are there any options for stop-word removal while running LDA in
> Mahout, or while creating sequence files from the corpus?
> Kindly reply.
>
> Thanks & Regards,
> Manoj Kumar.R.K
> Graduate Student, MS Computer Science
> University at Buffalo
> Buffalo, New York
> (413) 461-8938|www.rkmanojkumar.co.nr
>
>
>
> On Mon, Feb 28, 2011 at 1:06 PM, Manoj Kumar <manoj1987@gmail.com> wrote:
>
> > Hi Jeff Eastman,
> >
> > Thanks a lot. I'll look into it and will contact you if I need any help.
> >
> > Thanks & Regards,
> > Manoj Kumar.R.K
> > Graduate Student, MS Computer Science
> > University at Buffalo
> > Buffalo, New York
> > (413) 461-8938|www.rkmanojkumar.co.nr
> >
> >
> >
> > On Mon, Feb 28, 2011 at 12:48 PM, Jeff Eastman <jeastman@narus.com> wrote:
> >
> >> Look at examples/bin/build-reuters.sh for some examples. They are all from
> >> the command line but illustrate the best way to do what you are attempting.
> >> https://cwiki.apache.org/confluence/display/MAHOUT/K-Means+Clustering also
> >> has some example code for doing text processing.
> >>
> >> -----Original Message-----
> >> From: Manoj Kumar [mailto:manoj1987@gmail.com]
> >> Sent: Monday, February 28, 2011 9:28 AM
> >> To: user@mahout.apache.org
> >> Subject: Re: LDA Mahout
> >>
> >> Hi Jeff Eastman,
> >> Thanks for your reply. I looked into the LDADriver class, but I am not sure
> >> how to convert my text documents to SequenceFiles and then to SparseVectors
> >> as input for LDADriver. Can you please help me with this conversion?
> >> Also, is it enough to just call the run method in the LDADriver class with
> >> appropriate inputs to model the topics?
> >>
> >> Thanks & Regards,
> >> Manoj Kumar.R.K
> >> Graduate Student, MS Computer Science
> >> University at Buffalo
> >> Buffalo, New York
> >> (413) 461-8938|www.rkmanojkumar.co.nr
> >>
> >>
> >>
> >> On Mon, Feb 28, 2011 at 12:23 PM, Jeff Eastman <jeastman@narus.com>
> >> wrote:
> >>
> >> > Have you looked at the Java classes that implement LDA? The private
> >> > LDADriver.run() method should be made public, but this can be called from
> >> > Java in Eclipse (if that is what you mean by "using Eclipse"). You could
> >> > also look at the wiki for information on running LDA
> >> > (https://cwiki.apache.org/confluence/display/MAHOUT/Latent+Dirichlet+Allocation).
> >> >
> >> > -----Original Message-----
> >> > From: Manoj Kumar [mailto:manoj1987@gmail.com]
> >> > Sent: Monday, February 28, 2011 9:09 AM
> >> > To: user@mahout.apache.org
> >> > Subject: LDA Mahout
> >> >
> >> > Hi,
> >> >
> >> > I am doing a project which requires topic modeling of documents using LDA.
> >> > I am planning to implement this using Mahout LDA. I am not able to find any
> >> > sample code for implementing this using Eclipse. Only command-line options
> >> > were available. Kindly suggest a tutorial or please provide me some
> >> > basic code for implementing LDA. Kindly reply.
> >> >
> >> > Thanks & Regards,
> >> > Manoj Kumar.R.K
> >> > Graduate Student, MS Computer Science
> >> > University at Buffalo
> >> > Buffalo, New York
> >> > (413) 461-8938|www.rkmanojkumar.co.nr
> >> >
> >>
> >
> >
>
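
The idea behind the -maxDFPercent option mentioned in the quoted reply above is document-frequency pruning: a term that occurs in more than a given percentage of the documents is treated as too common to be informative. The sketch below illustrates that idea in plain Java. It is not Mahout's implementation. The class name MaxDfPruner, the method highFrequencyTerms, and the strict "greater than" comparison are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration of document-frequency pruning (not Mahout code). */
public class MaxDfPruner {

    /**
     * Returns the terms that occur in more than maxDfPercent percent of the documents.
     * Each document is a list of already-tokenized terms.
     */
    public static Set<String> highFrequencyTerms(List<List<String>> docs, int maxDfPercent) {
        // Document frequency: the number of documents containing the term at least once.
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) { // de-duplicate within one document
                df.merge(term, 1, Integer::sum);
            }
        }
        Set<String> tooFrequent = new TreeSet<>();
        for (Map.Entry<String, Integer> e : df.entrySet()) {
            // Integer arithmetic form of: df / numDocs > maxDfPercent / 100
            if (e.getValue() * 100L > (long) maxDfPercent * docs.size()) {
                tooFrequent.add(e.getKey());
            }
        }
        return tooFrequent;
    }
}
```

A frequency threshold like this needs no hand-maintained word list, but it only catches words that are common in the particular corpus, so it complements an explicit stop-word list without replacing it.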
