mahout-user mailing list archives

From Dmitriy Lyubimov <dlyubi...@apache.org>
Subject Re: LDA Mahout
Date Tue, 01 Mar 2011 22:55:08 GMT
There is a way to specify a custom Lucene analyzer with one of the jobs;
I think it is seq2sparse, which has an option for that. Naturally, if
you use your own analyzer, you can write it with your custom stop
word list (or perhaps there is an option to do that with Lucene's
StopAnalyzer, or whatever it is called).
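
To illustrate the stop word idea, here is a minimal sketch in plain Java of
what such an analyzer would do with a list read from a text file. It does not
use the Lucene Analyzer API or any Mahout class; the class name
StopWordFilterSketch, both method names, and the file format (one word per
line, '#' for comments) are assumptions made for this example only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Mahout or Lucene.
public class StopWordFilterSketch {

    /** Reads one stop word per line; blank lines and lines starting with '#' are skipped. */
    public static Set<String> loadStopWords(Reader source) {
        Set<String> stopWords = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim().toLowerCase(Locale.ROOT);
                if (!word.isEmpty() && !word.startsWith("#")) {
                    stopWords.add(word);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return stopWords;
    }

    /** Lower-cases the text, splits on anything that is not a letter or digit, and drops stop words. */
    public static List<String> tokenize(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // An in-memory list stands in for the text file here.
        Set<String> stopWords = loadStopWords(new StringReader("the\nof\n# comment\nand\n"));
        System.out.println(tokenize("The topics of the corpus and the model", stopWords));
    }
}
```

To read a real file, a reader from java.nio.file.Files.newBufferedReader could
be passed to loadStopWords instead of the StringReader. A real custom analyzer
would apply the same filtering inside Lucene's token stream, and how that looks
depends on the Lucene version bundled with your Mahout release.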

-d

On Tue, Mar 1, 2011 at 2:42 PM, Jeff Eastman <jeastman@narus.com> wrote:
> Not with seq2sparse. We do have some Lucene support which may allow this. Grant?
>
> -----Original Message-----
> From: Manoj Kumar [mailto:manoj1987@gmail.com]
> Sent: Tuesday, March 01, 2011 12:26 PM
> To: user@mahout.apache.org
> Subject: Re: LDA Mahout
>
> Thanks. But is it possible to provide a customized stop word list
> loaded from a text file?
>
> Thanks & Regards,
> Manoj Kumar.R.K
> Graduate Student, MS Computer Science
> University at Buffalo
> Buffalo, New York
> (413) 461-8938|www.rkmanojkumar.co.nr
>
>
>
> On Tue, Mar 1, 2011 at 11:36 AM, Jeff Eastman <jeastman@narus.com> wrote:
>
>> Sure, seq2sparse has a -maxDFPercent option which can be used to eliminate
>> high frequency features like stop words. Check out the documentation at
>> https://cwiki.apache.org/confluence/display/MAHOUT/Creating+Vectors+from+Text
>>
>> -----Original Message-----
>> From: Manoj Kumar [mailto:manoj1987@gmail.com]
>> Sent: Monday, February 28, 2011 10:51 PM
>> To: user@mahout.apache.org
>> Subject: Re: LDA Mahout
>>
>> Hi Jeff Eastman,
>> Are there any options to perform stop word removal while performing LDA in
>> Mahout or while creating sequence files from the corpus?
>> Kindly reply.
>>
>> Thanks & Regards,
>> Manoj Kumar.R.K
>> Graduate Student, MS Computer Science
>> University at Buffalo
>> Buffalo, New York
>> (413) 461-8938|www.rkmanojkumar.co.nr
>>
>>
>>
>> On Mon, Feb 28, 2011 at 1:06 PM, Manoj Kumar <manoj1987@gmail.com> wrote:
>>
>> > Hi Jeff Eastman,
>> >
>> > Thanks a lot. I'll look into it and will contact you if I need any help.
>> >
>> > Thanks & Regards,
>> > Manoj Kumar.R.K
>> > Graduate Student, MS Computer Science
>> > University at Buffalo
>> > Buffalo, New York
>> > (413) 461-8938|www.rkmanojkumar.co.nr
>> >
>> >
>> >
>> > On Mon, Feb 28, 2011 at 12:48 PM, Jeff Eastman <jeastman@narus.com> wrote:
>> >
>> >> Look at examples/bin/build-reuters.sh for some examples. They are all
>> >> from the command line but illustrate the best way to do what you are
>> >> attempting.
>> >>
>> >> https://cwiki.apache.org/confluence/display/MAHOUT/K-Means+Clustering also
>> >> has some example code for doing text processing.
>> >>
>> >> -----Original Message-----
>> >> From: Manoj Kumar [mailto:manoj1987@gmail.com]
>> >> Sent: Monday, February 28, 2011 9:28 AM
>> >> To: user@mahout.apache.org
>> >> Subject: Re: LDA Mahout
>> >>
>> >> Hi Jeff Eastman,
>> >> Thanks for your reply. I looked into the LDADriver class, but I am not
>> >> sure how to convert my text documents to Sequence Files and then to
>> >> SparseVectors as input for LDADriver. Can you please help me with this
>> >> conversion? Also, is it enough to just call the run method in the
>> >> LDADriver class with appropriate inputs to model the topics?
>> >>
>> >> Thanks & Regards,
>> >> Manoj Kumar.R.K
>> >> Graduate Student, MS Computer Science
>> >> University at Buffalo
>> >> Buffalo, New York
>> >> (413) 461-8938|www.rkmanojkumar.co.nr
>> >>
>> >>
>> >>
>> >> On Mon, Feb 28, 2011 at 12:23 PM, Jeff Eastman <jeastman@narus.com>
>> >> wrote:
>> >>
>> >> > Have you looked at the Java classes that implement LDA? The private
>> >> > LDADriver.run() method should be made public, but this can be called
>> >> > from Java in Eclipse (if that is what you mean by "using Eclipse"). You
>> >> > could also look at the wiki for information on running LDA
>> >> > (https://cwiki.apache.org/confluence/display/MAHOUT/Latent+Dirichlet+Allocation).
>> >> >
>> >> > -----Original Message-----
>> >> > From: Manoj Kumar [mailto:manoj1987@gmail.com]
>> >> > Sent: Monday, February 28, 2011 9:09 AM
>> >> > To: user@mahout.apache.org
>> >> > Subject: LDA Mahout
>> >> >
>> >> > Hi,
>> >> >
>> >> > I am doing a project which requires topic modeling of documents using
>> >> > LDA. I am planning to implement this using Mahout LDA. I am not able to
>> >> > find any sample code for implementing this using Eclipse. Only command
>> >> > line options were available. Kindly suggest a tutorial or please provide
>> >> > some basic code for implementing LDA. Kindly reply.
>> >> >
>> >> > Thanks & Regards,
>> >> > Manoj Kumar.R.K
>> >> > Graduate Student, MS Computer Science
>> >> > University at Buffalo
>> >> > Buffalo, New York
>> >> > (413) 461-8938|www.rkmanojkumar.co.nr
>> >> >
>> >>
>> >
>> >
>>
>
