mahout-user mailing list archives

From David Rahman <drahman1...@googlemail.com>
Subject Re: text classification using mahout and lucene index
Date Fri, 14 Oct 2011 12:17:11 GMT
Ok, I discovered that I have to check whether my data contains TermFreq vectors.
That will have to wait until next week, I think...

Do I have to convert the Lucene index files into Lucene vector files in
order to use the data for training?

Regards,
David

2011/10/14 David Rahman <drahman1985@googlemail.com>

> Ok, thanks.
> Just to make it clear to me: I take the data with the Lucene vectors and
> run a training algorithm on them. This should result in a model. I
> don't need any preprocessing steps or anything else?
>
> Another question: your book MiA gives a good explanation and overview of
> Mahout. Can you tell me whether there is more coming about Mahout+Lucene?
> I'm new to this and need some more reading material.
>
> I did find "Taming Text" but from the abstract I could not determine if
> this applies to my problem.
>
> Thanks and regards,
> David
>
> take lucene vectors --> train on them with nBayes or another Alg. -->
> getting a model
>
>
> 2011/10/13 Ted Dunning <ted.dunning@gmail.com>
>
>> I just meant that there are separate components to do the different steps.
>>  Historically, some glue code was required between them, but I think that
>> the gap has been narrowed lately.
>>
>> On Thu, Oct 13, 2011 at 12:41 PM, David Rahman
>> <drahman1985@googlemail.com> wrote:
>>
>> > @Ted: Could you explain the last part of your response, please? I
>> > didn't understand it:
>> >
>> > >You will need to glue the lucene document vector extraction to the
>> > >naive bayes and you may want to adapt it to use feature hashing for the
>> > >SGD classifiers.
>> >
>>
>
>
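
For readers unfamiliar with the feature hashing that Ted mentions for the SGD
classifiers, here is a minimal sketch of the general idea. It is not Mahout's
encoder API: the function name hash_features, the bucket count of 16, and the
use of MD5 as the hash are all assumptions chosen for illustration. Each token
is hashed to an index in a fixed-length vector, so no term dictionary has to
be built before training.

```python
import hashlib
from collections import Counter


def hash_features(tokens, num_buckets=16):
    """Map a list of tokens to a fixed-length count vector (hashing trick).

    Hypothetical helper for illustration; not part of Mahout or Lucene.
    """
    vector = [0.0] * num_buckets
    for token, count in Counter(tokens).items():
        # A stable hash (MD5 here) sends the same token to the same bucket
        # on every run, unlike Python's built-in hash() for strings.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % num_buckets
        # Tokens that collide in a bucket simply add their counts together.
        vector[index] += count
    return vector


doc = "lucene index text classification with lucene".split()
vec = hash_features(doc)
```

The vector length stays at num_buckets no matter how large the vocabulary
grows. The cost is that distinct tokens may collide in one bucket, which is
why real encoders usually use far more buckets than this toy example.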
