mahout-user mailing list archives

From Mani Kumar <manikumarchau...@gmail.com>
Subject Re: Incremental training of Classifier
Date Tue, 29 Dec 2009 05:15:59 GMT
@Robin: Thanks! BTW, what's the reasoning behind using CBayes for more than 2
categories? If Bayes works for spam/not-spam kinds of classification, why
not for more than 2 categories? It'd be great if you could give me some
pointers to read so I can understand this.
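Robin's answer is not part of this message, but the usual motivation for complement naive Bayes (CBayes, as described by Rennie et al., 2003) is this: each category's term weights are estimated from all documents *not* in that category. With two categories, the complement of one class is simply the other class, so the two approaches largely coincide. With many categories, and especially when some categories have few training documents, each complement pools the data of every other category and gives steadier estimates. The sketch below is a minimal illustration under stated assumptions (Laplace smoothing `alpha`, no class prior, no weight normalization); it is not Mahout's CBayes implementation, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def complement_weights(docs, alpha=1.0):
    """docs: list of (label, tokens). Returns {label: {term: log weight}}.

    Hypothetical helper: each label's weights come from the term counts of
    every document that does NOT carry that label (its complement).
    """
    vocab = set()
    class_counts = defaultdict(Counter)
    for label, tokens in docs:
        class_counts[label].update(tokens)
        vocab.update(tokens)

    total = Counter()  # term counts over the whole training set
    for counts in class_counts.values():
        total.update(counts)

    weights = {}
    for label, counts in class_counts.items():
        # Complement counts: everything outside this class.
        comp = {t: total[t] - counts[t] for t in vocab}
        denom = sum(comp.values()) + alpha * len(vocab)
        weights[label] = {t: math.log((comp[t] + alpha) / denom) for t in vocab}
    return weights


def classify(weights, tokens):
    """Pick the label whose complement fits the document WORST,
    i.e. the lowest complement log-likelihood."""
    tf = Counter(tokens)

    def score(label):
        w = weights[label]
        return sum(n * w[t] for t, n in tf.items() if t in w)

    return min(weights, key=score)
```

For example, training on a few tiny "sports", "politics" and "tech" documents and then calling `classify(w, ["goal", "team"])` picks "sports", because "goal" and "team" are rare in the sports complement.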

@Ted: I've only just started experimenting with Mahout and don't yet have a
very clear picture of how it can work for us. I'll let you know the details
as I get more experience with Mahout and a deeper understanding of our
requirements.

Thanks!
Mani Kumar

On Tue, Dec 29, 2009 at 6:14 AM, Ted Dunning <ted.dunning@gmail.com> wrote:

> Mani,
>
> You are sounding more and more like the poster child for an on-line
> classifier.
>
> The idea would be that you would give your classified docs to the system
> first for testing, then again for incremental training.  You can use the
> results of the test to adjust the learning rate for the incremental
> learning.
>
> See the work I have started with MAHOUT-228 for the beginnings of this.
> Let me know where it should go to help with your needs (i.e. what entry
> points you would need).
>
> On Mon, Dec 28, 2009 at 1:33 PM, Mani Kumar <manikumarchauhan@gmail.com> wrote:
>
> > Let's talk about bigger numbers, e.g. I have more than 1 million docs and I
> > get 10k new docs every day, of which 6k are already classified.
> >
> > Monitoring performance is good, but it can be done weekly instead of daily
> > just to reduce cost.
> >
> > I actually wanted to avoid retraining as much as possible because it
> > comes with a huge cost for a large dataset.
> >
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>
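Ted's suggestion above is a test-then-train (prequential) loop: each already-classified document is first scored as a test case, the outcome feeds into the learning rate, and only then is the document used for an incremental update. The sketch below assumes a binary online logistic regression trained by stochastic gradient descent. The class name, the 50-document error window, and the rule mapping the recent error rate to a learning rate are hypothetical illustrations, not the MAHOUT-228 code or its API.

```python
import math
from collections import deque


class OnlineLogisticRegression:
    """Hypothetical binary logistic regression trained one example at a time."""

    def __init__(self, n_features, window=50):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.rate = 1.0
        self.recent = deque(maxlen=window)  # 1 = test mistake, 0 = correct

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # avoid overflow in math.exp
        return 1.0 / (1.0 + math.exp(-z))

    def test_then_train(self, x, y):
        # 1. Test: score the labelled document before learning from it.
        p = self.predict_proba(x)
        self.recent.append(int((p >= 0.5) != bool(y)))

        # 2. Adjust the learning rate from recent test results
        #    (hypothetical rule): learn fast while mistakes are common,
        #    settle at 0.05 once the recent window is mistake-free.
        error_rate = sum(self.recent) / len(self.recent)
        self.rate = min(1.0, 0.05 + error_rate)

        # 3. Train: one SGD step on the log-loss gradient.
        g = p - y
        self.w = [wi - self.rate * g * xi for wi, xi in zip(self.w, x)]
        self.b -= self.rate * g
        return p
```

Feeding each day's classified documents through `test_then_train` gives a running accuracy estimate for free, so performance monitoring and training happen in the same pass instead of as a separate retraining job.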
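On Mani's point about avoiding retraining: a multinomial naive Bayes model is, at heart, a set of per-class document and term counts. Folding in each day's ~6k newly classified documents can therefore mean adding just their counts rather than re-reading the 1M+ document corpus. A minimal sketch under that assumption follows; the class and method names are hypothetical, and this thread does not say whether Mahout's Bayes trainer of the time supported such an update.

```python
import math
from collections import Counter, defaultdict


class IncrementalNaiveBayes:
    """Hypothetical multinomial naive Bayes whose whole state is counts,
    so new labelled documents are added without touching old ones."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.doc_counts = Counter()              # documents per class
        self.term_counts = defaultdict(Counter)  # term counts per class
        self.vocab = set()

    def partial_fit(self, docs):
        """docs: iterable of (label, tokens); cost is proportional to the new docs only."""
        for label, tokens in docs:
            self.doc_counts[label] += 1
            self.term_counts[label].update(tokens)
            self.vocab.update(tokens)

    def predict(self, tokens):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)

        def log_posterior(label):
            counts = self.term_counts[label]
            total = sum(counts.values())
            score = math.log(self.doc_counts[label] / n_docs)  # class prior
            for t in tokens:
                if t in self.vocab:  # ignore terms never seen in training
                    score += math.log((counts[t] + self.alpha) / (total + self.alpha * v))
            return score

        return max(self.doc_counts, key=log_posterior)
```

Calling `partial_fit` once per day with only that day's classified documents keeps the model current; a periodic (e.g. weekly) evaluation can then decide whether anything heavier is needed.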
