mahout-user mailing list archives

From Jens Bonerz <jbon...@googlemail.com>
Subject Re: Naive bayes and character n-grams
Date Wed, 09 Oct 2013 13:00:04 GMT
Hi Dean,

I might be wrong, but try googling for "shingling". It could be something to
start with.
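
For anyone landing on this thread later, here is a minimal sketch of character
shingling. It is plain Python rather than anything built into Mahout, and the
names char_ngrams and to_token_string (and their parameters) are made up for
illustration.

```python
def char_ngrams(text, n=3):
    """Return the overlapping character n-grams (shingles) of text."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Lowercase and collapse runs of whitespace so that spacing and case
    # differences do not produce distinct n-grams.
    normalized = " ".join(text.lower().split())
    # Slide a window of width n over the string, one character at a time.
    # Strings shorter than n yield an empty list.
    return [normalized[i:i + n] for i in range(len(normalized) - n + 1)]


def to_token_string(text, n=3):
    """Join the n-grams with spaces so each one looks like a single word."""
    # Spaces inside an n-gram are replaced with "_" so that a whitespace
    # tokenizer downstream does not split the n-gram apart.
    return " ".join(g.replace(" ", "_") for g in char_ngrams(text, n))
```

The second function is one possible way to feed the result into a word-level
pipeline: each n-gram is emitted as a whitespace-separated token. Whether that
plays well with a given tokenizer or analyzer is an assumption to check.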

Cheers
Jens


2013/10/9 Ted Dunning <ted.dunning@gmail.com>

> Yes.  Should work to use character n-grams.  There are oddities in the
> stats because the different n-grams are not independent, but Naive Bayes
> methods are in such a state of sin that it shouldn't hurt any worse.
>
> No... I don't think that there is a capability built in to generate the
> character n-grams.  Should be relatively trivial to build.
>
>
>
> On Wed, Oct 9, 2013 at 3:18 AM, Dean Jones <dean.m.jones@gmail.com> wrote:
>
> > Hello folks,
> >
> > I see that it's possible to use mahout to train a naive bayes
> > classifier using n-grams as features (or I guess, strictly speaking,
> > mahout can be used to generate sequence files containing n-grams; I
> > suspect the naive bayes trainer is indifferent to the form of features
> > it trains on). Is there any facility to generate character n-grams
> > instead of word n-grams?
> >
> > Thanks,
> >
> > Dean.
> >
>

<http://www.hightechmg.com>
