mahout-user mailing list archives

From <simon.2.thomp...@bt.com>
Subject RE: Naive bayes and character n-grams
Date Thu, 10 Oct 2013 06:59:39 GMT
Hey Dean,

What do you mean by character n-grams? If you mean things like "&ab" or "ui2", then given
that there are so few characters compared to words, is there a problem that can't be solved
without a look-up table for n < y (where y < 4ish)?

Or are you looking at y > 4ish? If so, do you run into the issue of a sudden
space explosion?
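
To put rough numbers on the look-up table versus space explosion point, here is a small
sketch of the upper bound on distinct character n-grams. The alphabet size of 26 is only
an assumption for illustration, and the function name is made up; real text uses far
fewer n-grams than this bound, but the growth rate is the concern.

```python
def max_ngram_features(alphabet_size, n):
    """Upper bound on the number of distinct character n-grams.

    Each of the n positions can hold any of alphabet_size symbols,
    so the bound is alphabet_size ** n.
    """
    if alphabet_size < 1 or n < 1:
        raise ValueError("alphabet_size and n must both be at least 1")
    return alphabet_size ** n


if __name__ == "__main__":
    # 26 is an assumed alphabet size (lowercase ASCII letters only).
    for n in range(1, 6):
        print(n, max_ngram_features(26, n))
```

With that assumed alphabet the bound is about 17.5 thousand at n = 3 and nearly 12 million
at n = 5, which is the kind of jump the question above is getting at.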

Best

Simon
----
Dr. Simon Thompson

________________________________________
From: Dean Jones [dean.m.jones@gmail.com]
Sent: 09 October 2013 11:18
To: user@mahout.apache.org
Subject: Naive bayes and character n-grams

Hello folks,

I see that it's possible to use mahout to train a naive bayes
classifier using n-grams as features (or I guess, strictly speaking,
mahout can be used to generate sequence files containing n-grams; I
suspect the naive bayes trainer is indifferent to the form of features
it trains on). Is there any facility to generate character n-grams
instead of word n-grams?

Thanks,

Dean.
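
For readers following the thread, a minimal sketch of what character n-gram generation
looks like, independent of Mahout. The function names are hypothetical, and the idea of
writing the n-grams out as space-separated tokens so that a word-based pipeline treats
each one as a "word" is an assumption about a possible workaround, not a documented
Mahout feature.

```python
def char_ngrams(text, n):
    """Return all overlapping character n-grams of text, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A string shorter than n yields no n-grams (empty range).
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def char_ngram_tokens(text, n):
    """Join character n-grams with spaces so each n-gram looks like one token.

    Spaces inside an n-gram are replaced with '_' so they survive
    whitespace tokenization (an illustrative choice, not a standard).
    """
    return " ".join(g.replace(" ", "_") for g in char_ngrams(text, n))


if __name__ == "__main__":
    print(char_ngrams("mahout", 3))
    print(char_ngram_tokens("naive bayes", 3))
```

Text preprocessed this way could then be fed to whatever word-level feature extraction
is already in place, with the n-gram size fixed at preprocessing time.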
