mahout-user mailing list archives

From Pat Ferrel <>
Subject n-gram and ml
Date Sat, 09 Jun 2012 15:27:48 GMT
As I understand it, when using seq2sparse with ng = 2 and ml set to some large
number, the job will never create a vector with fewer terms than words (all
other parts of the algorithm set aside). In other words, ng = 2 and ml =
2000 will create very few n-grams but will never create a 0-length
vector unless there were no terms to begin with.
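To make sure I'm asking the right question, here is a small sketch of the pruning I have in mind, assuming seq2sparse scores n-grams with Dunning's log-likelihood ratio (as in Mahout's LogLikelihood class) and applies the ml threshold only to n-grams of size > 1, never to unigrams. The function names and counts are illustrative, not Mahout's actual code:

```python
import math

def x_log_x(x):
    # Treat 0 * log(0) as 0.
    return x * math.log(x) if x > 0 else 0.0

def entropy(*counts):
    # Unnormalized Shannon entropy as used in Dunning's LLR.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    # Dunning's log-likelihood ratio for a 2x2 contingency table of
    # bigram co-occurrence counts; this is the score compared to ml.
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return 2.0 * (row + col - mat)

def vector_terms(unigrams, bigram_tables, ml):
    # Unigrams are kept unconditionally; only bigrams must pass the
    # LLR threshold. So however large ml gets, the vector never
    # shrinks below the unigram count.
    kept = list(unigrams)
    for bigram, table in bigram_tables.items():
        if llr(*table) >= ml:
            kept.append(bigram)
    return kept
```

With ml = 2000, a strongly associated bigram like "new york" (counts 10, 0, 0, 10 give an LLR of about 27.7) is dropped, but the two unigrams survive, so the vector still has 2 terms rather than 0.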

Is this correct?

I ask because it looks like many of my n-grams are not really helpful, so
I keep tuning ml upwards, but Robin commented that this might
cause 0-length vectors, in which case I might want to stop using n-grams.
