mahout-user mailing list archives

From Jonathan Cooper-Ellis
Subject How to get document count for TFIDF calculate method?
Date Tue, 29 Jul 2014 17:02:13 GMT
Hey guys,

I'm trying to build a Bayesian classifier, but I'm having a hard time
figuring out how to programmatically determine the value of the numDocs
parameter for the calculate method in TFIDF, using the files generated when
building the model on the command line.

I saw some code that did it like this:

int numDocs = documentFrequency.get(-1).intValue();

Where documentFrequency is a HashMap<Integer,Long> read from
frequency.file-0, but there's no key -1 in the file, so it's giving me an NPE
when I try to pass that to tfidf.calculate.
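
For reference, here is a minimal sketch of why that lookup fails and how it could be guarded. When the key is absent, Map.get returns null, and calling intValue() on null throws the NPE. The class and method names (DocCountLookup, findNumDocs) are hypothetical and not part of Mahout. The map contents are made-up sample values, and the idea that the count lives under key -1 is taken only from the snippet quoted above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical helper, not part of Mahout: looks up the document count
// without throwing when the expected key is missing.
final class DocCountLookup {
    // Assumption taken from the quoted snippet: the count is stored under -1.
    static final int DOC_COUNT_KEY = -1;

    static OptionalInt findNumDocs(Map<Integer, Long> documentFrequency) {
        // Map.get returns null for an absent key; check before unboxing.
        Long count = documentFrequency.get(DOC_COUNT_KEY);
        if (count == null) {
            return OptionalInt.empty();
        }
        // toIntExact throws ArithmeticException if the count overflows int.
        return OptionalInt.of(Math.toIntExact(count));
    }

    public static void main(String[] args) {
        // Made-up sample entries standing in for what was read from the file.
        Map<Integer, Long> documentFrequency = new HashMap<>();
        documentFrequency.put(0, 12L);
        documentFrequency.put(1, 3L);

        OptionalInt numDocs = findNumDocs(documentFrequency);
        if (numDocs.isPresent()) {
            System.out.println("numDocs = " + numDocs.getAsInt());
        } else {
            System.out.println("No entry for key " + DOC_COUNT_KEY
                    + "; the document count has to come from somewhere else.");
        }
    }
}
```

This only makes the failure explicit. It does not say where the count is actually stored.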

Anyone know what I'm doing wrong?
