mahout-user mailing list archives

From vaibhav srivastava
Subject Re: How to get document count for TFIDF calculate method?
Date Tue, 29 Jul 2014 17:15:58 GMT
Hi, if I understand correctly, you want to find the number of documents by
reading frequency.file-0. You can use a SequenceFile.Reader to load the
frequency file and then count the number of keys; that should give you the
number of documents.
Hope this helps,
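
For the NPE in the quoted question, here is a minimal sketch of a null-safe lookup. It assumes the frequency data has already been loaded into a Map<Integer, Long>, as described in the question. The names NumDocsLookup, findNumDocs and NUM_DOCS_KEY, and the sample values in main, are made up for illustration and are not part of Mahout. It only shows how to detect a missing -1 entry without unboxing a null; it does not say where the count should come from when the entry is absent.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical helper, not part of Mahout.
final class NumDocsLookup {

    // Key that the code quoted in the question expects to hold the document count.
    static final int NUM_DOCS_KEY = -1;

    // Returns the value stored under key -1, or an empty OptionalInt if the
    // map has no such entry. This avoids calling intValue() on a null Long.
    static OptionalInt findNumDocs(Map<Integer, Long> documentFrequency) {
        Long value = documentFrequency.get(NUM_DOCS_KEY);
        if (value == null) {
            return OptionalInt.empty();
        }
        // toIntExact throws ArithmeticException instead of silently overflowing.
        return OptionalInt.of(Math.toIntExact(value));
    }

    public static void main(String[] args) {
        // Sample data only; a real map would be filled from the frequency file.
        Map<Integer, Long> documentFrequency = new HashMap<>();
        documentFrequency.put(0, 12L);
        documentFrequency.put(1, 3L);

        OptionalInt numDocs = findNumDocs(documentFrequency);
        if (numDocs.isPresent()) {
            System.out.println("numDocs = " + numDocs.getAsInt());
        } else {
            System.out.println("No entry for key " + NUM_DOCS_KEY
                    + "; the map holds " + documentFrequency.size() + " keys");
        }
    }
}
```

With a check like this, a missing key is reported explicitly and never reaches tfidf.calculate as a null.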

On Tue, Jul 29, 2014 at 10:32 PM, Jonathan Cooper-Ellis wrote:

> Hey guys,
> I'm trying to make a Bayesian classifier, but I'm having a hard time
> figuring out how to programmatically determine the value of the numDocs
> param for the calculate method in TFIDF, using the files generated when
> building the model on the command line.
> I saw some code that did it like this:
> int numDocs = documentFrequency.get(-1).intValue();
> where documentFrequency is a HashMap<Integer,Long> read from
> frequency.file-0, but there's no key -1 in the file, so it's giving me an
> NPE when I try to pass that to tfidf.calculate.
> Anyone know what I'm doing wrong?
> Best,
> jce

Thanks and Regards,
Vaibhav Srivastava
