mahout-user mailing list archives

From Dave Byrne <dby...@mdb.com>
Subject TFIDFPartialVectorReducer minDf
Date Thu, 20 Sep 2012 17:55:31 GMT
In TFIDFPartialVectorReducer.java:

If docFreq > maxDocFreq, the vector element at that index is not set (the term is ignored).
If docFreq < minDocFreq, the vector element at that index is set to the TF-IDF value calculated
with minDocFreq in place of the actual document frequency.

Shouldn't minDocFreq be treated the same way as maxDocFreq, by skipping the element at
that index?
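
To make the two behaviours concrete, here is a minimal, self-contained sketch of the logic as I read it. It is not the Mahout source: the class MinDfSketch, the method names, and the skipBelowMinDf flag are hypothetical, maxDf is compared as a plain count as described above, and the weighting formula sqrt(tf) * (ln(numDocs / (df + 1)) + 1) is an assumption used only for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for the reducer's weighting loop; not the Mahout source. */
final class MinDfSketch {

  /** Assumed weighting for illustration: sqrt(tf) * (ln(numDocs / (df + 1)) + 1). */
  static double tfIdf(int tf, long df, long numDocs) {
    return Math.sqrt(tf) * (Math.log((double) numDocs / (df + 1)) + 1.0);
  }

  /**
   * Weighs one document's term frequencies (term index -> count).
   * skipBelowMinDf = false: behaviour described above (df is raised to minDf).
   * skipBelowMinDf = true:  proposed behaviour (the term is skipped, like maxDf).
   */
  static Map<Integer, Double> weigh(Map<Integer, Integer> termFreqs,
                                    Map<Integer, Long> docFreqs,
                                    long minDf, long maxDf, long numDocs,
                                    boolean skipBelowMinDf) {
    Map<Integer, Double> out = new TreeMap<>();
    for (Map.Entry<Integer, Integer> e : termFreqs.entrySet()) {
      Long df = docFreqs.get(e.getKey());
      if (df == null) {
        continue; // term is not in the dictionary
      }
      if (df > maxDf) {
        continue; // too frequent: element is never set
      }
      if (df < minDf) {
        if (skipBelowMinDf) {
          continue; // proposed: too rare, so skip it as well
        }
        df = minDf; // described: keep the term, but clamp df up to minDf
      }
      out.put(e.getKey(), tfIdf(e.getValue(), df, numDocs));
    }
    return out;
  }

  public static void main(String[] args) {
    Map<Integer, Integer> tf = new TreeMap<>();
    tf.put(0, 3); // rare term
    tf.put(1, 2); // ordinary term
    tf.put(2, 5); // very common term
    Map<Integer, Long> df = new TreeMap<>();
    df.put(0, 1L);
    df.put(1, 40L);
    df.put(2, 95L);
    System.out.println("clamped: " + weigh(tf, df, 5, 90, 100, false));
    System.out.println("skipped: " + weigh(tf, df, 5, 90, 100, true));
  }
}
```

With minDf = 5 and maxDf = 90 in this sketch, the rare term (index 0) still gets a weight when df is clamped and disappears only when it is skipped. In both modes the term indices themselves are untouched; only the set of non-zero entries differs.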

Also, in both cases the vector length remains the same. Am I right that these settings have
no effect on pruning the vector length (term reduction)?


