mahout-user mailing list archives

From "David Croley"
Subject Classification with data from Lucene
Date Tue, 05 Apr 2011 01:51:38 GMT
I have a large Lucene index (with TermFreq vectors). I do not have easy
access to the original source docs that the index was made from. I have
identified a set of docs in the index as Category X. Is there a way to
run Mahout's Bayesian classification algorithm, trained on the docs in
Category X, on the remaining docs in the index to better identify
category matches?


I have also exported the Lucene data into a Vector file in prep to run
some clustering experiments (as per the wiki examples) and also wondered
if that data could be used to feed the CBayes code. From what I can
tell, the classification code in Mahout takes a completely different
form of input compared to the clustering algorithms.


Thanks for any pointers.



David Croley

Lead Engineer




