mahout-user mailing list archives

From "Claudia Grieco" <gri...@crmpa.unisa.it>
Subject Identify "less similar" documents
Date Wed, 13 Apr 2011 09:12:21 GMT
Hi guys,

I'm using SGD to classify a set of documents, but I have a problem: some
documents are not related to any of the categories, and I want to identify
them and exclude them from the classification. My idea is to read the
documents of the training set (currently stored in a Lucene index) and
identify the docs that have the fewest terms in common with them. Any ideas
on how to do this?
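
To make the idea concrete, here is a minimal sketch in plain Python. It assumes the terms of each document have already been extracted (for example from the Lucene index) into plain lists of strings. The function names `overlap_fraction` and `flag_unrelated` and the threshold of 0.2 are made up for illustration and are not part of Mahout or Lucene.

```python
def overlap_fraction(doc_terms, training_vocabulary):
    """Fraction of a document's distinct terms that also occur in the training set."""
    distinct = set(doc_terms)
    if not distinct:
        # A document with no terms shares nothing with the training set.
        return 0.0
    return len(distinct & training_vocabulary) / len(distinct)


def flag_unrelated(docs, training_docs, threshold=0.2):
    """Return ids of documents whose overlap with the training vocabulary is below threshold.

    docs: dict mapping a document id to its list of terms.
    training_docs: iterable of term lists from the training set.
    threshold: hypothetical cut-off; it would need tuning on real data.
    """
    vocabulary = set()
    for terms in training_docs:
        vocabulary.update(terms)
    return [doc_id for doc_id, terms in docs.items()
            if overlap_fraction(terms, vocabulary) < threshold]


# Toy data, invented for this example.
training_docs = [["stock", "market", "price"], ["football", "match", "goal"]]
docs = {
    "a": ["market", "price", "rise"],
    "b": ["recipe", "pasta", "sauce"],
}
unrelated = flag_unrelated(docs, training_docs)
```

With this toy data, document "b" shares no terms with the training set and is flagged, while "a" is kept. Raw term overlap treats all terms equally, so very common words would probably need to be filtered out or down-weighted first.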

Thanks a lot

Claudia 

