mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Document Comparison with Mahout
Date Wed, 07 Jul 2010 18:19:58 GMT
How do you want to decide what counts as a copy: strictly (exact match) or loosely (near-duplicate)?
Solr and Nutch have some deduplication capabilities, including fuzzy matching.  They could probably
be brought into Mahout, too.
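
To make the strict/loose distinction concrete, here is a minimal sketch in plain Python. It is not
the Solr, Nutch, or Mahout implementation; the function names, the 3-word shingle size, and the
sample documents are all assumptions chosen for illustration. The strict check hashes the
normalized text, and the loose check uses Jaccard similarity over word shingles, which is one
common way to do fuzzy matching.

```python
import hashlib
import re


def tokens(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def strict_signature(text):
    """Strict check: hash of the normalized token stream.

    Two documents get the same signature only if their tokens match exactly.
    """
    normalized = " ".join(tokens(text))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def shingles(text, k=3):
    """Return the set of overlapping k-word shingles (k=3 is an arbitrary choice)."""
    words = tokens(text)
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b, k=3):
    """Loose check: |A intersect B| / |A union B| over the two shingle sets."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty documents are treated as identical
    return len(sa & sb) / len(sa | sb)


# Hypothetical sample documents.
doc_a = "The quick brown fox jumps over the lazy dog near the river bank."
doc_b = "The quick brown fox jumps over the lazy dog near the river."
doc_c = "Mahout provides scalable machine learning libraries."

same_strict = strict_signature(doc_a) == strict_signature(doc_b)  # False
loose_ab = jaccard(doc_a, doc_b)  # 10 shared shingles out of 11 in total
loose_ac = jaccard(doc_a, doc_c)  # no shared shingles
```

A threshold on the Jaccard score (say, flagging pairs above 0.8, again an assumed value) would
turn the loose score into a yes/no "probable copy" decision. Comparing every pair of documents
this way does not scale to large collections, so a real deployment would need some form of
signature or hashing scheme to narrow down the candidates first.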

-Grant

On Jul 7, 2010, at 10:23 AM, JAGANADH G wrote:

> Dear All
> 
> Is there any way or algorithm available to compare two documents?
> E.g., check whether document "A" is a copy (plagiarised version) of document "B".
> 
> With regards
> 
> -- 
> **********************************
> JAGANADH G
> http://jaganadhg.freeflux.net/blog

