nutch-dev mailing list archives

From Doug Cutting <cutt...@nutch.org>
Subject Re: svn commit: r359822 - in /lucene/nutch/trunk: bin/ conf/ src/java/org/apache/nutch/crawl/ src/java/org/apache/nutch/fetcher/ src/java/org/apache/nutch/indexer/ src/java/org/apache/nutch/parse/ src/java/org/apache/nutch/segment/ src/java/org/apache/nutc...
Date Mon, 02 Jan 2006 18:39:30 GMT
ab@apache.org wrote:
> Now users can select their own page signature implementation, possibly
> with better properties than the old one.
> 
> Two implementations are provided:
> 
> * MD5Signature: backward-compatible with the old schema.
> 
> * TextProfileSignature: an example implementation of a signature, which
>   gives the same values for near-duplicate pages. Please see Javadoc for
>   more information.
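
To illustrate the difference between the two kinds of signature described in the quoted commit message, here is a minimal sketch. It is not Nutch's code: the class name SignatureSketch, the PageSignature interface, and the EXACT and PROFILE constants are hypothetical, and the "profile" here (the sorted set of lowercased words of three or more characters) is a much cruder stand-in for what TextProfileSignature does according to its Javadoc. It only shows why an exact hash separates near-duplicate pages while a profile-based hash can map them to the same value.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;
import java.util.TreeSet;

public class SignatureSketch {

    /** Hypothetical plug-in point: any function from page text to a signature. */
    public interface PageSignature {
        String calculate(String pageText);
    }

    /** Hex-encoded MD5 digest of the UTF-8 bytes of s. */
    public static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Exact signature: any change in the text changes the value. */
    public static final PageSignature EXACT = SignatureSketch::md5Hex;

    /**
     * Simplified profile signature (an assumption for illustration only):
     * lowercase the text, split on anything that is not a letter or digit,
     * drop tokens shorter than three characters, and hash the sorted set
     * of the remaining distinct tokens.
     */
    public static final PageSignature PROFILE = text -> {
        TreeSet<String> tokens = new TreeSet<>();
        for (String tok : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (tok.length() >= 3) {
                tokens.add(tok);
            }
        }
        return md5Hex(String.join(" ", tokens));
    };

    public static void main(String[] args) {
        String a = "Nutch is a web crawler. Nutch indexes pages!";
        String b = "nutch is a WEB crawler -- Nutch indexes pages";
        System.out.println("exact equal:   "
                + EXACT.calculate(a).equals(EXACT.calculate(b)));
        System.out.println("profile equal: "
                + PROFILE.calculate(a).equals(PROFILE.calculate(b)));
    }
}
```

With the two sample strings in main, the exact signatures differ while the profile signatures match, which is the property a duplicate-detection step would rely on when comparing signatures.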

This looks great!  Thanks!

Shouldn't this also be used in DeleteDuplicates.java?

Doug
