nutch-dev mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: svn commit: r359822 - in /lucene/nutch/trunk: bin/ conf/ src/java/org/apache/nutch/crawl/ src/java/org/apache/nutch/fetcher/ src/java/org/apache/nutch/indexer/ src/java/org/apache/nutch/parse/ src/java/org/apache/nutch/segment/ src/java/org/apache/nutc...
Date Mon, 02 Jan 2006 20:19:42 GMT
Doug Cutting wrote:

> ab@apache.org wrote:
>
>> Now users can select their own page signature implementation, possibly
>> with better properties than the old one.
>>
>> Two implementations are provided:
>>
>> * MD5Signature: backward-compatible with the old schema.
>>
>> * TextProfileSignature: an example implementation of a signature, which
>>   gives the same values for near-duplicate pages. Please see Javadoc for
>>   more information.
>
>
> This looks great!  Thanks!
>
> Shouldn't this also be used in DeleteDuplicates.java?


Yes, I missed that. No harm done (yet), because the two existing 
implementations both produce an MD5 digest, just differently. I'll fix it.
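
To illustrate "both produce an MD5 digest, just differently", here is a minimal sketch. It is not the Nutch code: the PageSignature interface, the class name, and the profile rule (lowercase, drop tokens of two characters or fewer, keep words seen at least twice) are assumptions made for illustration only. See the Javadoc of MD5Signature and TextProfileSignature for the actual behaviour.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class SignatureSketch {

    /** Hypothetical plug-in point; the real Nutch API may look different. */
    interface PageSignature {
        byte[] calculate(String pageText);
    }

    /** Plain MD5 over the UTF-8 bytes of a string (always 16 bytes). */
    static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Exact signature: any change in the text changes the digest. */
    static final PageSignature EXACT = SignatureSketch::md5;

    /**
     * Near-duplicate signature (simplified assumption, not Nutch's rule):
     * MD5 over the sorted set of words that occur at least twice.
     */
    static final PageSignature PROFILE = text -> {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by word
        for (String tok : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (tok.length() > 2) {
                counts.merge(tok, 1, Integer::sum);
            }
        }
        StringBuilder profile = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= 2) {
                profile.append(e.getKey()).append('\n');
            }
        }
        return md5(profile.toString());
    };
}
```

Under these assumptions, two pages that differ only in rare words or punctuation get the same PROFILE digest but different EXACT digests. Either way the result is a 16-byte value, which is why code that compares digests keeps working regardless of which implementation produced them.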

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


