nutch-dev mailing list archives

From Doug Cutting <>
Subject Re: Duplicate Detection: Offline vs. Search Time
Date Mon, 17 Apr 2006 16:52:49 GMT
Shailesh Kochhar wrote:
> If I understand this correctly, you can only dedup by one field. This 
> would mean that if you were to implement and use content-based 
> deduplication, you'd have to give up limiting the number of hits per host.
> Is this correct, or did I miss something?

That's correct.  That's what's currently implemented.
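
To illustrate the trade-off, here is a minimal sketch of search-time deduplication on a single field. It is not Nutch's actual code: the function `dedup_by_field`, the hit dictionaries, and the field names `site` and `content_hash` are all hypothetical, and the hits are assumed to arrive already sorted by score.

```python
from collections import Counter


def dedup_by_field(hits, field, max_per_value):
    """Keep at most max_per_value hits for each distinct value of ONE field.

    Hypothetical helper for illustration only; hits are assumed to be
    dictionaries already sorted by descending score.
    """
    seen = Counter()
    kept = []
    for hit in hits:
        value = hit[field]
        if seen[value] < max_per_value:
            seen[value] += 1
            kept.append(hit)
    return kept


# Made-up hits: three pages from one host, plus a copy of the first
# page's content on a second host.
hits = [
    {"url": "http://a.example/1", "site": "a.example", "content_hash": "h1"},
    {"url": "http://a.example/2", "site": "a.example", "content_hash": "h2"},
    {"url": "http://a.example/3", "site": "a.example", "content_hash": "h3"},
    {"url": "http://b.example/1", "site": "b.example", "content_hash": "h1"},
]

# Limiting hits per host still lets the duplicate content (h1) through twice.
by_site = dedup_by_field(hits, "site", 2)

# Deduplicating by content lets a single host fill the result list.
by_content = dedup_by_field(hits, "content_hash", 1)
```

With only one dedup field available, each call enforces one constraint and ignores the other, which is the limitation described above.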

