nutch-dev mailing list archives

From "Andrzej Bialecki (JIRA)" <>
Subject [jira] Commented: (NUTCH-684) Dedup support for Solr
Date Fri, 20 Feb 2009 10:47:02 GMT


Andrzej Bialecki commented on NUTCH-684:

IMHO it would be good to have this functionality in 1.0, and the patch is very close to ready.

Ok, how about the following:

* we make the name of the unique field configurable, and provide a default value in nutch-default.xml,
which is consistent with the one provided in the example schema.xml (yes, we should add an
example schema, and the one in NUTCH-442 looks good enough).

* the UpdateRequest improvement: it's up to you whether to do it here or in a separate issue.
It would certainly be nice to have.

* javadocs: yeah, map/reduce/configure are obvious, and good javadocs already exist in the superclasses.
The same goes for bean-like getters/setters. Other public methods should be documented, so that in half
a year we still know what they are for and what arguments they expect.
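A minimal sketch of the first and third points together: a public method that reads the name of the unique field from configuration, falls back to a default matching the example schema.xml, and documents its argument in a javadoc. The property name `solr.dedup.unique.field`, the default value `id`, and the use of `java.util.Properties` in place of Nutch's Hadoop-based configuration are all assumptions for illustration, not taken from the attached patches.

```java
import java.util.Properties;

/**
 * Hypothetical helper showing how a Solr dedup job might look up the name
 * of the unique-key field, falling back to a default.
 */
final class SolrDedupConfig {

  /** Hypothetical property name; not an actual Nutch configuration key. */
  static final String UNIQUE_FIELD_KEY = "solr.dedup.unique.field";

  /** Assumed default, meant to match {@code <uniqueKey>id</uniqueKey>} in an example schema.xml. */
  static final String DEFAULT_UNIQUE_FIELD = "id";

  private SolrDedupConfig() {}

  /**
   * Returns the name of the Solr field that uniquely identifies a document.
   *
   * @param conf configuration to read from; {@code java.util.Properties}
   *             stands in here for Nutch's Hadoop-based configuration
   * @return the configured field name with surrounding whitespace removed,
   *         or {@link #DEFAULT_UNIQUE_FIELD} if the property is unset or blank
   */
  public static String getUniqueField(Properties conf) {
    String value = conf.getProperty(UNIQUE_FIELD_KEY);
    if (value == null || value.trim().isEmpty()) {
      return DEFAULT_UNIQUE_FIELD;
    }
    return value.trim();
  }
}
```

Keeping the default in one constant (and, in Nutch, in nutch-default.xml) means the code and the example schema only have to agree in one place.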

> Dedup support for Solr
> ----------------------
>                 Key: NUTCH-684
>                 URL:
>             Project: Nutch
>          Issue Type: New Feature
>          Components: indexer
>            Reporter: Doğacan Güney
>            Assignee: Doğacan Güney
>         Attachments: NUTCH-684_bin_nutch.patch, NUTCH-684_solrdedup_v2.patch, solrdedup.patch
> After NUTCH-442, Nutch can now index to both Solr and Lucene. However, the duplicate deletion
> feature (based on digests) is only available for Lucene. It should also be available for Solr.
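Digest-based duplicate deletion can be sketched as: group documents by content digest, keep one per group, and delete the rest. This is a minimal Java illustration, not the code from the attached patches. It assumes each indexed document exposes an id, a digest and a score, and it assumes a keep policy of "highest score wins, ties go to the shorter id"; `DigestDedup`, `Doc` and `idsToDelete` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: among documents sharing a digest, keep one and report the others. */
final class DigestDedup {

  /** Minimal stand-in for an indexed document; field names are hypothetical. */
  record Doc(String id, String digest, float score) {}

  // Orders a group so that the document to keep comes first:
  // highest score, then shorter id, then lexicographically smaller id.
  private static final Comparator<Doc> KEEP_FIRST =
      Comparator.comparingDouble((Doc d) -> d.score()).reversed()
          .thenComparingInt(d -> d.id().length())
          .thenComparing(Doc::id);

  /** Returns, in sorted order, the ids of documents that duplicate a kept document. */
  static List<String> idsToDelete(List<Doc> docs) {
    Map<String, List<Doc>> byDigest = new HashMap<>();
    for (Doc d : docs) {
      byDigest.computeIfAbsent(d.digest(), k -> new ArrayList<>()).add(d);
    }
    List<String> toDelete = new ArrayList<>();
    for (List<Doc> group : byDigest.values()) {
      group.sort(KEEP_FIRST);
      // Every entry after the first (kept) one is a duplicate.
      for (int i = 1; i < group.size(); i++) {
        toDelete.add(group.get(i).id());
      }
    }
    toDelete.sort(Comparator.naturalOrder()); // deterministic output order
    return toDelete;
  }
}
```

In a real Solr job the returned ids would then be sent to Solr as delete-by-id requests; that step is omitted here.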

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
