lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: solr ignore duplicate documents
Date Tue, 13 Dec 2011 21:15:24 GMT
You're probably talking about a custom update handler here. That
way you can do a document ID lookup: check whether the
incoming document ID is already in the index and throw
the document away if it is. This should be very
efficient, much more efficient than making a separate query
for each document.

There's no way that I know of to do this out of the box in Solr though.
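
To make the lookup-and-drop idea concrete, here is a minimal sketch in plain Java. It is not Solr code: the class name SkipDuplicateHandler and its methods are hypothetical, and an in-memory map stands in for the index's ID lookup. In a real deployment the same check would presumably live in the custom update handler's add path and consult the index itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical stand-in for a custom update handler that ignores
 * documents whose ID is already indexed. The map below is an
 * assumption used for illustration; it replaces the real index lookup.
 */
public class SkipDuplicateHandler {
    // id -> document body; plays the role of the index.
    private final Map<String, String> index = new LinkedHashMap<>();
    private int skipped = 0;

    /** Adds the document unless its ID is already present. Returns true if it was added. */
    public boolean add(String id, String body) {
        // putIfAbsent returns null only when the key was not yet mapped.
        if (index.putIfAbsent(id, body) != null) {
            skipped++;          // duplicate: drop silently, keep the old document
            return false;
        }
        return true;
    }

    /** Bulk add: each entry is {id, body}. Returns the IDs that were actually added. */
    public List<String> addAll(List<String[]> docs) {
        List<String> added = new ArrayList<>();
        for (String[] doc : docs) {
            if (add(doc[0], doc[1])) {
                added.add(doc[0]);
            }
        }
        return added;
    }

    public String get(String id) { return index.get(id); }

    public int skippedCount() { return skipped; }
}
```

The point of the design is that the duplicate check happens once per document on the server side, inside the bulk update, so the client never issues a separate query per document and gets no special response for the ones that were dropped.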

Best
Erick

On Tue, Dec 13, 2011 at 3:44 PM, Mikhail Khludnev
<mkhludnev@griddynamics.com> wrote:
> Man,
>
> Does overwrite=false work for you?
>  http://wiki.apache.org/solr/UpdateXmlMessages#add.2BAC8-replace_documents
>
> Regards
>
> On Tue, Dec 13, 2011 at 11:34 PM, Alexander Aristov <
> alexander.aristov@gmail.com> wrote:
>
>> People,
>>
>> I am asking for your help with solr.
>>
>> When a document is sent to solr and such document already exists in its
>> index (by its ID) then the new doc replaces the old one.
>>
>> But I don't want to automatically replace documents. Just ignore and
>> proceed to the next. How can I configure solr to do so?
>>
>> Of course I can query solr to check if it has the document already but it's
>> bad for me since I do bulk updates and this will complicate the process and
>> increase the number of requests.
>>
>> So are there any ways to configure solr to ignore duplicates? Just ignore.
>> I don't need any specific responses or actions.
>>
>> Best Regards
>> Alexander Aristov
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Developer
> Grid Dynamics
> tel. 1-415-738-8644
> Skype: mkhludnev
> <http://www.griddynamics.com>
>  <mkhludnev@griddynamics.com>
