lucene-dev mailing list archives

From "Tim Owen (JIRA)" <>
Subject [jira] [Updated] (SOLR-9918) An UpdateRequestProcessor to skip duplicate inserts and ignore updates to missing docs
Date Wed, 04 Jan 2017 17:58:58 GMT


Tim Owen updated SOLR-9918:
    Attachment: SOLR-9918.patch

> An UpdateRequestProcessor to skip duplicate inserts and ignore updates to missing docs
> --------------------------------------------------------------------------------------
>                 Key: SOLR-9918
>                 URL:
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: update
>            Reporter: Tim Owen
>         Attachments: SOLR-9918.patch, SOLR-9918.patch
> This is an UpdateRequestProcessor and Factory that we have been using in production
to handle two common cases that were awkward to achieve using the existing update pipeline and
current processor classes:
> * When inserting document(s), if some already exist then quietly skip the new document
inserts - do not churn the index by replacing the existing documents and do not throw a noisy
exception that breaks the batch of inserts. By analogy with SQL, {{insert if not exists}}.
In our use-case, multiple application instances can (rarely) process the same input so it's
easier for us to de-dupe these at Solr insert time than to funnel them into a global ordered
queue first.
> * When applying AtomicUpdate documents, if a document being updated does not exist, quietly
do nothing - do not create a new partially-populated document and do not throw a noisy exception
about missing required fields. By analogy with SQL, {{update where id = ..}}. Our use-case
relies on this because we apply updates optimistically and have best-effort knowledge about
what documents will exist, so it's easiest to skip the updates (in the same way a Database
> I would have kept this in our own package hierarchy but it relies on some package-scoped
methods, and seems like it could be useful to others if they choose to configure it. Some
bits of the code were borrowed from {{DocBasedVersionConstraintsProcessorFactory}}.
> Attached patch has unit tests to confirm the behaviour.
> This class can be used by configuring solrconfig.xml like so:
> {noformat}
>   <updateRequestProcessorChain name="skipexisting">
>     <processor class="solr.LogUpdateProcessorFactory" />
>     <processor class="org.apache.solr.update.processor.SkipExistingDocumentsProcessorFactory">
>       <bool name="skipInsertIfExists">true</bool>
>       <bool name="skipUpdateIfMissing">false</bool> <!-- We will override this per-request -->
>     </processor>
>     <processor class="solr.DistributedUpdateProcessorFactory" />
>     <processor class="solr.RunUpdateProcessorFactory" />
>   </updateRequestProcessorChain>
> {noformat}
> and initParams defaults of
> {noformat}
>       <str name="update.chain">skipexisting</str>
> {noformat}
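
The two behaviours described above can be modelled with a small in-memory sketch. This is not the code from the attached patch and it uses no Solr APIs: the class name SkipExistingSketch, its methods, and the map-backed "index" are hypothetical, chosen only to illustrate what the skipInsertIfExists and skipUpdateIfMissing flags are described as doing. With the update flag off, the sketch assumes the "partially-populated document" outcome rather than the exception case.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative model only (hypothetical class, not taken from SOLR-9918.patch). */
class SkipExistingSketch {
    // Stand-in for the index: document id -> field values.
    private final Map<String, Map<String, Object>> index = new HashMap<>();
    private final boolean skipInsertIfExists;
    private final boolean skipUpdateIfMissing;

    SkipExistingSketch(boolean skipInsertIfExists, boolean skipUpdateIfMissing) {
        this.skipInsertIfExists = skipInsertIfExists;
        this.skipUpdateIfMissing = skipUpdateIfMissing;
    }

    /** Full-document insert; returns true if the document was written. */
    boolean insert(String id, Map<String, ?> fields) {
        if (skipInsertIfExists && index.containsKey(id)) {
            return false; // quietly skip: the existing document is left untouched
        }
        index.put(id, new HashMap<>(fields));
        return true;
    }

    /** Atomic (partial) update; returns true if the update was applied. */
    boolean atomicUpdate(String id, Map<String, ?> changes) {
        Map<String, Object> existing = index.get(id);
        if (existing == null) {
            if (skipUpdateIfMissing) {
                return false; // quietly do nothing: no partial document is created
            }
            // Assumption: with the flag off, model the partially-populated document case.
            existing = new HashMap<>();
            index.put(id, existing);
        }
        existing.putAll(changes);
        return true;
    }

    /** Returns the stored fields for an id, or null if the document does not exist. */
    Map<String, Object> get(String id) {
        return index.get(id);
    }

    public static void main(String[] args) {
        SkipExistingSketch sketch = new SkipExistingSketch(true, true);
        System.out.println(sketch.insert("1", Map.of("title", "first")));
        System.out.println(sketch.insert("1", Map.of("title", "second")));
        System.out.println(sketch.atomicUpdate("2", Map.of("title", "ghost")));
        System.out.println(sketch.get("1"));
    }
}
```

Returning a boolean rather than throwing mirrors the "quietly skip" intent: a batch containing duplicates or updates to missing documents continues instead of failing partway through.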
