lucene-dev mailing list archives

From "Koji Sekiguchi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-9918) An UpdateRequestProcessor to skip duplicate inserts and ignore updates to missing docs
Date Wed, 04 Jan 2017 02:26:58 GMT

    [ https://issues.apache.org/jira/browse/SOLR-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796928#comment-15796928 ]

Koji Sekiguchi commented on SOLR-9918:
--------------------------------------

I believe this proposal is very useful for users who need this functionality, but it would
help them if there were an additional explanation of how it differs from the existing
processor that provides similar functionality.

How should users decide between this UpdateRequestProcessor and SignatureUpdateProcessor
for their use case?

> An UpdateRequestProcessor to skip duplicate inserts and ignore updates to missing docs
> --------------------------------------------------------------------------------------
>
>                 Key: SOLR-9918
>                 URL: https://issues.apache.org/jira/browse/SOLR-9918
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: update
>            Reporter: Tim Owen
>         Attachments: SOLR-9918.patch
>
>
> This is an UpdateRequestProcessor and Factory that we have been using in production,
> to handle 2 common cases that were awkward to achieve using the existing update pipeline and
> current processor classes:
> * When inserting document(s), if some already exist then quietly skip the new document
>   inserts - do not churn the index by replacing the existing documents and do not throw a noisy
>   exception that breaks the batch of inserts. By analogy with SQL, {{insert if not exists}}.
>   In our use-case, multiple application instances can (rarely) process the same input so it's
>   easier for us to de-dupe these at Solr insert time than to funnel them into a global ordered
>   queue first.
> * When applying AtomicUpdate documents, if a document being updated does not exist, quietly
>   do nothing - do not create a new partially-populated document and do not throw a noisy exception
>   about missing required fields. By analogy with SQL, {{update where id = ..}}. Our use-case
>   relies on this because we apply updates optimistically and have best-effort knowledge about
>   what documents will exist, so it's easiest to skip the updates (in the same way a Database
>   would).
> I would have kept this in our own package hierarchy but it relies on some package-scoped
> methods, and seems like it could be useful to others if they choose to configure it. Some
> bits of the code were borrowed from {{DocBasedVersionConstraintsProcessorFactory}}.
> Attached patch has unit tests to confirm the behaviour.
> This class can be used by configuring solrconfig.xml like so..
> {noformat}
>   <updateRequestProcessorChain name="skipexisting">
>     <processor class="solr.LogUpdateProcessorFactory" />
>     <processor class="org.apache.solr.update.processor.SkipExistingDocumentsProcessorFactory">
>       <bool name="skipInsertIfExists">true</bool>
>       <bool name="skipUpdateIfMissing">false</bool> <!-- We will override this per-request -->
>     </processor>
>     <processor class="solr.DistributedUpdateProcessorFactory" />
>     <processor class="solr.RunUpdateProcessorFactory" />
>   </updateRequestProcessorChain>
> {noformat}
> and initParams defaults of
> {noformat}
>       <str name="update.chain">skipexisting</str>
> {noformat}
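
For readers comparing processors, the two behaviours described in the issue can be sketched as a small decision function. This is only an illustrative model in Python, not the code in the attached patch: the names SkipConfig, is_atomic_update, should_skip and resolve_config are hypothetical, the set of existing ids stands in for a real index lookup, and the assumption that the per-request override uses request parameters named the same as the config options is a guess based on the "override this per-request" comment above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkipConfig:
    """Hypothetical mirror of the two boolean options in the chain config."""
    skip_insert_if_exists: bool = True
    skip_update_if_missing: bool = True


def is_atomic_update(doc: dict) -> bool:
    # Simplification: treat a document as an atomic update when any field
    # value is a map of modifiers, e.g. {"count": {"inc": 1}}.
    return any(isinstance(value, dict) for value in doc.values())


def should_skip(doc: dict, existing_ids: set, config: SkipConfig,
                id_field: str = "id") -> bool:
    """Return True when the processor would quietly drop this document."""
    exists = doc[id_field] in existing_ids  # stand-in for an index lookup
    if is_atomic_update(doc):
        # "update where id = ..": ignore updates to missing documents.
        return config.skip_update_if_missing and not exists
    # "insert if not exists": ignore inserts of documents already present.
    return config.skip_insert_if_exists and exists


def resolve_config(defaults: SkipConfig, request_params: dict) -> SkipConfig:
    """Apply per-request overrides (assumed parameter names) over defaults."""
    def flag(name: str, default: bool) -> bool:
        raw = request_params.get(name)
        return default if raw is None else raw.strip().lower() == "true"

    return SkipConfig(
        skip_insert_if_exists=flag("skipInsertIfExists",
                                   defaults.skip_insert_if_exists),
        skip_update_if_missing=flag("skipUpdateIfMissing",
                                    defaults.skip_update_if_missing),
    )
```

With the defaults from the chain above (skipInsertIfExists=true, skipUpdateIfMissing=false), a plain insert of an existing id is dropped while an atomic update to a missing id is passed on unchanged; a request that overrides skipUpdateIfMissing to true would drop that update as well.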



