lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Replacing a group of documents (Delete/Insert) without a query on the index ever showing an empty list (Docs)
Date Fri, 27 Mar 2015 18:37:09 GMT
You can simplify things a bit by indexing a "batch number" guaranteed
to be different between two runs for the same keyField. In fact I'd
make sure it was unique amongst all my runs. Simplest is a timestamp
(assuming you don't start two batches within a millisecond!). So it
looks like this.

Get a new timestamp.
Add it to _every_ doc in my current run and index those docs.
Issue a delete-by-query like 'q=keyField:A AND timestamp:[* TO <new timestamp>}'.
Commit.

As Shawn says, you have to very carefully control the commits. And
also note that the curly brace at the end is NOT a typo, it excludes
the endpoint.
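
To make the steps above concrete, here is a minimal Python sketch. It assumes the batch number is stored as a millisecond value in a numeric field named timestamp (a date-typed field would need date syntax in the range instead). The helper names new_batch_stamp, stale_docs_query and replace_group are hypothetical, and replace_group only imitates the outcome on an in-memory list; it does not talk to Solr.

```python
import time


def new_batch_stamp():
    # Millisecond timestamp used as the batch number for one run.
    return int(time.time() * 1000)


def stale_docs_query(key_value, batch_stamp):
    # '[' includes the lower bound, '}' excludes the upper bound, so docs
    # stamped with the current batch are NOT matched by the delete.
    return "keyField:%s AND timestamp:[* TO %d}" % (key_value, batch_stamp)


def replace_group(index, new_docs, key_value, batch_stamp):
    # In-memory stand-in for: add the new docs (overwriting by id),
    # delete-by-query on the older batches, then commit.
    by_id = {doc["id"]: doc for doc in index}
    for doc in new_docs:
        by_id[doc["id"]] = dict(doc, timestamp=batch_stamp)
    return [
        doc for doc in by_id.values()
        if not (doc["keyField"] == key_value and doc["timestamp"] < batch_stamp)
    ]
```

Because the delete only matches strictly older batches for the same keyField, documents that were not re-sent in the current run are the ones that disappear, which is the "I don't know what to delete" case.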

Best,
Erick

On Fri, Mar 27, 2015 at 7:01 AM, Russell Taylor
<Russell.Taylor@interactivedata.com> wrote:
> Yes, that works and now I have a better understanding of the soft and hard commits to boot.
>
> Thanks again Shawn.
>
>
> Russ.
>
> -----Original Message-----
> From: Shawn Heisey [mailto:apache@elyograg.org]
> Sent: 27 March 2015 13:22
> To: solr-user@lucene.apache.org
> Subject: Re: Replacing a group of documents (Delete/Insert) without a query on the index ever showing an empty list (Docs)
>
> On 3/27/2015 7:07 AM, Russell Taylor wrote:
>> Hi Shawn, thanks for the quick reply.
>>
>> I've looked at both methods and I think that they won't work for a number of reasons:
>>
>> 1)
>> uniqueKey:
>>  I could use the uniqueKey and overwrite the original document, but I
>> need to remove the documents which are not on my new input list, and the issue with
>> the uniqueKey method is I don't know what to delete.
>>
>> Documents on the index:
>> "docs": [
>> {
>> "id":"1",
>> "keyField":"A"
>> },{
>> "id":"2",
>> "keyField":"A"
>> },{
>> "id":"3",
>> "keyField":"B"
>> }
>> ]
>> New documents to go on the index:
>> "docs": [
>> {
>> "id":"1",
>> "keyField":"A"
>> },{
>> "id":"3",
>> "keyField":"B"
>> }
>> ]
>> I would never know that id:2 should be deleted (on some new document lists the delete
>> list could be in the millions).
>>
>> 2)
>> openSearcher:
>> My openSearcher is set to false and I've also commented out autoSoftCommit so I don't
>> get a partial list being returned on a query.
>> <!--
>> <autoSoftCommit>
>>        <maxTime>${solr.autoSoftCommit.maxTime:1000}</maxTime>
>> </autoSoftCommit>
>> -->
>>
>>
>> So is there another way to keep the original set of documents until the new set has
>> been added to the index?
>
> If you are 100% in control of when commits with openSearcher=true are sent, which it
> sounds like you probably are, then you can do anything you want from the start of indexing
> until commit time, and the user will never see any of it until the commit happens.  That
> allows the following relatively simple paradigm:
>
> 1) Delete LOTS of stuff, or perhaps everything in the index with a deleteByQuery of *:*
> (for all documents).
>
> 2) Index everything you need to index.
>
> 3) Commit.
>
> Thanks,
> Shawn
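
Shawn's delete, index, commit sequence depends on searches only ever seeing the last committed state. The following toy Python model illustrates that visibility rule; the ToyIndex class and its methods are hypothetical and only mimic the behaviour described above, they are not part of Solr or of any client library.

```python
class ToyIndex:
    """Toy model: searches see only the last committed snapshot."""

    def __init__(self, docs):
        self._visible = {doc["id"]: doc for doc in docs}
        self._pending = dict(self._visible)

    def delete_all(self):
        # Stands in for a deleteByQuery of *:*; nothing is visible yet.
        self._pending = {}

    def add(self, doc):
        # Stands in for indexing one document; also not visible yet.
        self._pending[doc["id"]] = doc

    def commit(self):
        # Stands in for a commit that opens a new searcher.
        self._visible = dict(self._pending)

    def search_ids(self):
        # Queries are answered from the committed snapshot only.
        return sorted(self._visible)


idx = ToyIndex([{"id": "1"}, {"id": "2"}, {"id": "3"}])
idx.delete_all()
before_commit = idx.search_ids()   # still the old, complete list
idx.add({"id": "1"})
idx.add({"id": "3"})
idx.commit()
after_commit = idx.search_ids()    # now the new list
```

In the model, as in the description above, a query issued between the delete and the commit still returns the full old list, so an empty or partial result is never shown as long as no other commit sneaks in.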
