lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Need help to configure automated deletion of shard in solr
Date Wed, 02 Dec 2020 12:56:12 GMT
You can certainly use the TTL logic. Not the TimeRoutedAlias, but
the DocExpirationUpdateProcessorFactory. DocExpirationUpdateProcessorFactory
operates on each document individually so you can mix-n-match
if you want.
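
To illustrate the per-document behavior, here is a minimal sketch that tags
each document with its own TTL before indexing. It assumes an update chain
with DocExpirationUpdateProcessorFactory configured to read its default TTL
field, `_ttl_`, and with an expiration field and auto-delete period set. The
customer names, retention values, and the `with_ttl` helper are hypothetical.

```python
import json

# Hypothetical retention policies keyed by customer id (not from this thread).
# Values are Solr date math expressions, relative to indexing time.
RETENTION = {"acme": "+30DAYS", "globex": "+1YEAR"}

def with_ttl(doc, customer, ttl_field="_ttl_"):
    """Return a copy of doc carrying the customer's TTL in ttl_field."""
    tagged = dict(doc)
    tagged[ttl_field] = RETENTION[customer]
    return tagged

docs = [
    with_ttl({"id": "a-1"}, "acme"),
    with_ttl({"id": "g-1"}, "globex"),
]

# JSON body that could be posted to the collection's /update handler.
payload = json.dumps(docs)
```

Because the TTL travels with each document, documents with different
retention periods can live in the same collection.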

As for knowing when a shard is empty, I suggested a method for that
in one of the earlier e-mails.
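
As a rough sketch of that method: query one replica of each shard with
distrib=false and treat numFound == 0 as "empty". The mapping from shard
name to core URL is assumed to have been collected beforehand (for example
from COLSTATUS output); the function names and the injectable `fetch`
parameter are hypothetical and only there to keep the sketch testable.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def http_get_json(url):
    """Fetch a URL and decode the JSON response body."""
    with urlopen(url) as resp:
        return json.load(resp)

def live_doc_count(core_url, fetch=http_get_json):
    """Count live (non-deleted) docs on a single core only (distrib=false)."""
    params = urlencode({"q": "*:*", "rows": 0, "distrib": "false", "wt": "json"})
    return fetch(f"{core_url}/select?{params}")["response"]["numFound"]

def empty_shards(core_by_shard, fetch=http_get_json):
    """Return names of shards whose chosen replica reports zero live docs."""
    return [shard for shard, core_url in sorted(core_by_shard.items())
            if live_doc_count(core_url, fetch) == 0]
```

The shards it returns are candidates for the Collections API DELETESHARD
call, which only makes sense with the implicit router.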

If you have a collection per customer, and assuming that a customer
has the same retention policy for all docs, then TimeRoutedAlias would
work.
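
For the collection-per-customer case, a sketch of building the CREATEALIAS
request for a time routed alias whose old collections are dropped
automatically via router.autoDeleteAge. The host, alias name, field name,
configset, and the 90-day retention are all made-up values, and
`create_tra_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

def create_tra_url(solr_base, alias, time_field, start, interval,
                   auto_delete_age, config_name):
    """Build a Collections API URL that creates a time routed alias."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "router.name": "time",
        "router.field": time_field,
        "router.start": start,
        "router.interval": interval,
        # Collections entirely older than this date math get deleted.
        "router.autoDeleteAge": auto_delete_age,
        "create-collection.collection.configName": config_name,
        "create-collection.numShards": 1,
    }
    return f"{solr_base}/admin/collections?{urlencode(params)}"

# Example: one collection per day, keeping roughly 90 days (assumed values).
url = create_tra_url("http://localhost:8983/solr", "customer_a",
                     "timestamp_dt", "NOW/DAY", "+1DAY",
                     "/DAY-90DAYS", "_default")
```

With one alias per customer, each alias can carry that customer's own
retention period.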

Best,
Erick

> On Dec 2, 2020, at 12:19 AM, Pushkar Mishra <pushkarmnbh@gmail.com> wrote:
> 
> Hi Erick,
> It is implicit.
> I have explored the TTL option, but due to some complications we can't use it.
> Let me explain the actual use case.
> 
> We have limited space, so we can't keep storing documents indefinitely.
> So based on the customer's retention policy, I need to delete the
> documents. And if any shard becomes empty in this process, I need to delete
> the shard as well.
> 
> So is there a way to know when Solr has finished purging the
> deleted documents? Based on that flag we could configure shard deletion.
> Thanks
> Pushkar
> 
> On Tue, Dec 1, 2020 at 9:02 PM Erick Erickson <erickerickson@gmail.com>
> wrote:
> 
>> This is still confusing. You haven’t told us what router you are using,
>> compositeId or implicit?
>> 
>> If you’re using compositeId (the default), you will never have empty shards
>> because docs get assigned to shards via a hashing algorithm that
>> distributes
>> them very evenly across all available shards. You cannot delete any
>> shard when using compositeId as your routing method.
>> 
>> If you don’t know which router you’re using, then you’re using compositeId.
>> 
>> NOTE: for the rest, “documents” means non-deleted documents. Solr will
>> take care of purging the deleted documents automatically.
>> 
>> I think you’re making this much more difficult than you need to. Assuming
>> that the total number of documents remains relatively constant, you can
>> use the default compositeId routing, let Solr take care of it all, and
>> not bother with trying to manage shards individually.
>> 
>> If the number of docs increases you might need to use splitshard. But it
>> sounds like the total number of “live” documents isn’t going to increase.
>> 
>> For TTL, if you have a _fixed_ TTL, i.e. the docs should always expire
>> after, say, 30 days (which it doesn’t sound like you do), you can use
>> the “Time Routed Alias” option, see:
>> https://lucene.apache.org/solr/guide/7_5/time-routed-aliases.html
>> 
>> Assuming your TTL isn’t a fixed-interval, you can configure
>> DocExpirationUpdateProcessorFactory to deal with TTL automatically.
>> 
>> And if you still think you need to handle this, you need to explain exactly
>> what problem you’re trying to solve because so far it appears that
>> you’re simply taking on way more work than you need to.
>> 
>> Best,
>> Erick
>> 
>>> On Dec 1, 2020, at 9:46 AM, Pushkar Mishra <pushkarmnbh@gmail.com>
>> wrote:
>>> 
>>> Hi Team,
>>> As I explained the use case, can someone help me find a
>>> configuration-based way to delete the shard here?
>>> A quick response will be greatly appreciated.
>>> 
>>> Regards
>>> Pushkar
>>> 
>>> 
>>> On Mon, Nov 30, 2020 at 11:32 PM Pushkar Mishra <pushkarmnbh@gmail.com>
>>> wrote:
>>> 
>>>> 
>>>> 
>>>> On Mon, Nov 30, 2020, 9:15 PM Pushkar Mishra <pushkarmnbh@gmail.com>
>>>> wrote:
>>>> 
>>>>> Hi Erick,
>>>>> First of all, thanks for your response. I will check the possibility.
>>>>> Let me explain my problem in detail:
>>>>> 
>>>>> 1. We have other use cases where we use a listener on
>>>>> postCommit to delete/shift/split the shards. So we have the capability to
>>>>> delete shards.
>>>>> 2. In the current use case we have to delete documents from
>>>>> the shard, and if during the deletion process (a scheduled process,
>>>>> maybe hourly or daily, which deletes the documents) a shard becomes
>>>>> empty (or, let's say, only a nominal number of documents are left), then
>>>>> delete the shard. I am exploring how to do this using configuration.
>>>>> 
>>>> 3. Also, it will certainly not be a live shard, as only those documents
>>>> whose TTL has expired are deleted. The TTL could be a month or a year.
>>>> 
>>>> Please assist if you have any config-based idea on this.
>>>> 
>>>>> Regards
>>>>> Pushkar
>>>>> 
>>>>> On Mon, Nov 30, 2020, 8:48 PM Erick Erickson <erickerickson@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> Are you using the implicit router? Otherwise you cannot delete a shard.
>>>>>> And you won’t have any shards that have zero documents anyway.
>>>>>> 
>>>>>> It’d be a little convoluted, but you could use the collections COLSTATUS
>>>>>> API to find the names of all your replicas. Then query _one_ replica of
>>>>>> each shard with something like
>>>>>> solr/collection1_shard1_replica_n1/select?q=*:*&distrib=false
>>>>>> 
>>>>>> that’ll return the number of live docs (i.e. non-deleted docs) and if
>>>>>> it’s zero you can delete the shard.
>>>>>> 
>>>>>> But the implicit router requires you take complete control of where
>>>>>> documents go, i.e. which shard they land on.
>>>>>> 
>>>>>> This really sounds like an XY problem. What’s the use case you’re trying
>>>>>> to support where you expect a shard’s number of live docs to drop to
>>>>>> zero?
>>>>>> 
>>>>>> Best,
>>>>>> Erick
>>>>>> 
>>>>>>> On Nov 30, 2020, at 4:57 AM, Pushkar Mishra <pushkarmnbh@gmail.com>
>>>>>> wrote:
>>>>>>> 
>>>>>>> Hi Solr team,
>>>>>>> 
>>>>>>> I am using SolrCloud (version 8.5.x). I need to find a
>>>>>>> configuration with which I can delete a shard when the number of
>>>>>>> documents in the shard reaches zero. Can someone help me achieve that?
>>>>>>> 
>>>>>>> 
>>>>>>> It is urgent, so a quick response will be highly appreciated.
>>>>>>> 
>>>>>>> Thanks
>>>>>>> Pushkar
>>>>>>> 
>>>>>>> --
>>>>>>> Pushkar Kumar Mishra
>>>>>>> "Reactions are always instinctive whereas responses are always well
>>>>>>> thought of... So start responding rather than reacting in life"
>>>>>> 
>>>>>> 

