lucene-solr-user mailing list archives

From Hongxu Ma <inte...@outlook.com>
Subject Re: A question of solr recovery
Date Thu, 12 Dec 2019 10:37:39 GMT
Thank you very much @Erick Erickson<mailto:erickerickson@gmail.com>
It's very clear.

And I found my "full sync" log:
"IndexFetcher Total time taken for download (fullCopy=true,bytesDownloaded=178161685180) :
4377 secs (40704063 bytes/sec) to NIOFSDirectory@..."
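As a sanity check, the numbers in that log line are internally consistent: 178161685180 bytes over 4377 seconds is the reported rate of 40704063 bytes/sec (about 40.7 MB/sec). This is just arithmetic on the logged values, not Solr code:

```python
# Verify the throughput reported by the IndexFetcher log line above.
bytes_downloaded = 178_161_685_180  # fullCopy bytesDownloaded from the log
elapsed_secs = 4_377                # total download time from the log

rate = bytes_downloaded // elapsed_secs  # whole bytes/sec, as Solr reports it
print(rate)  # 40704063 -- matches "(40704063 bytes/sec)" in the log
```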

One more question:
From the log, it looks like the replica downloaded all segment files (178 GB), which is very large and took a long
time.
Is it possible to download only the segment files that contain the missing data, instead of all of
them? Maybe that could save time?

For example, here is my imagined algorithm (similar to what a database does):

  *   recover from the local tlog as much as possible
  *   calculate the latest version
  *   only download the segment files which contain data newer than this version
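The steps above could be sketched as follows. To be clear, this is NOT how Solr's IndexFetcher works today (a full sync copies every segment file); the per-segment version bookkeeping here is invented purely to illustrate the idea:

```python
# Hypothetical delta-recovery sketch -- not actual Solr behavior.
# Assumes (hypothetically) each leader segment tracks the max update
# version it contains, so a replica could skip fully-covered segments.

def segments_to_fetch(leader_segments, local_latest_version):
    """Return only the leader segments holding data newer than we have."""
    return [s for s in leader_segments if s["max_version"] > local_latest_version]

# Pretend the replica replayed its local tlog and reached version 1050,
# while the leader has three segments:
leader_segments = [
    {"name": "_0.cfs", "max_version": 900},   # fully covered locally
    {"name": "_1.cfs", "max_version": 1040},  # also covered
    {"name": "_2.cfs", "max_version": 1200},  # contains newer data
]
needed = segments_to_fetch(leader_segments, local_latest_version=1050)
print([s["name"] for s in needed])  # ['_2.cfs'] -- only one file to copy
```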

Thanks.

________________________________
From: Erick Erickson <erickerickson@gmail.com>
Sent: Wednesday, December 11, 2019 20:56
To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
Subject: Re: A question of solr recovery

Updates in this context are individual documents, either new ones
or a new version of an existing document. Long recoveries are
quite unlikely to be replaying a few documents from the tlog.

My bet is that you had to do a “full sync” (there should be messages
to that effect in the Solr log). This means that the replica had to
copy the entire index from the leader, and that varies with the size
of the index, network speed and contention, etc.

And to make it more complicated, and despite the comment about 100
docs and the tlog…. while that copy is going on, _new_ updates are
written to the tlog of the recovering replica and after the index
has been copied, those new updates are replayed locally. The 100
doc limit does _not_ apply in this case. So say the recovery starts
at time T and lasts for 60 seconds. All updates sent to the shard
leader over that 60 seconds are put in the local tlog and after the
copy is done, they’re replayed. And then, you guessed it, any
updates received by the leader over that 60 second period are written
to the recovering replica’s tlog and replayed… Under heavy
indexing loads, this can go on for quite a long time. Not certain
that’s what’s happening, but something to be aware of.
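The catch-up loop described above converges only if the replica can replay updates faster than new ones arrive. A toy model makes this concrete (the rates and the convergence condition here are made up for illustration, not Solr internals):

```python
# Toy model of the recovery catch-up loop: while one replay window runs,
# new updates accumulate in the tlog, forming the next (smaller) window.

def recovery_rounds(initial_secs, incoming_rate, replay_rate):
    """Count replay rounds until the backlog window shrinks below 1 second.

    incoming_rate: updates/sec arriving at the leader during recovery
    replay_rate:   updates/sec the replica can replay from its tlog
    """
    rounds, window = 0, initial_secs
    while window >= 1:                    # still a backlog worth replaying
        backlog = window * incoming_rate  # updates buffered during the window
        window = backlog / replay_rate    # replaying them takes this long...
        rounds += 1                       # ...during which more arrive
        if rounds > 1000:                 # heavy indexing: never converges
            return None
    return rounds

# Full copy took 60s; replay is 4x faster than the incoming update rate,
# so each round shrinks the window and recovery eventually finishes.
print(recovery_rounds(60, incoming_rate=1000, replay_rate=4000))  # prints 3
```

With replay_rate equal to (or below) incoming_rate the window never shrinks, which matches Erick's warning that under heavy indexing this can go on for a long time.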

Best,
Erick

> On Dec 10, 2019, at 10:39 PM, Hongxu Ma <interma@outlook.com> wrote:
>
> Hi all
> In my cluster, Solr node turned into long time recovery sometimes.
> So I want to know more about recovery and have read a good blog:
> https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> It mentioned in the recovery section:
> "Replays the documents from its own tlog if < 100 new updates have been received by
the leader. "
>
> My question: what is the meaning of "updates" here? Commits? Or documents?
> I referred to the Solr code but am still not sure about it.
>
> Hope you can help, thanks.
>

