lucene-solr-user mailing list archives

From S G <sg.online.em...@gmail.com>
Subject Re: New replica types
Date Wed, 03 Jan 2018 06:09:47 GMT
AFAIK, the tlog file is truncated on a hard commit.
So if the TLOG replica only pulls the tlog file, it would become out
of date unless it pulls the full index too.
That would mean the TLOG replica does a full copy every time there is a
commit on the leader.

A PULL replica, by definition, copies index files only, so it would do full
recoveries often too.


How intelligent are the two replica types in determining that they need to
do a full recovery vs partial recovery?
Does a full recovery happen on every hard commit on the leader?
Or does it happen with segment merges on the leader? (because the index
files will look very different after a segment merge)
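
To make the question concrete, here is a minimal sketch of one way a
replica could tell a partial copy from a full one: compare the leader's
index file list with its own and fetch only what differs. This is a toy
model under my own assumptions, not Solr's replication code; the function
name, the file names and the checksum strings are all hypothetical.

```python
def plan_fetch(leader_files, local_files):
    """Return (files_to_fetch, is_full_copy) for a replica.

    Both arguments map index file name -> checksum.
    Toy model only; names and structure are hypothetical, not Solr's API.
    """
    # A file must be fetched if it is missing locally or its checksum differs.
    to_fetch = sorted(
        name for name, checksum in leader_files.items()
        if local_files.get(name) != checksum
    )
    # If no local file can be reused, the "partial" copy is in effect a full copy.
    is_full_copy = bool(leader_files) and len(to_fetch) == len(leader_files)
    return to_fetch, is_full_copy


leader = {"_a.cfs": "c1", "_b.cfs": "c2", "_c.cfs": "c3"}
replica_slightly_behind = {"_a.cfs": "c1", "_b.cfs": "c2"}
replica_after_leader_change = {"_x.cfs": "c9", "_y.cfs": "c8"}

print(plan_fetch(leader, replica_slightly_behind))
print(plan_fetch(leader, replica_after_leader_change))
```

In this model a replica that is one segment behind fetches one file, while
a replica whose files share nothing with the leader's ends up fetching
everything.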



> NRT replicas will typically have very different files in their on-disk
> indexes even though they contain the same documents.


This is something which has caused full recoveries many times in my
clusters and I wish there was a solution to this one.
Do you think it would make sense for all replicas of a shard to agree upon
the segment where a document should go to?
Coupled with an agreed cadence on segment merges, this would mean Solr
never needs a full recovery. (It's a very high-level view of course and
will need a lot of refinement if implemented.)

Getting a cadence on segment merges could possibly be implemented by a
time-based merging strategy where documents arriving within a particular
time-range only will form a particular segment.
So documents arriving between 1pm-2pm go to segment 1, those between
2pm-3pm go to segment 2 and so on.
That way, replicas will only copy the last N segments (with N being 1
generally) instead of doing a full recovery.
Even if merging happens on the leader, the last N segments should be
retained so that the replicas can avoid full recoveries.
(I know something like this happens today, but I am not sure about the
internal details, and it is not clearly documented anywhere.)
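
A minimal sketch of the time-based idea above, assuming one segment per
hour as in the 1pm-2pm example. Everything here (the bucket size, the
function names, using integer segment ids) is hypothetical and only
illustrates the proposal; it is not how Lucene or Solr assign documents
to segments today.

```python
from datetime import datetime, timezone

BUCKET_SECONDS = 3600  # assumption: one segment per hour


def segment_id(arrival: datetime) -> int:
    """Map an arrival time to a segment id that every replica would agree on."""
    return int(arrival.timestamp()) // BUCKET_SECONDS


def segments_to_copy(leader_segments, replica_segments, last_n=1):
    """Return the segment ids a replica must copy, or None for full recovery.

    Partial catch-up is only possible if every missing segment is among
    the last N segments the leader promises to retain.
    """
    missing = sorted(set(leader_segments) - set(replica_segments))
    retained = sorted(leader_segments)[-last_n:]
    if all(seg in retained for seg in missing):
        return missing
    return None  # too far behind: fall back to full recovery


one_pm = datetime(2018, 1, 3, 13, 5, tzinfo=timezone.utc)
almost_two = datetime(2018, 1, 3, 13, 59, 59, tzinfo=timezone.utc)
two_pm = datetime(2018, 1, 3, 14, 0, tzinfo=timezone.utc)

print(segment_id(one_pm) == segment_id(almost_two))  # same hour, same segment
print(segment_id(two_pm) - segment_id(one_pm))       # next hour, next segment
print(segments_to_copy({1, 2, 3}, {1, 2}))           # one segment behind
print(segments_to_copy({1, 2, 3}, {1}))              # two behind with N = 1
```

With N = 1, a replica that is one segment behind copies just that segment,
while one that is two segments behind still falls back to a full recovery,
which is why the leader would need to retain the last N segments.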

Currently, I see my replicas go into full recovery even when I dynamically
add a field to a collection, or when a replica misses updates for just a
few seconds.
(I do have high values for catchup rather than the default 100)
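
The expectation behind raising that value can be sketched as follows. This
is a simplified mental model, not Solr's recovery logic: the function and
parameter names are hypothetical, and it assumes the "catchup" value is
the number of recent updates kept available for a lagging replica
(presumably numRecordsToKeep in the updateLog section of solrconfig.xml).

```python
def recovery_mode(missed_updates: int, records_to_keep: int = 100) -> str:
    """Simplified model: a replica can catch up from recent updates only
    if it missed no more updates than are kept for catch-up."""
    if missed_updates <= records_to_keep:
        return "catch-up from recent updates"
    return "full recovery"


print(recovery_mode(80))                            # within the default of 100
print(recovery_mode(5000))                          # beyond the default
print(recovery_mode(5000, records_to_keep=10000))   # raised limit
```

Under this model, a replica that misses a few seconds of updates should
stay within a raised limit, so a full recovery in that situation points to
some other trigger.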


Thanks
SG






On Tue, Jan 2, 2018 at 8:58 PM, Shawn Heisey <apache@elyograg.org> wrote:

> On 1/2/2018 8:02 PM, S G wrote:
>
>> If the above is incorrect, can someone please point that out?
>>
>
> Assuming I have a correct understanding of how the different replica types
> work, I have some small clarifications.  If my understanding is incorrect,
> I hope somebody will point out my errors.
>
> TLOG is leader eligible because it keeps transaction logs from ongoing
> indexing, although it does not perform that indexing on its own index
> unless it becomes the leader.  Transaction logs are necessary for operation
> as leader.
>
> PULL does not keep transaction logs, which is why it is not leader
> eligible.  It only copies the index data.
>
> Either TLOG or PULL would do a full index copy if the local index is
> suddenly very different from the leader.  This could happen in situations
> where you have NRT replicas and the leader changes -- NRT replicas will
> typically have very different files in their on-disk indexes even though
> they contain the same documents.  When the leader changes to a different
> NRT replica, TLOG/PULL replicas will suddenly find that they have a very
> different list of index files, so they will fetch the entire index.
>
> Thanks,
> Shawn
>
