jackrabbit-oak-issues mailing list archives

From "Vikas Saurabh (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (OAK-2106) Optimize reads from secondaries
Date Thu, 09 Jul 2015 14:24:04 GMT

    [ https://issues.apache.org/jira/browse/OAK-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614396#comment-14614396
] 

Vikas Saurabh edited comment on OAK-2106 at 7/9/15 2:23 PM:
------------------------------------------------------------

I'm working on a patch for this issue. There was a bit of discussion around this here \[0]
which implied that we'd poll to read replica set status.
Let me try to explain what I am planning to do for calculating safeReplicatedTime (defined
as: each member in the replica set is known to have data up to this time):
# Poll PRIMARY for replica set status
# Iterate optime for each member with SECONDARY status and get minimum timestamp
# If any of the members has an error status (6, >=8) \[1], then break out of the loop and
mark that it’s unsafe to reach out to a secondary
#* This probably requires a bit of explanation – since we are polling, we might calculate
a safeTime (T1) at which time a replica (S1) was DOWN. Now, by the time we poll next, S1 can
come back up and start syncing itself from PRIMARY. BUT, its applied optime might remain
less than T1. So, it would break the premise that all replicas have updates till T1.
This part keeps happening asynchronously.
#* Assumption: If the current arbiter (state=7) joins the replica set as a secondary, then by the time
it reaches SECONDARY status, it'd have data at least up to this point.
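
The polling steps above could be sketched roughly as follows. This is only an illustration, not the actual patch: the {{Member}} class, {{isErrorState}} and {{compute}} are hypothetical names, the member list is assumed to have been extracted already from the replica set status reported by PRIMARY, and returning "unsafe" when no SECONDARY is present is my own assumption.

```java
import java.util.List;
import java.util.OptionalLong;

// Hypothetical sketch: derive safeReplicatedTime from an already-parsed replica set status.
final class SafeReplicatedTime {

    /** Hypothetical view of one member entry of the replica set status. */
    static final class Member {
        final int state;         // numeric replica state, e.g. 1=PRIMARY, 2=SECONDARY, 7=ARBITER
        final long optimeMillis; // last applied optime, in epoch milliseconds

        Member(int state, long optimeMillis) {
            this.state = state;
            this.optimeMillis = optimeMillis;
        }
    }

    static final int SECONDARY = 2;

    /** Error states as listed in the comment: 6 and anything >= 8. */
    static boolean isErrorState(int state) {
        return state == 6 || state >= 8;
    }

    /**
     * Returns the minimum optime over all SECONDARY members, or empty if
     * reading from a secondary should be considered unsafe.
     */
    static OptionalLong compute(List<Member> members) {
        long min = Long.MAX_VALUE;
        boolean sawSecondary = false;
        for (Member m : members) {
            if (isErrorState(m.state)) {
                // A member in an error state may come back with an optime
                // older than a previously computed safe time, so give up.
                return OptionalLong.empty();
            }
            if (m.state == SECONDARY) {
                sawSecondary = true;
                min = Math.min(min, m.optimeMillis);
            }
        }
        // Assumption: with no SECONDARY there is nothing to read from.
        return sawSecondary ? OptionalLong.of(min) : OptionalLong.empty();
    }
}
```

An empty result is meant to be read as "go to PRIMARY until the next poll says otherwise".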
 
Now, the logic to read remains fairly similar to what happens today
# Get parent doc from cache (this doc is guaranteed to be up-to-date till the last background read)
There are 2 cases when parent is available in cache
#* Parent was last updated by some other node after background read – since our visibility
of other nodes is pinned by the root._lastRev read during background read, this cache state
is also valid for any revision that we’d need to use
#* Parent was last updated by me (same node) after background read – -this is fairly safe
as the doc in cache would also have got updated-. After discussing with @mreutegg, it seems
this is the trickier of these 2 cases. But, since the changes are from the local node, we should
be able to come up with some strategy to work around this. Would update this issue with further
thoughts.
# If cached parent doc has not been modified after safeReplicatedTime, then it should be safe
to reach out to nearest replica to pull the child document.
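
As a rough sketch of the check in the last step (not the actual implementation): {{canReadFromSecondary}} and its parameters are hypothetical names, the parent's modification time is assumed to be available in the same unit as safeReplicatedTime, and the open question about changes made by the local node is deliberately not handled here.

```java
import java.util.OptionalLong;

// Hypothetical sketch of the "is it safe to read the child from a secondary" check.
final class SecondaryReads {

    /**
     * @param cachedParentModifiedMillis modification time of the cached parent doc,
     *                                   or null if the parent is not in the cache
     * @param safeReplicatedTime         time up to which every secondary is known to
     *                                   have data; empty if secondaries are unsafe
     */
    static boolean canReadFromSecondary(Long cachedParentModifiedMillis,
                                        OptionalLong safeReplicatedTime) {
        if (cachedParentModifiedMillis == null || !safeReplicatedTime.isPresent()) {
            // Without a cached parent or a safe time, fall back to PRIMARY.
            return false;
        }
        // Safe only if the parent has not been modified after safeReplicatedTime.
        return cachedParentModifiedMillis <= safeReplicatedTime.getAsLong();
    }
}
```

Anything that returns false here would simply be read from PRIMARY as it is today.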
 
Assumptions:
* I’m assuming though that PRIMARY of a replica set is calculated by majority voting. So,
PRIMARY-ness of a replica is a universal state despite any topology partitioning.
 
I’m sure there’d be some gaps/holes here. So, let’s try to break it :).

(cc [~mreutegg], [~chetanm], [~nleite])

\[0]: http://markmail.org/thread/5zjccjrg4fwz32qb
\[1]: http://docs.mongodb.org/manual/reference/replica-states/


was (Author: catholicon):
I'm working on a patch for this issue. There was a bit of discussion around this here \[0]
which implied that we'd poll to read replica set status.
Let me try to explain what I am planning to do for calculating safeReplicatedTime (defined
as: each member in the replica set is known to have data up to this time):
# Poll PRIMARY for replica set status
# Iterate optime for each member with SECONDARY status and get minimum timestamp
# If any of the members has an error status (6, >=8) \[1], then break out of the loop and
mark that it’s unsafe to reach out to a secondary
#* This probably requires a bit of explanation – since we are polling, we might calculate
a safeTime (T1) at which time a replica (S1) was DOWN. Now, by the time we poll next, S1 can
come back up and start syncing itself from PRIMARY. BUT, its applied optime might remain
less than T1. So, it would break the premise that all replicas have updates till T1.
This part keeps happening asynchronously.
#* Assumption: If the current arbiter (state=7) joins the replica set as a secondary, then by the time
it reaches SECONDARY status, it'd have data at least up to this point.
 
Now, the logic to read remains fairly similar to what happens today
# Get parent doc from cache (this doc is guaranteed to be up-to-date till the last background read)
There are 2 cases when parent is available in cache
#* Parent was last updated by me (same node) after background read – this is fairly safe
as the doc in cache would also have got updated
#* Parent was last updated by some other node after background read – since our visibility
of other nodes is pinned by the root._lastRev read during background read, this cache state
is also valid for any revision that we’d need to use
# If cached parent doc has not been modified after safeReplicatedTime, then it should be safe
to reach out to nearest replica to pull the child document.
 
Assumptions:
* I’m assuming though that PRIMARY of a replica set is calculated by majority voting. So,
PRIMARY-ness of a replica is a universal state despite any topology partitioning.
* Also, I’m assuming that _modified of parent (complete hierarchy) is updated in sync with
any change in document – but there’s a comment in current code:
{code}
// FIXME: this is not quite accurate, because ancestors
// are updated in a background thread (_lastRev). We
// will need to revise this for low maxReplicationLagMillis
// values
{code}
I’m not completely sure of that flow – but, in that case, we can then calculate
{{reallySafeReplicatedTime = min(lastBackgroundReadTime, safeReplicatedTime)}} before step (2) of the document reading logic.
BUT, I’m not sure of how to make lastBackgroundReadTime available to MongoDocStore.
 
I’m sure there’d be some gaps/holes here. So, let’s try to break it :).

(cc [~mreutegg], [~chetanm], [~nleite])

\[0]: http://markmail.org/thread/5zjccjrg4fwz32qb
\[1]: http://docs.mongodb.org/manual/reference/replica-states/

> Optimize reads from secondaries
> -------------------------------
>
>                 Key: OAK-2106
>                 URL: https://issues.apache.org/jira/browse/OAK-2106
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: core, mongomk
>            Reporter: Marcel Reutegger
>            Assignee: Marcel Reutegger
>              Labels: performance, scalability
>             Fix For: 1.3.5
>
>
> OAK-1645 introduced support for reads from secondaries under certain
> conditions. The current implementation checks the _lastRev on a potentially
> cached parent document and reads from a secondary if it has not been
> modified in the last 6 hours. This timespan is somewhat arbitrary but
> reflects the assumption that the replication lag of a secondary shouldn't
> be more than 6 hours.
> This logic should be optimized to take the actual replication lag into
> account. MongoDB provides information about the replication lag with
> the command rs.status().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
