cassandra-commits mailing list archives

From "Sam Tunnicliffe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-14865) Cascading calls to read retries, system_auth, and read repairs
Date Fri, 02 Nov 2018 18:29:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-14865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673519#comment-16673519
] 

Sam Tunnicliffe commented on CASSANDRA-14865:
---------------------------------------------

{quote}We are discarding blocking read repairs possibility because we consistently get the
same results when doing the query several times
{quote}
The read-repair messages here are slightly misleading. They are reported when a response
is passed to the read callback after enough responses to satisfy the request's consistency
level have already been received. In this case the reads are being done at LOCAL_ONE, so the
second response goes down this path whenever it arrives. This triggers a comparison between
all the received responses, but they don't appear to mismatch, as a mismatch would also be
recorded in the trace. So no actual repair is happening, at least in these traces.
{quote}I have seen AlwaysSpeculatingReadExecutor mentioned
{quote}
AlwaysSpeculatingReadExecutor is not involved here; if it were, the trace would not include the
"speculating read retry..." entries. Rather, the speculative retry policies for the roles
& role_permissions tables are set to the 99th percentile (as noted on CASSANDRA-11340,
this is set by default and cannot be altered). This is causing the SpeculatingReadExecutor
to kick in and send an additional request.
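
As a rough illustration of the percentile-based retry described above, here is a minimal sketch. It is not the Cassandra implementation: the function names, the nearest-rank percentile, and the sample latencies are all assumptions made for the example. It only shows why a slow first replica leads to a second request, and hence to a second response reaching the read callback.

```python
import math
from typing import List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of latency samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


def replicas_contacted(recent_latencies_ms: List[float],
                       first_replica_latency_ms: float,
                       pct: float = 99.0) -> int:
    """Hypothetical model: one request is sent, and a second (speculative)
    request goes out only if the first replica has not answered within the
    pct-th percentile of recently observed latencies."""
    threshold_ms = percentile(recent_latencies_ms, pct)
    return 2 if first_replica_latency_ms > threshold_ms else 1
```

With recent latencies of 1..100 ms the threshold in this sketch is 99 ms, so a first replica answering in 50 ms needs no retry, while one taking 150 ms causes a second request to be sent.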
{quote}We would expect calls to roles, within sequential read sessions, because the cache
could be turned, but not so many calls within the same tracing session.
{quote}
The multiple reads in a single request are probably explained by two factors:
 * The roles cache in 3.x is pretty naive in that it only caches role membership info. The
first step of the authorization process (with CassandraAuthorizer) is to check the superuser
status, which unfortunately is not cached but has to be read from the roles table. So any
authorization request that can't be satisfied from the permissions cache is going to trigger
a read from system_auth.roles. This is fixed in trunk by CASSANDRA-14497.
 * Permissions and the resources they relate to are defined hierarchically, e.g. keyspace
-> table. When performing authorization, the chain of resources is traversed from the bottom
up until either the required permission is found, or the top level is reached. So if a role
has permissions granted at the keyspace level (e.g. GRANT SELECT ON KEYSPACE ks TO bob), then
a read against ks.table1 will first check permissions granted directly on the table, then
on the keyspace. The caching is aligned with the requested permissions, rather than what is
directly granted. So in this example, the permissions would be cached for that role/table
combination, not the role/keyspace, so this chain traversal only happens when a cache entry
is first loaded.
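
The second point can be sketched roughly as follows. This is a simplified model of the behaviour described above, not the actual authorizer code: the GRANTS dict, the PermissionsCache class and the "data/ks/table1" style resource names are assumptions for illustration, and the sketch collects the effective permissions for the whole chain rather than stopping at the first match.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical stand-in for the grants stored in system_auth:
# (role, resource) -> permissions granted directly on that resource.
GRANTS: Dict[Tuple[str, str], FrozenSet[str]] = {
    ("bob", "data/ks"): frozenset({"SELECT"}),
}


def resource_chain(resource: str) -> List[str]:
    """Return the resource followed by its ancestors, bottom-up."""
    parts = resource.split("/")
    return ["/".join(parts[:i]) for i in range(len(parts), 0, -1)]


class PermissionsCache:
    """Caches effective permissions per (role, requested resource)."""

    def __init__(self) -> None:
        self.entries: Dict[Tuple[str, str], FrozenSet[str]] = {}
        self.lookups: List[Tuple[str, str]] = []  # every simulated table read

    def _load(self, role: str, resource: str) -> FrozenSet[str]:
        # Walk the chain bottom-up; each level costs one simulated read.
        effective = set()
        for level in resource_chain(resource):
            self.lookups.append((role, level))
            effective |= GRANTS.get((role, level), frozenset())
        return frozenset(effective)

    def has_permission(self, role: str, resource: str, permission: str) -> bool:
        key = (role, resource)  # keyed by what was requested, not what was granted
        if key not in self.entries:
            self.entries[key] = self._load(role, resource)
        return permission in self.entries[key]
```

In this model the first check of SELECT for bob on data/ks/table1 performs one lookup per level of the chain, and the result is stored under the role/table key. Repeating the check adds no further lookups, and no entry is ever created for the role/keyspace pair.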

I would advise you to first bump the validity period (the default of 2000ms is pretty low).
Also, specifying a separate update_interval will improve the performance of the cache compared
to the out of the box setup. When an entry is older than the validity period, it is expired
from the cache, which forces a synchronous reload from disk the next time it is queried. The
update_interval sets a threshold for when a cache entry is eligible for an async refresh.
While this refresh is happening, the previous value will be returned, so the refresh is
transparent to callers. Setting an update_interval < validity allows unread entries to
eventually get evicted, but doesn't penalise reads on infrequently accessed entries.
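
To make the interaction between the two settings concrete, here is a small sketch with an explicit clock. It is an assumption-laden model rather than the real cache: the class name, the loader callback and the event log are invented for the example, and the "async" refresh is simulated inline instead of on a background thread.

```python
from typing import Callable, Dict, List, Tuple


class AuthCacheSketch:
    """Toy cache with a validity period and a shorter update interval."""

    def __init__(self, loader: Callable[[str], int],
                 validity_ms: int, update_interval_ms: int) -> None:
        self.loader = loader
        self.validity_ms = validity_ms
        self.update_interval_ms = update_interval_ms
        self.entries: Dict[str, Tuple[int, int]] = {}  # key -> (value, loaded_at_ms)
        self.events: List[str] = []

    def get(self, key: str, now_ms: int) -> int:
        entry = self.entries.get(key)
        if entry is None or now_ms - entry[1] >= self.validity_ms:
            # Missing or expired: the caller waits for a synchronous reload.
            value = self.loader(key)
            self.entries[key] = (value, now_ms)
            self.events.append("sync_load")
            return value
        value, loaded_at = entry
        if now_ms - loaded_at >= self.update_interval_ms:
            # Eligible for refresh: the caller still gets the previous value,
            # and the new value is only visible to later reads.
            self.entries[key] = (self.loader(key), now_ms)
            self.events.append("async_refresh")
        else:
            self.events.append("hit")
        return value
```

With, say, a validity of 10000 ms and an update interval of 2000 ms, a read at 3000 ms after the initial load returns the old value immediately and refreshes the entry behind the scenes, whereas a read after more than 10000 ms of inactivity has to wait for a synchronous load. In the 3.x cassandra.yaml the corresponding settings are, as far as I'm aware, roles_validity_in_ms / roles_update_interval_in_ms and permissions_validity_in_ms / permissions_update_interval_in_ms.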

> Cascading calls to read retries, system_auth, and read repairs
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-14865
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14865
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Pedro Gordo
>            Priority: Major
>             Fix For: 3.11.1
>
>         Attachments: cn_dc.txt, nec_dc.txt, wec_dc.txt
>
>
> Roles validity and permission cache values are the default ones. Same thing for the read-repair
> chance.
> We have a cluster with 3 data centers. We have noticed that in 2 of the data centers
> (NEC and CN) we have multiple calls to speculative read retries (rapid read protection), roles
> (instead of using cached values within the same tracing session), and multiple read repair
> messages.
> We would expect calls to roles, within sequential read sessions, because the cache could
> be turned, but not so many calls within the same tracing session. Same thing for read-retries,
> and read repair messages. We are discarding blocking read repairs possibility because we consistently
> get the same results when doing the query several times.
> It feels like something is cascading calls to these mechanisms regardless of conditions
> that would prevent them from being called (cached roles values for instance).
> I have attached tracing files from the 3 data centers. Please let me know if more info
> is needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org

