cassandra-pr mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [cassandra] dvohra commented on issue #419: Hints
Date Mon, 20 Jan 2020 05:23:55 GMT
dvohra commented on issue #419: Hints
URL: https://github.com/apache/cassandra/pull/419#issuecomment-576111592
 
 
    
   Joey,
   Thanks for reorganizing the Hints page with some additions. 
   The Google Docs editor and the Google Drawings tool were suggested or approved by project
managers Nate and Dinesh (cc'd), and Google tools would be most appropriate as the project
is sponsored by Google.
   I shall make slight edits as suggested; the pull request has to be merged by someone
other than myself.
   Regards,
   Deepak

   On Monday, January 20, 2020, 02:42:30 a.m. UTC, Joseph Lynch <notifications@github.com>
wrote:
    
    
   @jolynch requested changes on this pull request.
   
   Overall this is a great start.
   
   I've left some comments and started a branch based off yours in dvohra/cassandra@hints...jolynch:hints
with my suggestions. Feel free to pull them in or not.
   
   General comments
      
      - I'd recommend using https://www.mathcha.io/editor to make your diagrams instead of
Google Docs. It is free as well, and in my opinion easier to use and more professional-looking
(and it can export SVGs). Also, prefer SVGs or PNGs to JPGs.
      - I don't see where the figures are used. Also, if you like, I can make Mathcha versions
of them.
      - I think you can cut a lot of copy in this page; try to trim any sections you don't
think are strictly necessary to explain hints.
      - Replace the use of === with --- unless you want them to be top-level sections.
   
   In doc/source/operating/hints.rst:
   > +.. with the License.  You may obtain a copy of the License at
   +..
   +..     http://www.apache.org/licenses/LICENSE-2.0
   +..
   +.. Unless required by applicable law or agreed to in writing, software
   +.. distributed under the License is distributed on an "AS IS" BASIS,
   +.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   +.. See the License for the specific language governing permissions and
   +.. limitations under the License.
   +
   +.. highlight:: none
   +
   +Hints
   +=====
   +
   +Hints are a type of repair during a write operation. At times a write or an update cannot
be replicated to all nodes satisfying the replication factor because a replica node is unavailable.
Under such a condition the mutation (a write or update) is stored temporarily on the coordinator
node in its filesystem. 
   
   Suggested re-wording to something like the following:
   Hints are a data repair technique applied during write operations. When
   replica nodes are unavailable to accept a mutation, due to either failure or,
   more commonly, routine maintenance, coordinators attempting to write to those
   replicas store temporary hints on their local filesystem for later application
   to the unavailable replica. Hints are an important way to help reduce the
   duration of data inconsistency between replicas, as they replay quickly after
   unavailable nodes return to the ring; however, they are best effort and do not
   guarantee eventual consistency like :ref:`anti-entropy repair <repair>` does.
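A minimal sketch of the store-then-replay behaviour this re-wording describes, written as a toy in-memory model. `HintStore`, `store`, `pending` and `replay` are hypothetical names, not Cassandra classes; Cassandra itself writes hints to files on the coordinator's disk (under `hints_directory`) rather than keeping them in memory.

```python
from collections import defaultdict


class HintStore:
    """Toy, in-memory model of a coordinator's hints (hypothetical, not Cassandra's API)."""

    def __init__(self):
        # target replica -> mutations it missed, in arrival order
        self._pending = defaultdict(list)

    def store(self, target, mutation):
        """Record a mutation that `target` could not accept (down or unresponsive)."""
        self._pending[target].append(mutation)

    def pending(self, target):
        """Return a copy of the mutations still waiting for `target`."""
        return list(self._pending.get(target, []))

    def replay(self, target, apply):
        """Deliver every stored mutation to `target` via `apply`, then forget them.

        `apply` stands in for sending the mutation over the network. This is
        best effort: if the coordinator loses its hints, nothing here recovers
        them, which is why anti-entropy repair is still needed.
        """
        delivered = self._pending.pop(target, [])
        for mutation in delivered:
            apply(mutation)
        return len(delivered)


# Usage: replica B misses two writes, then comes back.
hints = HintStore()
hints.store("B", ("k1", "v1"))
hints.store("B", ("k2", "v2"))
received = []
print(hints.replay("B", received.append))  # 2
print(received)  # [('k1', 'v1'), ('k2', 'v2')]
print(hints.pending("B"))  # []
```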
   
   In doc/source/operating/hints.rst:
   > +Hints are metadata associated with a mutation (a write or update) indicating that the
mutation is not placed on a replica node (the target node) it is meant to be placed on because
the node is temporarily unavailable, or is unresponsive.  Hints are used to implement the
eventual consistency guarantee that all updates are eventually received by all replicas and
all replicas are eventually made consistent.    When the replica node becomes available the
hints are replayed on the node.
   
   I'd slightly modify
   "Hints are used to implement the eventual consistency guarantee ..."
   
   to be
   "Hints are one of the primary ways Cassandra implements the eventual consistency guarantee ..."
   
   In doc/source/operating/hints.rst:
   > +As a primer on how replicas are placed in a cluster, Apache Cassandra replicates data
to provide fault tolerance, high availability and durability. Cassandra partitions data across
the cluster using consistent hashing in which a hash function is used on the partition keys
to generate consistently ordered hash values (or tokens).  An abstract ring represents the
complete hash value range (token range) of the keys stored with each node in the cluster being
assigned a certain subset range of hash values (range of tokens) it can store.  The list of
nodes responsible for a particular key is called its preference list.  The preference list
may include virtual nodes as a virtual node is also a node albeit an abstract node and not
a physical node.  Virtual nodes may need to be skipped to create a preference list in which
the first N (N being the replication factor) nodes taken clockwise in the consistent hashing
ring are all distinct physical nodes. All nodes in a cluster know which node/s should be in
the preference list for a given key.  The node that receives a request for a write operation
(key/value data) forwards the request to the replica node that is in the preference list for
the key.  The node becomes a coordinator node and coordinates the reads and writes.   
   
   General feedback: Can we link to one of the architecture pages here instead of repeating
it?
   
   Copy feedback (my opinion):
      
      - I'd nix some of the expository copy, like "As a primer on how replicas are placed in a cluster".
      - I don't think you need to go into virtual nodes to explain hints. There are a set
of physical endpoints which should be part of the replica set for a key, and when an endpoint
(or replica) is unavailable hints have to be stored for those.
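To make that concrete: the coordinator only needs the key's replica set (physical endpoints) and its current view of which endpoints are up, and hints go to the difference. A minimal sketch; `endpoints_needing_hints`, `replicas` and `live` are illustrative names, not Cassandra identifiers.

```python
def endpoints_needing_hints(replicas, live):
    """Return the replicas for a key that should get a hint instead of the write.

    replicas: ordered physical endpoints that own the key (hypothetical input).
    live: set of endpoints the coordinator currently believes are up.
    """
    return [endpoint for endpoint in replicas if endpoint not in live]


# Replicas B, C and D own the key; B is down.
print(endpoints_needing_hints(["B", "C", "D"], {"A", "C", "D"}))  # ['B']
```

No tokens, rings or virtual nodes appear: once the replica set is known, that is all the hinting decision needs.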
   
   In doc/source/operating/hints.rst:
   > +Why are hints needed?
   +=====================
   +
   +Hints reduce the inconsistency window caused by temporary node unavailability.
   +
   +Consider that an update or mutation is to be made using the following configuration:
   +
   +- Consistency level : 2
   
   Consistency level: LOCAL_QUORUM (2/3)
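The 2/3 is quorum arithmetic: a quorum is floor(RF / 2) + 1 replicas, and LOCAL_QUORUM applies that to the replication factor of the coordinator's local datacenter. A small sketch (the function name is illustrative):

```python
def local_quorum(local_rf: int) -> int:
    """Acks needed for LOCAL_QUORUM given the local datacenter's replication factor."""
    return local_rf // 2 + 1


for rf in (1, 2, 3, 5):
    print(f"RF={rf}: LOCAL_QUORUM needs {local_quorum(rf)} of {rf}")
# RF=1: LOCAL_QUORUM needs 1 of 1
# RF=2: LOCAL_QUORUM needs 2 of 2
# RF=3: LOCAL_QUORUM needs 2 of 3
# RF=5: LOCAL_QUORUM needs 3 of 5
```

With RF=3, one replica can be down and the write still succeeds; that missing replica is the one that gets a hint.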
   
   In doc/source/operating/hints.rst:
   > +|nodetool resumehandoff      |Resumes hints delivery process             |
   ++----------------------------+-------------------------------------------+
   +|nodetool                    |Sets hinted handoff throttle in kb         |
   +|sethintedhandoffthrottlekb  |per second, per delivery thread            |
   ++----------------------------+-------------------------------------------+
   +|nodetool setmaxhintwindow   |Sets the specified max hint window in ms   |
   ++----------------------------+-------------------------------------------+
   +|nodetool statushandoff      |Status of storing future hints on the      |
   +|                            |current node                               |
   ++----------------------------+-------------------------------------------+
   +|nodetool truncatehints      |Truncates all hints on the local node, or  |
   +|                            |truncates hints for the endpoint(s)        |
   +|                            |specified.                                 |
   ++----------------------------+-------------------------------------------+
   +
   +Hints is not an alternative to performing a full repair or read repair but is only a stopgap
measure.
   
   Hints, like read repair, are not an alternative to performing full repair, but they do help
reduce the duration of inconsistency between replicas.
   
   In doc/source/operating/hints.rst:
   > +Consider that an update or mutation is to be made using the following configuration:
   
   Consider that a mutation is made with the following configuration:
   
   In doc/source/operating/hints.rst:
   > +- Consistency level : 2
   +- Replication factor: 3
   +- Replication strategy: SimpleStrategy
   +- Number of nodes in cluster: 5
   +
   +The update or mutation is sent to a node (node A) in the cluster, and is meant to be forwarded
to three other nodes, the replica nodes B, C and D.  The node that receives the request is
the proxy node and becomes the coordinator of the request.  Under normal operation the update
gets sent to the three replica nodes and the coordinator receives the response from the three
nodes satisfying the consistency level.  But suppose node B is down and unavailable.  The
update is sent to nodes C and D and a response returned to the coordinator, again satisfying
the consistency level of 2.   But that is not the end of the request. Because the replica
mutation is meant for replica node B also, a hint is stored by the coordinator node in the
local filesystem   indicating that the update or mutation is also to be replicated on node
B.  The coordinator node waits for 3 hours by default (as set with ``max_hint_window_in_ms``).
If node B becomes available within 3 hours the coordinator sends the hint to node B and the
hint is replayed on node B, eventually making all replicas consistent. Such a transfer of
an update using hints is called a hinted handoff.  Hinted handoff is used to ensure that read
and write operations are not failed and the consistency, availability and durability guarantees
are not compromised.  We still need to satisfy the consistency level, because hints &
hinted handoffs are not used to satisfy the write consistency level unless the consistency
level is ``ANY``.  If the replica node for which a hint is generated does not become available
within 3 hours, or the ``max_hint_window_in_ms``, the hint is deleted and a full or read repair
becomes necessary.
   
   A couple of suggestions:
      
      - I'd omit replication strategy and number of nodes in the cluster for this example;
the only thing we need to know is that we have a LOCAL_QUORUM request going to three replicas
and one of the replicas does not acknowledge the write.
      - "are not failed and the consistency, availability and durability guarantees are not
compromised": I suggest you re-word this to something like "hints ensure eventual consistency".
      - Can you structure this as an ordered timeline with a diagram instead of a large paragraph?
I think something like this diagram would help explain the concept.
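As a possible starting point for that timeline, a minimal simulation of the example: a LOCAL_QUORUM write to replicas B, C and D while B is down. It assumes the behaviour described in the cassandra.yaml comment for `max_hint_window_in_ms` (the window limits how long new hints keep being generated for a dead node, counted from when it went down), that all three replicas are in the local datacenter, and that every live replica acks in time. The function and variable names are hypothetical.

```python
MAX_HINT_WINDOW_MS = 3 * 60 * 60 * 1000  # 10800000 ms, the default max_hint_window_in_ms


def local_quorum(local_rf):
    return local_rf // 2 + 1


def coordinate_write(replicas, down_since_ms, now_ms, max_hint_window_ms=MAX_HINT_WINDOW_MS):
    """Simulate one LOCAL_QUORUM write from the coordinator's point of view.

    down_since_ms maps each unavailable replica to the time (ms) it went down.
    Assumes every live replica acks in time. Returns
    (succeeded, acked, hinted, not_hinted).
    """
    live = [r for r in replicas if r not in down_since_ms]
    if len(live) < local_quorum(len(replicas)):
        # Too few live replicas: rejected up front (UnavailableException), no hints.
        return False, [], [], []
    hinted, not_hinted = [], []
    for replica in replicas:
        if replica in down_since_ms:
            if now_ms - down_since_ms[replica] <= max_hint_window_ms:
                hinted.append(replica)  # down within the window: store a hint
            else:
                not_hinted.append(replica)  # down too long: no hint; repair must fix it
    # Live replicas apply the write and ack; only their acks count toward
    # LOCAL_QUORUM (a hint satisfies only consistency level ANY).
    return True, live, hinted, not_hinted


# 1. B went down 30 minutes ago. 2. C and D ack, so LOCAL_QUORUM is met.
# 3. The coordinator stores a hint for B. 4. The hint is replayed when B returns.
print(coordinate_write(["B", "C", "D"], {"B": 0}, now_ms=30 * 60 * 1000))
# (True, ['C', 'D'], ['B'], [])

# B has been down for 4 hours: the write still succeeds, but no hint is stored.
print(coordinate_write(["B", "C", "D"], {"B": 0}, now_ms=4 * 60 * 60 * 1000))
# (True, ['C', 'D'], [], ['B'])
```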
   
   In doc/source/operating/hints.rst:
   > +Hints for Timed Out Write Requests
   +==================================
   +
   +Hints are also stored for write requests that are timed out. The ``write_request_timeout_in_ms``
setting in ``cassandra.yaml`` configures the timeout for write requests.
   
   write requests that time out
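A minimal sketch of that rule, assuming the default `write_request_timeout_in_ms` of 2000 ms: any replica that has not acknowledged within the timeout is hinted, even if it is up but slow. The function name and the latency map are illustrative, not Cassandra APIs.

```python
WRITE_REQUEST_TIMEOUT_MS = 2000  # default write_request_timeout_in_ms in cassandra.yaml


def replicas_to_hint_after_timeout(ack_latency_ms, timeout_ms=WRITE_REQUEST_TIMEOUT_MS):
    """Return replicas that get a hint because they did not ack in time.

    ack_latency_ms maps replica -> ack latency in ms, or None if no ack ever
    arrived. Replicas slower than the timeout are hinted even though they are up.
    """
    return sorted(
        replica
        for replica, latency in ack_latency_ms.items()
        if latency is None or latency > timeout_ms
    )


# C acked quickly, D acked after the timeout, and B never answered.
print(replicas_to_hint_after_timeout({"B": None, "C": 12, "D": 2500}))  # ['B', 'D']
```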
   
   In doc/source/operating/hints.rst:
   > +Why are hints needed?
   
   Change this from = to - headers, and perhaps re-word the title to Hinted Handoff.
   
   In doc/source/operating/hints.rst:
   > +|                      |uncompressed. LZ4, Snappy, and Deflate     |                 |
   +|                      |compressors are supported.                 |                 |
   ++----------------------+-------------------------------------------+-----------------+
   + 
   +Changing Max Hint Window at Runtime
   +===================================
   +
   +Cassandra 4.0 has added support for changing ``max_hint_window_in_ms`` at runtime 
   +(`CASSANDRA-11720
   +<https://issues.apache.org/jira/browse/CASSANDRA-11720>`_). The ``max_hint_window_in_ms``
configuration property in ``cassandra.yaml`` may be modified at runtime followed by a rolling
restart. The default value of ``max_hint_window_in_ms`` is 3 hours.
   +
   +::
   +
   +  max_hint_window_in_ms: 10800000 # 3 hours
   +
   +The need to be able to modify ``max_hint_window_in_ms`` at runtime is explained with the
following example.  A larger node (in terms of data it holds) goes down. And it will take
slightly more than ``max_hint_window_in_ms`` to fix it. The disk space to store some additional
hints id available.
   
   This is not clear, let's re-word it.
   
   In doc/source/operating/hints.rst:
   > +|                      |                                           |data/hints       |
   ++----------------------+-------------------------------------------+-----------------+
   +|hints_flush_period_in |How often hints should be flushed from the |  10000          |
   +|_ms                   |internal buffers to disk. Will *not*       |                 |
   +|                      |trigger fsync.                             |                 |
   ++----------------------+-------------------------------------------+-----------------+
   +|max_hints_file_size   |Maximum size for a single hints file, in   |   128           |
   +|_in_mb                |megabytes.                                 |                 |
   ++----------------------+-------------------------------------------+-----------------+
   +|hints_compression     |Compression to apply to the hint files.    |  LZ4Compress    |
   +|                      |If omitted, hints files will be written    |                 |
   +|                      |uncompressed. LZ4, Snappy, and Deflate     |                 |
   +|                      |compressors are supported.                 |                 |
   ++----------------------+-------------------------------------------+-----------------+
   + 
   +Changing Max Hint Window at Runtime
   
   Can we change this section to talk about when you may want more time for hints to replay,
instead of changing max hint window at runtime? It's actually somewhat rare for nodes to be
down for more than three hours, but it's very common for hints replaying at 1024 KB/s to be
unable to complete within 3 hours.
   
   You could mention raising hinted_handoff_throttle_in_kb as well as raising the window to
ensure hints are delivered.
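To show how replay can outlast the window, a back-of-the-envelope sketch of replay time at a given throttle. It assumes the throttle meters the hint bytes being replayed, and it follows the cassandra.yaml comment on `hinted_handoff_throttle_in_kb` (default 1024), which says the per-thread rate is reduced in proportion to cluster size (full rate with two nodes, half with three). The function name and the 20 GiB backlog are hypothetical.

```python
def hint_replay_hours(hint_bytes, throttle_kb=1024, cluster_nodes=2):
    """Rough lower bound, in hours, for one delivery thread to replay hint_bytes.

    throttle_kb mirrors hinted_handoff_throttle_in_kb. The per-thread rate is
    divided by (cluster_nodes - 1), matching the cassandra.yaml description.
    """
    bytes_per_second = throttle_kb * 1024 / max(1, cluster_nodes - 1)
    return hint_bytes / bytes_per_second / 3600


backlog = 20 * 1024**3  # hypothetical 20 GiB of hints for one down node
print(round(hint_replay_hours(backlog), 1))  # 5.7  (2 nodes, default throttle)
print(round(hint_replay_hours(backlog, cluster_nodes=6), 1))  # 28.4
print(round(hint_replay_hours(backlog, throttle_kb=8192, cluster_nodes=6), 1))  # 3.6
```

Even a modest backlog at the default throttle takes longer than the 3-hour default window, which is the case for raising the throttle alongside the window.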
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
