cassandra-commits mailing list archives

From "Kurt Greaves (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-14525) streaming failure during bootstrap makes new node into inconsistent state
Date Mon, 02 Jul 2018 08:23:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529525#comment-16529525 ]

Kurt Greaves commented on CASSANDRA-14525:
------------------------------------------

{quote}Also I've discovered another bug exists in current open source code in which if isSurveyMode
is true and streaming fails (i.e. isBootstrapMode is true) then also one can call nodetool
join without nodetool bootstrap resume and have that node join the ring.
{quote}
Great catch. I found a couple more small issues w.r.t. {{nodetool join}} as well while I was
testing this.
 # If the node is in write_survey mode and you join the ring after bootstrap, client transports
won't be enabled. Can we call {{CassandraDaemon#start()}} here?
 # {{nodetool join}} fails silently if write_survey is true and we haven't completed bootstrapping,
but the server log prints the following:
{code:java}
WARN [RMI TCP Connection(5)-127.0.0.1] 2018-06-29 12:39:49,735 StorageService.java:1008 -
Some data streaming failed. Use nodetool to check bootstrap state and resume. For more, see
`nodetool help bootstrap`. IN_PROGRESS
{code}
{{nodetool join}} should say something along the lines of
"{{Can't join the ring because in write_survey mode and bootstrap hasn't completed}}"
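
A minimal sketch of the second point, assuming the join path could throw instead of only logging a warning. The names here ({{JoinRingSketch}}, {{joinRefusalReason}}, {{joinRing}}) and the two boolean flags are hypothetical stand-ins, not the actual {{StorageService}} code. The idea is only that an exception raised at this point would give the caller something to report:

```java
// Hypothetical sketch; not the real StorageService implementation.
final class JoinRingSketch {

    /**
     * Returns the reason a join request should be refused, or null if this
     * sketch has no objection. Only the write_survey + incomplete-bootstrap
     * case from the comment is modelled; other states are out of scope.
     */
    static String joinRefusalReason(boolean writeSurvey, boolean bootstrapComplete) {
        if (writeSurvey && !bootstrapComplete) {
            return "Can't join the ring because in write_survey mode and bootstrap hasn't completed";
        }
        return null;
    }

    /** Fails loudly instead of logging a warning and returning normally. */
    static void joinRing(boolean writeSurvey, boolean bootstrapComplete) {
        String reason = joinRefusalReason(writeSurvey, bootstrapComplete);
        if (reason != null) {
            throw new IllegalStateException(reason);
        }
        // The actual ring join would happen here (omitted in this sketch).
    }

    public static void main(String[] args) {
        try {
            joinRing(true, false);
        } catch (IllegalStateException e) {
            System.out.println("join refused: " + e.getMessage());
        }
    }
}
```

Whether an exception or a returned status is the better fit depends on how the existing JMX operation is expected to behave, so treat this as one option rather than a proposal for the patch.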

Another minor nit w.r.t. logging: you can get the following log message after successfully
bootstrapping if you were in write_survey mode:
{code:java}
INFO [main] 2018-06-29 12:12:39,071 CassandraDaemon.java:479 - Not starting client transports
as bootstrap has not completed
{code}
It would probably be better to split the {{if}} block in {{CassandraDaemon.start()}} so that,
in that case, we print "{{Not starting client transports as write_survey mode is enabled.}}"
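
A rough sketch of what splitting that check might look like. {{TransportStartSketch}} and {{transportSkipReason}} are hypothetical names and the booleans stand in for whatever state {{CassandraDaemon.start()}} actually consults, so this shows the shape of the branching only:

```java
// Hypothetical sketch; not the real CassandraDaemon.start() logic.
final class TransportStartSketch {

    /**
     * Returns the message to log when client transports should not start,
     * or null when they may start. Each condition gets its own branch so
     * the logged reason matches the actual cause.
     */
    static String transportSkipReason(boolean bootstrapComplete, boolean writeSurvey) {
        if (!bootstrapComplete) {
            return "Not starting client transports as bootstrap has not completed";
        }
        if (writeSurvey) {
            return "Not starting client transports as write_survey mode is enabled.";
        }
        return null;
    }

    public static void main(String[] args) {
        // Bootstrapped successfully while still in write_survey mode.
        System.out.println(transportSkipReason(true, true));
        // Bootstrap failed or is still in progress.
        System.out.println(transportSkipReason(false, true));
    }
}
```

Checking bootstrap first is an arbitrary choice in this sketch; the point is that a node which bootstrapped successfully in write_survey mode no longer gets the misleading "bootstrap has not completed" message.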

And finally, there are still 2 occurrences of "bootstraped" in the exception messages in {{startNativeTransport}}
and {{startRPCServer}}.

> streaming failure during bootstrap makes new node into inconsistent state
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14525
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14525
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jaydeepkumar Chovatia
>            Assignee: Jaydeepkumar Chovatia
>            Priority: Major
>             Fix For: 4.0, 2.2.x, 3.0.x
>
>
> If bootstrap fails for a newly joining node (most commonly due to a streaming failure),
Cassandra remains in the {{joining}} state, which is fine, but it also enables the
native transport, which makes the overall state inconsistent. This in turn causes a
NullPointerException if auth is enabled on the new node. Steps to reproduce follow.
> For example, if bootstrap fails due to streaming errors like
> {quote}java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException:
Stream failed
>  at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
~[guava-18.0.jar:na]
>  at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
~[guava-18.0.jar:na]
>  at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-18.0.jar:na]
>  at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1256) [apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:894)
[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.service.StorageService.initServer(StorageService.java:660) [apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.service.StorageService.initServer(StorageService.java:573) [apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:330) [apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:567) [apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:695) [apache-cassandra-3.0.16.jar:3.0.16]
>  Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
>  at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
~[apache-cassandra-3.0.16.jar:3.0.16]
>  at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) ~[guava-18.0.jar:na]
>  at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
~[guava-18.0.jar:na]
>  at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
~[guava-18.0.jar:na]
>  at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-18.0.jar:na]
>  at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
~[guava-18.0.jar:na]
>  at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:211)
~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:187)
~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:440)
~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:540) ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:307)
~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
~[apache-cassandra-3.0.16.jar:3.0.16]
>  at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121]
> {quote}
> then the variable [StorageService.java::dataAvailable |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L892]
will be {{false}}. Since {{dataAvailable}} is {{false}}, [StorageService.java::finishJoiningRing
|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L933]
will not be called, and as a result [StorageService.java::doAuthSetup|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L999]
will not be invoked.
> The API [StorageService.java::joinTokenRing |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L763]
returns without any problem. After this, [CassandraDaemon.java::start|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/CassandraDaemon.java#L584]
is invoked, which starts the native transport at
>  [CassandraDaemon.java::startNativeTransport |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/CassandraDaemon.java#L478]
> At this point the daemon’s bootstrap is still not finished, yet the transport is enabled. So
a client will connect to the node and encounter {{java.lang.NullPointerException}} as follows:
> {quote}ERROR [SharedPool-Worker-2] Message.java:647 - Unexpected exception during request;
channel = [id: 0x412a26b3, L:/a.b.c.d:9042 - R:/p.q.r.s:20121]
>  java.lang.NullPointerException: null
>  at org.apache.cassandra.auth.PasswordAuthenticator.doAuthenticate(PasswordAuthenticator.java:160)
~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.auth.PasswordAuthenticator.authenticate(PasswordAuthenticator.java:82)
~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.auth.PasswordAuthenticator.access$100(PasswordAuthenticator.java:54)
~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.auth.PasswordAuthenticator$PlainTextSaslAuthenticator.getAuthenticatedUser(PasswordAuthenticator.java:198)
~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.transport.messages.AuthResponse.execute(AuthResponse.java:78)
~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:535)
[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:429)
[apache-cassandra-3.0.16.jar:3.0.16]
>  at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
[netty-all-4.1.0.CR6.jar:4.1.0.CR6]
>  at io.netty.channel.ChannelHandlerInvokerUtil.invokeChannelReadNow(ChannelHandlerInvokerUtil.java:83)
[netty-all-4.1.0.CR6.jar:4.1.0.CR6]
>  at io.netty.channel.DefaultChannelHandlerInvoker$7.run(DefaultChannelHandlerInvoker.java:159)
[netty-all-4.1.0.CR6.jar:4.1.0.CR6]
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_121]
>  at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-3.0.16.jar:3.0.16]
>  at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
> {quote}
> At this point, if we run {{nodetool status}}, it will show this new node in the {{UJ}}
state; however, clients can connect to this node over {{CQL}} and will receive {{java.lang.NullPointerException}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org

