cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kurt Greaves (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-14525) streaming failure during bootstrap makes new node into inconsistent state
Date Tue, 19 Jun 2018 05:21:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16516669#comment-16516669
] 

Kurt Greaves commented on CASSANDRA-14525:
------------------------------------------

Thanks [~chovatia.jaydeep@gmail.com], I agree. No fault on your part, more a problem with
the consistent lack of reviewers we have who can prioritise review work. Just unfortunate
that it's wasted more time than necessary for everyone. I think [~VincentWhite] would appreciate
the acknowledgement (especially after such a long time) so FCFS makes sense to me, but there's
no use doing the work twice, just take into account the two patches slight discrepancies when
reviewing I guess.

> streaming failure during bootstrap makes new node into inconsistent state
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14525
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14525
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jaydeepkumar Chovatia
>            Assignee: Jaydeepkumar Chovatia
>            Priority: Major
>             Fix For: 4.0, 2.2.x, 3.0.x
>
>
> If bootstrap fails for newly joining node (most common reason is due to streaming failure)
then Cassandra state remains in {{joining}} state which is fine but Cassandra also enables
Native transport which makes overall state inconsistent. This further creates NullPointer
exception if auth is enabled on the new node, please find reproducible steps here:
> For example if bootstrap fails due to streaming errors like
> {quote}java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException:
Stream failed
>  at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
~[guava-18.0.jar:na]
>  at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
~[guava-18.0.jar:na]
>  at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-18.0.jar:na]
>  at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1256) [apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:894)
[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.service.StorageService.initServer(StorageService.java:660) [apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.service.StorageService.initServer(StorageService.java:573) [apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:330) [apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:567) [apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:695) [apache-cassandra-3.0.16.jar:3.0.16]
>  Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
>  at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
~[apache-cassandra-3.0.16.jar:3.0.16]
>  at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) ~[guava-18.0.jar:na]
>  at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
~[guava-18.0.jar:na]
>  at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
~[guava-18.0.jar:na]
>  at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-18.0.jar:na]
>  at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
~[guava-18.0.jar:na]
>  at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:211)
~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:187)
~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:440)
~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:540) ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:307)
~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
~[apache-cassandra-3.0.16.jar:3.0.16]
>  at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121]
> {quote}
> then variable [StorageService.java::dataAvailable |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L892]
will be {{false}}. Since {{dataAvailable}} is {{false}} hence it will not call [StorageService.java::finishJoiningRing
|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L933]
and as a result [StorageService.java::doAuthSetup|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L999]
will not be invoked.
> API [StorageService.java::joinTokenRing |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L763]
returns without any problem. After this [CassandraDaemon.java::start|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/CassandraDaemon.java#L584]
is invoked which starts native transport at 
>  [CassandraDaemon.java::startNativeTransport |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/CassandraDaemon.java#L478]
> At this point daemon’s bootstrap is still not finished and transport is enabled. So
client will connect to the node and will encounter {{java.lang.NullPointerException}} as following:
> {quote}ERROR [SharedPool-Worker-2] Message.java:647 - Unexpected exception during request;
channel = [id: 0x412a26b3, L:/a.b.c.d:9042 - R:/p.q.r.s:20121]
>  java.lang.NullPointerException: null
>  at org.apache.cassandra.auth.PasswordAuthenticator.doAuthenticate(PasswordAuthenticator.java:160)
~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.auth.PasswordAuthenticator.authenticate(PasswordAuthenticator.java:82)
~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.auth.PasswordAuthenticator.access$100(PasswordAuthenticator.java:54)
~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.auth.PasswordAuthenticator$PlainTextSaslAuthenticator.getAuthenticatedUser(PasswordAuthenticator.java:198)
~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.transport.messages.AuthResponse.execute(AuthResponse.java:78)
~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:535)
[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:429)
[apache-cassandra-3.0.16.jar:3.0.16]
>  at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
[netty-all-4.1.0.CR6.jar:4.1.0.CR6]
>  at io.netty.channel.ChannelHandlerInvokerUtil.invokeChannelReadNow(ChannelHandlerInvokerUtil.java:83)
[netty-all-4.1.0.CR6.jar:4.1.0.CR6]
>  at io.netty.channel.DefaultChannelHandlerInvoker$7.run(DefaultChannelHandlerInvoker.java:159)
[netty-all-4.1.0.CR6.jar:4.1.0.CR6]
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_121]
>  at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-3.0.16.jar:3.0.16]
>  at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
> {quote}
> At this point if we run {{nodetool status}} then it will show this new node in {{UJ}}
state, however clients can connect to this node over {{CQL}} and will receive {{java.lang.NullPointerException}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org


Mime
View raw message