cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Shao-Chuan Wang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-6576) StreamMessage.deserialize takes high cpu and does not seem to make progress
Date Sun, 12 Jan 2014 03:18:51 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-6576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Shao-Chuan Wang updated CASSANDRA-6576:
---------------------------------------

    Environment: 
AWS EC2 machines, 
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
Each node has 32 vnodes. 

  was:
AWS EC2 machines, 
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)


> StreamMessage.deserialize takes high cpu and does not seem to make progress
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6576
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6576
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: AWS EC2 machines, 
> java version "1.7.0_45"
> Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
> Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
> Each node has 32 vnodes. 
>            Reporter: Shao-Chuan Wang
>            Assignee: Yuki Morishita
>              Labels: streaming
>
> One of my machine seems to be stuck at streaming in the data.
> At node 10.97.135.32
> htop
> {code}
>   PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
> 32495 ubuntu     20   0 31.7G 13.9G  487M S 334. 46.7     146h /usr/lib/jvm/java-7-oracle/bin/java
-Dcassandra.ring_delay_ms=180000 -ea -
> 19396 ubuntu     20   0 31.7G 13.9G  487M S 74.0 46.7  9h20:34 /usr/lib/jvm/java-7-oracle/bin/java
-Dcassandra.ring_delay_ms=180000 -ea -
>  4199 ubuntu     20   0 31.7G 13.9G  487M R 72.0 46.7  9h20:06 /usr/lib/jvm/java-7-oracle/bin/java
-Dcassandra.ring_delay_ms=180000 -ea -
>  2737 ubuntu     20   0 31.7G 13.9G  487M R 69.0 46.7  2h27:54 /usr/lib/jvm/java-7-oracle/bin/java
-Dcassandra.ring_delay_ms=180000 -ea -
> 31304 ubuntu     20   0 31.7G 13.9G  487M S 63.0 46.7  3h00:42 /usr/lib/jvm/java-7-oracle/bin/java
-Dcassandra.ring_delay_ms=180000 -ea -
> {code}
> Jstack for the above "busy" threads:
> {code}
> "STREAM-IN-/10.122.50.31" daemon prio=10 tid=0x00007f4d480e5000 nid=0xab1 runnable [0x00007f4d94396000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.ch.NativeThread.current(Native Method)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
>         - locked <0x000000058290b038> (a java.lang.Object)
>         - locked <0x0000000582908de0> (a java.lang.Object)
>         at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:50)
>         at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:287)
>         at java.lang.Thread.run(Thread.java:744)
> --
> "STREAM-IN-/10.154.136.39" daemon prio=10 tid=0x00007f4a599e6000 nid=0x7a48 runnable
[0x00007f4d01700000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.ch.NativeThread.current(Native Method)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
>         - locked <0x00000005830519a0> (a java.lang.Object)
>         - locked <0x000000058303f000> (a java.lang.Object)
>         at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:50)
>         at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:287)
>         at java.lang.Thread.run(Thread.java:744)
> --
> "STREAM-IN-/10.44.183.111" daemon prio=10 tid=0x00007f4d480e0800 nid=0x4bc4 runnable
[0x00007f4d1167b000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.ch.NativeThread.current(Native Method)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
>         - locked <0x0000000583b796f8> (a java.lang.Object)
>         - locked <0x0000000583b76598> (a java.lang.Object)
>         at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:50)
>         at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:287)
>         at java.lang.Thread.run(Thread.java:744)
> --
> "STREAM-IN-/10.178.13.230" daemon prio=10 tid=0x00007f4a59a12000 nid=0x1067 runnable
[0x00007f4d23ca2000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.ch.NativeThread.current(Native Method)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
>         - locked <0x0000000582908000> (a java.lang.Object)
>         - locked <0x0000000582906150> (a java.lang.Object)
>         at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:50)
>         at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:287)
>         at java.lang.Thread.run(Thread.java:744)
> {code}
> After several hours, these above stacks look almost exactly the same, except some thread
scheduling.
> {code}
> "STREAM-IN-/10.122.50.31" daemon prio=10 tid=0x00007f4d480e5000 nid=0xab1 runnable [0x00007f4d94396000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>         - locked <0x0000000582908de0> (a java.lang.Object)
>         at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:50)
>         at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:287)
>         at java.lang.Thread.run(Thread.java:744)
> --
> "STREAM-IN-/10.154.136.39" daemon prio=10 tid=0x00007f4a599e6000 nid=0x7a48 runnable
[0x00007f4d01700000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.ch.NativeThread.current(Native Method)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
>         - locked <0x00000005830519a0> (a java.lang.Object)
>         - locked <0x000000058303f000> (a java.lang.Object)
>         at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:50)
>         at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:287)
>         at java.lang.Thread.run(Thread.java:744)
> --
> "STREAM-IN-/10.44.183.111" daemon prio=10 tid=0x00007f4d480e0800 nid=0x4bc4 runnable
[0x00007f4d1167b000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.ch.NativeThread.current(Native Method)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
>         - locked <0x0000000583b796f8> (a java.lang.Object)
>         - locked <0x0000000583b76598> (a java.lang.Object)
>         at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:50)
>         at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:287)
>         at java.lang.Thread.run(Thread.java:744)
> --
> "STREAM-IN-/10.178.13.230" daemon prio=10 tid=0x00007f4a59a12000 nid=0x1067 runnable
[0x00007f4d23ca2000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.ch.NativeThread.current(Native Method)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
>         - locked <0x0000000582908000> (a java.lang.Object)
>         - locked <0x0000000582906150> (a java.lang.Object)
>         at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:50)
>         at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:287)
>         at java.lang.Thread.run(Thread.java:744)
> {code}
> If you login into the node that it is talking to, say 10.122.50.31
> {code}
> [ubuntu@ip-10-122-50-31 :~]# jstack 32276 | grep STREAM-OUT
> "STREAM-OUT-/10.95.128.6" daemon prio=10 tid=0x00007fed491c1800 nid=0x4323 waiting on
condition [0x00007fece6879000]
> "STREAM-OUT-/10.4.197.53" daemon prio=10 tid=0x00007fed49ba8800 nid=0x3e1a waiting on
condition [0x00007fed2cc4f000]
> "STREAM-OUT-/10.178.2.177" daemon prio=10 tid=0x00007fed49587800 nid=0x2bc6 waiting on
condition [0x00007fece86f1000]
> {code}
> It is NOT talking to 10.97.135.32, which makes me suspect the 10.97.135.32 is somehow
in a "busy-wait" state. Thoughts?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Mime
View raw message