geode-user mailing list archives

From Vahram Aharonyan <vaharon...@vmware.com>
Subject RE: Some Geode management metrics returning 0s after OS upgrade
Date Tue, 12 Feb 2019 14:42:59 GMT
Hi All,

Further experiments and long-term monitoring showed that the only real problem is with
these 3 metrics:

org.apache.geode.management.MemberMXBean#getFunctionExecutionRate
org.apache.geode.management.MemberMXBean#getPutsRate
org.apache.geode.management.MemberMXBean#getGetsRate

All the others, related to either network or disk, report non-zero values, but these
three are constantly 0. These seem to be Geode-internal metrics and should not depend on
the operating system, right? Is there any information on these metrics in the *.gfs files,
so that we can check whether they have actual values or not?

Thanks,
Vahram.

From: Vahram Aharonyan <vaharonyan@vmware.com>
Sent: Thursday, February 7, 2019 5:19 PM
To: user@geode.apache.org
Subject: RE: Some Geode management metrics returning 0s after OS upgrade

Hi Kirk,

We were not able to find any error messages from the StatSampler in our log files.
Is running these tests straightforward? Do we have a doc describing the process? What
requirements must be met to be able to run them?

Hi Barry,

Yes, we see values for other MBean attributes reported.

You were right, the thread is there:
INFO   | jvm 1    | 2019/02/07 12:15:54 | "Thread-10 StatSampler" #59 daemon prio=10 os_prio=0 tid=0x00007f1fc8951800 nid=0x2d0 in Object.wait() [0x00007f1fb14e3000]
INFO   | jvm 1    | 2019/02/07 12:15:54 |    java.lang.Thread.State: TIMED_WAITING (on object monitor)
INFO   | jvm 1    | 2019/02/07 12:15:54 |       at java.lang.Object.wait(Native Method)
INFO   | jvm 1    | 2019/02/07 12:15:54 |       at org.apache.geode.internal.statistics.HostStatSampler.delay(HostStatSampler.java:520)
INFO   | jvm 1    | 2019/02/07 12:15:54 |       - locked <0x0000000651581a68> (a org.apache.geode.internal.statistics.GemFireStatSampler)
INFO   | jvm 1    | 2019/02/07 12:15:54 |       at org.apache.geode.internal.statistics.HostStatSampler.run(HostStatSampler.java:208)
INFO   | jvm 1    | 2019/02/07 12:15:54 |       at java.lang.Thread.run(Thread.java:748)

Could this be caused by missing privileges to access system resources? Or is there some
way to check whether this information is available in the *.gfs stat files from the locator
or server? I looked into these files but could not find anything related to the metrics
mentioned below.

Thanks,
Vahram.

From: Barry Oglesby <boglesby@pivotal.io>
Sent: Wednesday, February 6, 2019 11:21 PM
To: user@geode.apache.org
Subject: Re: Some Geode management metrics returning 0s after OS upgrade

Do you see values for other MBean attributes?

If you do a thread dump in your server JVM(s), you should see a thread like this running:

"StatSampler" #39 daemon prio=10 os_prio=31 tid=0x00007fdcbf004000 nid=0x7003 in Object.wait() [0x000070000c50a000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
                at java.lang.Object.wait(Native Method)
                at org.apache.geode.internal.statistics.HostStatSampler.delay(HostStatSampler.java:519)
                - locked <0x00000007a8911160> (a org.apache.geode.internal.statistics.GemFireStatSampler)
                at org.apache.geode.internal.statistics.HostStatSampler.run(HostStatSampler.java:219)
                at java.lang.Thread.run(Thread.java:745)
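
As a lighter alternative to reading a full thread dump, the same check can be sketched in plain Java. This is only an illustration: the class and method names (`StatSamplerCheck`, `findThreads`) are hypothetical, the code uses nothing but the JDK, and it would have to run inside the Geode member JVM itself to see that member's threads. It matches on the "StatSampler" name fragment because the thread names in the dumps in this thread vary ("StatSampler" vs. "Thread-10 StatSampler").

```java
import java.util.ArrayList;
import java.util.List;

public class StatSamplerCheck {

    // Returns "name [state]" for every live thread in this JVM whose name
    // contains the given fragment. An empty list means no such thread exists.
    public static List<String> findThreads(String nameFragment) {
        List<String> matches = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && t.getName().contains(nameFragment)) {
                matches.add(t.getName() + " [" + t.getState() + "]");
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Only meaningful when executed inside the member JVM (assumption).
        List<String> found = findThreads("StatSampler");
        if (found.isEmpty()) {
            System.out.println("No StatSampler thread found in this JVM");
        } else {
            found.forEach(System.out::println);
        }
    }
}
```

A sampler thread that is present and in TIMED_WAITING, as in the dump above, only shows that the sampler is alive; it does not by itself show that the individual statistics are being updated.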



On Wed, Feb 6, 2019 at 9:40 AM Kirk Lund <klund@apache.org> wrote:
Phantom OS might have caused the StatSampler to fail or even crash. That's the only explanation
I can think of that might result in the non-OS related stats remaining zero. You might want
to look through the log to see if the StatSampler logged any problems. Other than that, you
could try running every statistic related test/integrationTest/distributedTest in Geode on
Phantom OS to see how the tests behave.

On Wed, Feb 6, 2019 at 7:49 AM Anthony Baker <abaker@pivotal.io> wrote:
I wouldn’t be surprised if other OS -related things are broken on Phantom OS as well.  We
use JNA for most native calls.  Look at `git grep Native.register` to see what posix-like
things might be affected.

Anthony


On Feb 6, 2019, at 7:28 AM, Jacob Barrett <jbarrett@pivotal.io> wrote:

We don’t have any hooks into the stats for this OS.

On Feb 6, 2019, at 7:16 AM, Jens Deppe <jdeppe@pivotal.io> wrote:
From SLES 11 to Phantom OS

(I had already asked, but my CC got scrambled :( )

On Wed, Feb 6, 2019 at 7:10 AM Anthony Baker <abaker@pivotal.io> wrote:
Which OS did you upgrade to?

Anthony

On Feb 6, 2019, at 1:25 AM, Vahram Aharonyan <vaharonyan@vmware.com> wrote:

Hi All,

For troubleshooting purposes we have been collecting some data from Geode cluster members
using the following APIs:

org.apache.geode.management.MemberMXBean#getFunctionExecutionRate
org.apache.geode.management.MemberMXBean#getPutsRate
org.apache.geode.management.MemberMXBean#getGetsRate

org.apache.geode.management.NetworkMetrics#getBytesReceivedRate
org.apache.geode.management.NetworkMetrics#getBytesSentRate

org.apache.geode.management.DiskMetrics#getDiskFlushAvgLatency
org.apache.geode.management.DiskMetrics#getDiskReadsRate
org.apache.geode.management.DiskMetrics#getDiskWritesRate

Recently we replaced our base OS, and all the values returned by Geode for these calls
became 0.
Could someone help us understand how these metrics are collected by Geode? Could it be
that Geode uses some system utilities or system calls that existed in our previous appliance
and were removed in the newer version of the system, causing Geode to return only 0s?
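
For reference, a minimal sketch of how the three member-level rates can be read generically over JMX, without compiling against Geode classes. The attribute names are derived from the getters listed above; the class name `MemberRateProbe`, the method `readRates`, and the ObjectName pattern `GemFire:type=Member,*` are assumptions made for illustration and should be checked against the MBeans actually registered in the cluster. The sketch reads from the local platform MBean server, so it would need to run inside a member JVM or be given a remote `MBeanServerConnection` instead.

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class MemberRateProbe {

    // JMX attribute names corresponding to getFunctionExecutionRate,
    // getPutsRate and getGetsRate on MemberMXBean.
    static final String[] ATTRIBUTES = {"FunctionExecutionRate", "PutsRate", "GetsRate"};

    // Reads the three attributes from every MBean matching the pattern.
    // Returns an empty map when no MBean matches.
    public static Map<String, Object> readRates(MBeanServerConnection conn, String pattern)
            throws Exception {
        Map<String, Object> out = new LinkedHashMap<>();
        for (ObjectName name : conn.queryNames(new ObjectName(pattern), null)) {
            for (String attr : ATTRIBUTES) {
                out.put(name + "#" + attr, conn.getAttribute(name, attr));
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();
        // Assumed pattern for Geode member MBeans; verify with a JMX browser.
        readRates(conn, "GemFire:type=Member,*")
                .forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Printing nothing means no matching MBean was found in that MBean server, which is a different symptom from an MBean that is present but reports 0.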

Thanks,
Vahram.

