geode-user mailing list archives

From Dharam Thacker <dharamthacke...@gmail.com>
Subject Re: Pulse is giving stale view of cluster -- lost updates
Date Thu, 21 Feb 2019 16:45:28 GMT
Hi Jens,

I tried again as you suggested and traced the logs again. As you can see
below, both servers have joined, but no regions were created, the server
count did not change, and GFSH is behaving strangely as well.

GFSH:
gfsh>list members
  Name   | Id
-------- | ----------------------------------------------------------------
locator1 | 192.168.31.62(locator1:2719:locator)<ec><v0>:41000 [Coordinator]
locator2 | 192.168.31.62(locator2:2854:locator)<ec><v1>:41001
S2       | 192.168.31.62(S2:5021)<v10>:41002
S1       | 192.168.31.62(S1:5107)<v11>:41003

gfsh>status server --name=S1
No Geode Cache Server with member name or ID S1 could be found.


gfsh>status server --name=S2
No Geode Cache Server with member name or ID S2 could be found.
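
The mismatch above (members are listed, but `status server` finds nothing) can be checked in a loop instead of by hand. This is only a rough sketch: it assumes gfsh is on the PATH and that the table layout matches the output shown here. The function names `server_names` and `check_servers` are made up for illustration and are not part of Geode.

```shell
#!/usr/bin/env bash
# Print the Name column of every non-locator row in `list members` output.
# Reads the gfsh table from stdin.
server_names() {
  awk -F'|' '
    NF == 2 && $2 !~ /locator\)/ {        # data rows have exactly one "|"
      name = $1
      gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim padding around the name
      if (name != "Name" && name !~ /^-+$/) print name
    }'
}

# Hypothetical helper: ask gfsh for the status of each listed server.
# $1 is the locator address in host[port] form.
check_servers() {
  local locator="$1"
  gfsh -e "connect --locator=${locator}" -e "list members" | server_names |
    while read -r name; do
      gfsh -e "connect --locator=${locator}" -e "status server --name=${name}"
    done
}
```

With the cluster in this thread it would be called as `check_servers "192.168.31.62[10334]"`, assuming locator1 is still listening on port 10334.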


XHR logs: no regions, and only the locators appear as members, even though
the servers have joined.

[image: image.png]

[image: image.png]
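
To compare a payload captured when all members are visible with one captured when a member is missing, the two responses can be saved from the browser and diffed. The sketch below is deliberately crude: it splits the JSON at commas, braces, and brackets rather than parsing it, so a string value that contains a comma will be split too. The function name `payload_diff` and the file names in the usage note are placeholders, not anything Pulse provides.

```shell
#!/usr/bin/env bash
# Crude diff of two saved pulseUpdate responses.
# Splits each file at commas, braces, and brackets so that every key/value
# pair lands on its own line, sorts the lines, and prints those that differ.
# Returns the exit status of diff (0 = identical, 1 = different).
payload_diff() {
  local a b rc
  a=$(mktemp) && b=$(mktemp) || return 2
  tr ',{}[]' '\n\n\n\n\n' < "$1" | sed '/^[[:space:]]*$/d' | sort > "$a"
  tr ',{}[]' '\n\n\n\n\n' < "$2" | sed '/^[[:space:]]*$/d' | sort > "$b"
  diff "$a" "$b"
  rc=$?
  rm -f "$a" "$b"
  return "$rc"
}
```

Usage would be something like `payload_diff all-members.json missing-member.json`, where both files are responses copied out of the XHR tab.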

I also see the logs below on startup, in case they help.
[debug 2019/02/21 22:04:33.839 IST <main> tid=0x1] Notification Region
created with Name : _notificationRegion_192.168.31.62<v10>41002

[debug 2019/02/21 22:04:34.113 IST <unicast receiver,dharam-thakkar-34169>
tid=0x1b] sending via JGroups: [HeartbeatMessage [requestId=1]] recipients:
[192.168.31.62(locator2:2854:locator)<ec><v1>:41001]

[warn 2019/02/21 22:04:34.590 IST <main> tid=0x1] (tid=1 msgId=0) *Could
not load Command from*: class
org.apache.geode.management.internal.cli.commands.DestroyIndexCommand due
to org.apache.geode.management.internal.cli.commands.DestroyIndexCommand
cannot be cast to org.springframework.shell.core.CommandMarker

[warn 2019/02/21 22:04:34.597 IST <main> tid=0x1] (tid=1 msgId=1) *Could
not load Command from*: class
org.apache.geode.management.internal.cli.commands.BackupDiskStoreCommand
due to
org.apache.geode.management.internal.cli.commands.BackupDiskStoreCommand
cannot be cast to org.springframework.shell.core.CommandMarker

[warn 2019/02/21 22:04:34.600 IST <main> tid=0x1] (tid=1 msgId=2) *Could
not load Command from:* class
org.apache.geode.management.internal.cli.commands.PDXRenameCommand due to
org.apache.geode.management.internal.cli.commands.PDXRenameCommand cannot
be cast to org.springframework.shell.core.CommandMarker

[debug 2019/02/21 22:04:34.632 IST <main> tid=0x1] *Closing Management
Service*

Thanks,
Dharam


On Wed, Feb 20, 2019 at 10:53 AM Jens Deppe <jensdeppe@apache.org> wrote:

> Hi Dharam,
>
> I've tried to replicate this, but have not been successful - I've tried
> restarting my Spring Boot app server at least 20 times, but it always shows
> up in Pulse.
>
> What would be useful is to try and look at the data that Pulse is
> retrieving in order to update its display. If you're using Chrome, can you
> open the developer console and select the 'Network' tab. From there, select
> the 'XHR' filter tab - that should show you a 'pulseUpdate'
> request/response every 5 seconds. I'd be interested to see the data (it's a
> JSON payload) that comes back when you have all the members in the view and
> then the data that comes back when you are missing a member.
>
> Thanks
> --Jens
>
> On Tue, Feb 19, 2019 at 2:47 PM Bruce Schuchardt <bschuchardt@pivotal.io>
> wrote:
>
>> I can't comment on most of the content of your server1.log.  The
>> java.net.SocketException doesn't seem to be causing any problems but an
>> internet search indicated that setting
>>
>> -Djava.net.preferIPv4Stack=true
>>
>> might fix that problem for the machine you're using for testing.  This
>> exception is caught and logged but shouldn't cause any other problems.
>> Indeed, I can see from the debug-level logging that UDP messaging was
>> working okay in your run.
>>
>>
>> On 2/16/19 6:42 AM, Dharam Thacker wrote:
>>
>> Hi Team,
>>
>> I am sure about this issue now; it is critical and worth looking into. I
>> would really appreciate it being addressed in an upcoming release, as it is
>> a BLOCKER for monitoring systems.
>>
>> I hope the details below help with your analysis. Please let me know if I
>> can provide any more details.
>>
>> A few quick glimpses
>> On startup>>
>> [debug 2019/02/16 19:41:45.642 IST <main> tid=0x1] Creating  Management
>> Region :
>>
>> [debug 2019/02/16 19:41:45.680 IST <main> tid=0x1] Management Service is
>> not initialised hence returning from handleLockServiceCreation
>>
>> [warn 2019/02/16 19:41:46.500 IST <main> tid=0x1] Could not initialize
>> class org.apache.logging.log4j.util.PropertiesUtil
>> java.lang.NoClassDefFoundError: Could not initialize class
>> org.apache.logging.log4j.util.PropertiesUtil
>>
>> ...
>>
>> *System Specification : *
>> DISTRIB_ID=LinuxMint
>> DISTRIB_RELEASE=18.3
>> DISTRIB_CODENAME=sylvia
>> DISTRIB_DESCRIPTION="Linux Mint 18.3 Sylvia"
>>
>> *Java : *
>> openjdk version "1.8.0_191"
>> OpenJDK Runtime Environment (build
>> 1.8.0_191-8u191-b12-2ubuntu0.16.04.1-b12)
>> OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
>>
>> *GEODE*: 1.8.0 *Spring-Data-Geode* : 2.1.4.RELEASE (Geode version
>> overridden from 1.6.0 to 1.8.0)
>>
>> John,
>> The server is built entirely on spring-data-geode, and server1.log shows
>> several issues related to it that are worth looking at as well.
>>
>> The link below contains the following artifacts for detailed analysis and
>> for reproducing the issue:
>>
>> *Attachments:*
>> https://drive.google.com/open?id=18AuPx05Aw-ezwNOKqdCfUJUwUycOqzTp
>>
>> 1. I have attached the logs and properties files for both locators (locator1, locator2)
>> *Commands:*
>> start locator --name=locator1 --port=10334
>> --properties-file=/home/apps/work/geode/locator1/locator.properties
>> --dir=/home/apps/work/geode/locator1/work
>>
>> start locator --name=locator2 --port=10335
>> --properties-file=/home/apps/work/geode/locator2/locator.properties
>> --dir=/home/apps/work/geode/locator2/work
>>
>> 2. I have attached server1.log at debug level and demo.tar to reproduce
>> the same issue
>> *Command* : java -jar demo-0.0.1-SNAPSHOT.jar --demo.name=S1
>> --demo.port=40441 > server1.log &
>>
>> 3. Below is the Pulse view, which clearly shows that no further JMX
>> notifications for region initialisation or cache server startup were recorded
>>
>> [image: image.png]
>>
>> Thanks,
>> Dharam
>>
>>
>> I have
>> - Dharam Thacker
>>
>>
>> On Tue, Feb 5, 2019 at 6:28 PM Thacker, Dharam <
>> dharam.thacker@jpmorgan.com> wrote:
>>
>>> Hi Team,
>>>
>>>
>>>
>>> I have usually seen the following sequence when a new member joins the cluster
>>> (member = cache-server)
>>>
>>>
>>>
>>> *JMX Notifications on pulse screen :*
>>>
>>>
>>>
>>> 1.       Member Joined <<SERVER_NAME>>
>>>
>>>
>>>
>>> 2.       Region Created With Name /<<REGION_NAME>>
>>>
>>>
>>>
>>> 3.       Cache Server is Started in the VM
>>>
>>>
>>>
>>> I am using GEODE 1.8.0 + Spring Data Geode 2.1.4.RELEASE with the following
>>> properties and Pulse in embedded mode.
>>>
>>>
>>>
>>> *locator1.properties*
>>>
>>> locators=dharam-thakkar[10440],dharam-thakkar[10440]
>>>
>>> mcast-port=0
>>>
>>> jmx-manager=true
>>>
>>> jmx-manager-start=true
>>>
>>> jmx-manager-port=1091
>>>
>>> jmx-manager-ssl-enabled=false
>>>
>>> jmx-manager-bind-address=dharam-thakkar
>>>
>>> enable-network-partition-detection=false
>>>
>>> http-service-port=9701
>>>
>>> http-service-bind-address=dharam-thakkar
>>>
>>> log-file=/local/var/tmp/demo-locator1/locator1.log
>>>
>>> log-file-size-limit=10
>>>
>>> log-level=config
>>>
>>> log-disk-space-limit=50
>>>
>>>
>>>
>>> I tried the sequence below and I see that PULSE misses “JMX
>>> Notifications” and gives an incorrect view of the cluster.
>>>
>>>
>>>
>>> *Steps to reproduce>>*
>>>
>>>
>>>
>>> 1.       gfsh start locator --name=demo-locator-1 --port=10440
>>> --properties-file=locator1.properties --dir=/var/tmp/demo-locator1/work
>>>
>>>
>>>
>>> 2.       java -jar demo-spring-boot-geode-server.jar
>>> -DserverName=demo-server1 -DserverPort=40440
>>>
>>>
>>>
>>> 3.       java -jar demo-spring-boot-geode-server.jar
>>> -DserverName=demo-server2 -DserverPort=40441
>>>
>>>
>>>
>>> 4.       Everything will look fine at this point and you will see all
>>> notifications as described in the sequence above
>>>
>>>
>>>
>>> 5.       PID=`ps auxwww | fgrep 'java' | fgrep 'demo-server1' | awk
>>> '{print $2}'` ; kill -INT $PID
>>>
>>>
>>>
>>> 6.       You should see a *“Member Departed <<SERVER_NAME>>”* message on
>>> pulse
>>>
>>>
>>>
>>> 7.       Reboot the member -- java -jar
>>> demo-spring-boot-geode-server.jar -DserverName=demo-server1
>>> -DserverPort=40440
>>>
>>>
>>>
>>> 8.       Observe pulse notifications and member count
>>>
>>>
>>>
>>> 9.       You will only see a *“Member Joined <<SERVER_NAME>>”* message
>>> on pulse and no update in member count
>>>
>>>
>>>
>>> 10.   If you don’t see the situation described in step 9, repeat steps 5 to 7
>>> a few times and you will end up in this situation
>>>
>>>
>>>
>>> *Note:* GFSH shows everything correctly, but PULSE has
>>> issues.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Dharam
>>>
>>> This message is confidential and subject to terms at:
>>> https://www.jpmorgan.com/emaildisclaimer
>>> including on confidentiality, legal privilege, viruses and monitoring of
>>> electronic messages. If you are not the intended recipient, please delete
>>> this message and notify the sender immediately. Any unauthorized use is
>>> strictly prohibited.
>>>
>>
