geode-user mailing list archives

From Dharam Thacker <dharamthacke...@gmail.com>
Subject Re: Pulse is giving stale view of cluster -- lost updates
Date Thu, 21 Feb 2019 16:47:13 GMT
One more thing about GFSH.

gfsh>list regions
List of regions
---------------
GroupDefinition
Sample1
Sample2

gfsh>query --query="select * from /Sample1"
Result  : false
Message : Cannot find regions <[/Sample1]> in any of the members

Thanks,
- Dharam Thacker
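
Further down this thread Jens asks for the 'pulseUpdate' JSON payload captured once with all members in view and once with a member missing. A minimal sketch for comparing two such saved payloads follows. It is a generic structural diff that assumes nothing about the real layout of the Pulse response; the function names and the "<missing>" marker are illustrative and are not part of Geode or Pulse.

```python
import json

# Marker used when a key or list index exists on one side only (illustrative).
MISSING = "<missing>"


def diff_json(before, after, path=""):
    """Yield (path, before_value, after_value) for each difference found."""
    if isinstance(before, dict) and isinstance(after, dict):
        # Walk the union of keys so additions and removals are both reported.
        for key in sorted(set(before) | set(after)):
            yield from diff_json(before.get(key, MISSING),
                                 after.get(key, MISSING),
                                 f"{path}/{key}")
    elif isinstance(before, list) and isinstance(after, list):
        # Lists are compared position by position; reordering shows up as changes.
        for i in range(max(len(before), len(after))):
            b = before[i] if i < len(before) else MISSING
            a = after[i] if i < len(after) else MISSING
            yield from diff_json(b, a, f"{path}[{i}]")
    elif before != after:
        yield (path or "/", before, after)


def load_and_diff(before_path, after_path):
    """Load two JSON files saved from the browser and return their differences."""
    with open(before_path, encoding="utf-8") as f:
        before = json.load(f)
    with open(after_path, encoding="utf-8") as f:
        after = json.load(f)
    return list(diff_json(before, after))
```

Saving the two responses from the browser's Network tab to files and calling load_and_diff on them lists only the paths that changed, which is easier to post to the list than two full payloads.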


On Thu, Feb 21, 2019 at 10:15 PM Dharam Thacker <dharamthacker88@gmail.com>
wrote:

> Hi Jens,
>
> I tried again as you suggested and traced the logs again. As you can see
> below, both servers have joined, but no regions were created, the server
> count did not change, and GFSH is behaving strangely as well.
>
> GFSH:
> gfsh>list members
>   Name   | Id
> -------- | ----------------------------------------------------------------
> locator1 | 192.168.31.62(locator1:2719:locator)<ec><v0>:41000 [Coordinator]
> locator2 | 192.168.31.62(locator2:2854:locator)<ec><v1>:41001
> S2       | 192.168.31.62(S2:5021)<v10>:41002
> S1       | 192.168.31.62(S1:5107)<v11>:41003
>
> gfsh>status server --name=S1
> No Geode Cache Server with member name or ID S1 could be found.
>
>
> gfsh>status server --name=S2
> No Geode Cache Server with member name or ID S2 could be found.
>
>
> XHR logs: no regions, and only the locators are listed as members even
> though the servers have joined.
>
> [image: image.png]
>
> [image: image.png]
>
> I also see the logs below on startup, in case they help.
> [debug 2019/02/21 22:04:33.839 IST <main> tid=0x1] Notification Region
> created with Name : _notificationRegion_192.168.31.62<v10>41002
>
> [debug 2019/02/21 22:04:34.113 IST <unicast receiver,dharam-thakkar-34169>
> tid=0x1b] sending via JGroups: [HeartbeatMessage [requestId=1]] recipients:
> [192.168.31.62(locator2:2854:locator)<ec><v1>:41001]
>
> [warn 2019/02/21 22:04:34.590 IST <main> tid=0x1] (tid=1 msgId=0) *Could
> not load Command from*: class
> org.apache.geode.management.internal.cli.commands.DestroyIndexCommand due
> to org.apache.geode.management.internal.cli.commands.DestroyIndexCommand
> cannot be cast to org.springframework.shell.core.CommandMarker
>
> [warn 2019/02/21 22:04:34.597 IST <main> tid=0x1] (tid=1 msgId=1) *Could
> not load Command from*: class
> org.apache.geode.management.internal.cli.commands.BackupDiskStoreCommand
> due to
> org.apache.geode.management.internal.cli.commands.BackupDiskStoreCommand
> cannot be cast to org.springframework.shell.core.CommandMarker
>
> [warn 2019/02/21 22:04:34.600 IST <main> tid=0x1] (tid=1 msgId=2) *Could
> not load Command from:* class
> org.apache.geode.management.internal.cli.commands.PDXRenameCommand due to
> org.apache.geode.management.internal.cli.commands.PDXRenameCommand cannot
> be cast to org.springframework.shell.core.CommandMarker
>
> [debug 2019/02/21 22:04:34.632 IST <main> tid=0x1] *Closing Management
> Service*
>
> Thanks,
> Dharam
>
>
> On Wed, Feb 20, 2019 at 10:53 AM Jens Deppe <jensdeppe@apache.org> wrote:
>
>> Hi Dharam,
>>
>> I've tried to replicate this, but have not been successful - I've tried
>> restarting my Spring Boot app server at least 20 times, but it always shows
>> up in Pulse.
>>
>> What would be useful is to look at the data that Pulse is retrieving in
>> order to update its display. If you're using Chrome, can you open the
>> developer console and select the 'Network' tab? From there, select the
>> 'XHR' filter tab - that should show you a 'pulseUpdate' request/response
>> every 5 seconds. I'd be interested to see the data (it's a JSON payload)
>> that comes back when you have all the members in the view and then the
>> data that comes back when you are missing a member.
>>
>> Thanks
>> --Jens
>>
>> On Tue, Feb 19, 2019 at 2:47 PM Bruce Schuchardt <bschuchardt@pivotal.io>
>> wrote:
>>
>>> I can't comment on most of the content of your server1.log.  The
>>> java.net.SocketException doesn't seem to be causing any problems but an
>>> internet search indicated that setting
>>>
>>> -Djava.net.preferIPv4Stack=true
>>>
>>> might fix that problem for the machine you're using for testing.  This
>>> exception is caught and logged but shouldn't cause any other problems.
>>> Indeed, I can see from the debug-level logging that UDP messaging was
>>> working okay in your run.
>>>
>>>
>>> On 2/16/19 6:42 AM, Dharam Thacker wrote:
>>>
>>> Hi Team,
>>>
>>> I am now sure about this issue; it is critical and worth looking at. I
>>> would really appreciate it being addressed in an upcoming release, as
>>> it's a BLOCKER for monitoring systems.
>>>
>>> I hope the details below help with your analysis. Please let me know if
>>> I can provide anything more.
>>>
>>> A few quick glimpses
>>> On startup:
>>> [debug 2019/02/16 19:41:45.642 IST <main> tid=0x1] Creating  Management
>>> Region :
>>>
>>> [debug 2019/02/16 19:41:45.680 IST <main> tid=0x1] Management Service is
>>> not initialised hence returning from handleLockServiceCreation
>>>
>>> [warn 2019/02/16 19:41:46.500 IST <main> tid=0x1] Could not initialize
>>> class org.apache.logging.log4j.util.PropertiesUtil
>>> java.lang.NoClassDefFoundError: Could not initialize class
>>> org.apache.logging.log4j.util.PropertiesUtil
>>>
>>> ...
>>>
>>> *System Specification : *
>>> DISTRIB_ID=LinuxMint
>>> DISTRIB_RELEASE=18.3
>>> DISTRIB_CODENAME=sylvia
>>> DISTRIB_DESCRIPTION="Linux Mint 18.3 Sylvia"
>>>
>>> *Java : *
>>> openjdk version "1.8.0_191"
>>> OpenJDK Runtime Environment (build
>>> 1.8.0_191-8u191-b12-2ubuntu0.16.04.1-b12)
>>> OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
>>>
>>> *GEODE*: 1.8.0 *Spring-Data-Geode* : 2.1.4.RELEASE (Geode version
>>> overridden from 1.6.0 to 1.8.0)
>>>
>>> John,
>>> The app relies entirely on spring-data-geode, so several related issues
>>> in server1.log are worth looking at as well.
>>>
>>> The link below contains the following artifacts for detailed analysis
>>> and for reproducing the issue:
>>>
>>> *Attachments:*
>>> https://drive.google.com/open?id=18AuPx05Aw-ezwNOKqdCfUJUwUycOqzTp
>>>
>>> 1. I have attached both locator (locator1,locator2) logs & properties
>>> file
>>> *Commands:*
>>> start locator --name=locator1 --port=10334
>>> --properties-file=/home/apps/work/geode/locator1/locator.properties
>>> --dir=/home/apps/work/geode/locator1/work
>>>
>>> start locator --name=locator2 --port=10335
>>> --properties-file=/home/apps/work/geode/locator2/locator.properties
>>> --dir=/home/apps/work/geode/locator2/work
>>>
>>> 2. I have attached server1.log at debug level & demo.tar to reproduce
>>> the same issue
>>> *Command* : java -jar demo-0.0.1-SNAPSHOT.jar --demo.name=S1
>>> --demo.port=40441 > server1.log &
>>>
>>> 3. Below is the Pulse view, which clearly shows that no further JMX
>>> notifications about region initialisation or the cache server were recorded
>>>
>>> [image: image.png]
>>>
>>> Thanks,
>>> Dharam
>>>
>>>
>>> - Dharam Thacker
>>>
>>>
>>> On Tue, Feb 5, 2019 at 6:28 PM Thacker, Dharam <
>>> dharam.thacker@jpmorgan.com> wrote:
>>>
>>>> Hi Team,
>>>>
>>>>
>>>>
>>>> I usually see the following sequence when a new member joins the
>>>> cluster (member = cache server)
>>>>
>>>>
>>>>
>>>> *JMX Notifications on pulse screen :*
>>>>
>>>>
>>>>
>>>> 1.       Member Joined <<SERVER_NAME>>
>>>>
>>>>
>>>>
>>>> 2.       Region Created With Name /<<REGION_NAME>>
>>>>
>>>>
>>>>
>>>> 3.       Cache Server is Started in the VM
>>>>
>>>>
>>>>
>>>> I am using GEODE 1.8.0 + Spring Data Geode 2.1.4.RELEASE with the
>>>> following properties and Pulse in embedded mode.
>>>>
>>>>
>>>>
>>>> *locator1.properties*
>>>>
>>>> locators=dharam-thakkar[10440],dharam-thakkar[10440]
>>>> mcast-port=0
>>>> jmx-manager=true
>>>> jmx-manager-start=true
>>>> jmx-manager-port=1091
>>>> jmx-manager-ssl-enabled=false
>>>> jmx-manager-bind-address=dharam-thakkar
>>>> enable-network-partition-detection=false
>>>> http-service-port=9701
>>>> http-service-bind-address=dharam-thakkar
>>>> log-file=/local/var/tmp/demo-locator1/locator1.log
>>>> log-file-size-limit=10
>>>> log-level=config
>>>> log-disk-space-limit=50
>>>>
>>>>
>>>>
>>>> I tried the sequence below and found that Pulse misses “JMX
>>>> Notifications” and gives an incorrect view of the cluster.
>>>>
>>>>
>>>>
>>>> *Steps to reproduce:*
>>>>
>>>>
>>>>
>>>> 1.       gfsh start locator --name=demo-locator-1 --port=10440
>>>> --properties-file=locator1.properties --dir=/var/tmp/demo-locator1/work
>>>>
>>>>
>>>>
>>>> 2.       java -jar demo-spring-boot-geode-server.jar
>>>> -DserverName=demo-server1 -DserverPort=40440
>>>>
>>>>
>>>>
>>>> 3.       java -jar demo-spring-boot-geode-server.jar
>>>> -DserverName=demo-server2 -DserverPort=40441
>>>>
>>>>
>>>>
>>>> 4.       At this point everything looks fine, and you will see all the
>>>> notifications in the sequence described above
>>>>
>>>>
>>>>
>>>> 5.       PID=`ps auxwww | fgrep 'java' | fgrep 'demo-server1' | awk
>>>> '{print $2}'` ; kill -INT $PID
>>>>
>>>>
>>>>
>>>> 6.       You should see *“Member Departed <<SERVER_NAME>>”* message on
>>>> pulse
>>>>
>>>>
>>>>
>>>> 7.       Reboot the member -- java -jar
>>>> demo-spring-boot-geode-server.jar -DserverName=demo-server1
>>>> -DserverPort=40440
>>>>
>>>>
>>>>
>>>> 8.       Observe pulse notifications and member count
>>>>
>>>>
>>>>
>>>> 9.       You will only see *“Member Joined <<SERVER_NAME>>”* message
>>>> on pulse and no update in member count
>>>>
>>>>
>>>>
>>>> 10.   If you don’t see the situation from step 9, repeat steps 5 to 7
>>>> a few times and you will end up in it
>>>>
>>>>
>>>>
>>>> *Note:* GFSH shows everything correctly; only Pulse has the issue.
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Dharam
>>>>
>>>>
>>>
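
Steps 5 to 7 of the original report (stop the server with SIGINT, then start it again, repeated a few times) can be scripted so the stale Pulse view is easier to provoke. The following is only a sketch under assumptions: the function name bounce, its parameters and the timing values are made up for illustration, it assumes a POSIX system, and it does not inspect Pulse itself, so the notifications and member count still have to be checked by hand after it returns.

```python
import signal
import subprocess
import time


def bounce(cmd, cycles=3, up_seconds=30.0, stop_timeout=20.0):
    """Start cmd, let it run, stop it with SIGINT, and repeat.

    Returns (exit_codes, final_process). The final process is started
    after the last stop and left running so the cluster view can be
    inspected while the member is up.
    """
    exit_codes = []
    for _ in range(cycles):
        proc = subprocess.Popen(cmd)
        time.sleep(up_seconds)            # give the member time to join
        proc.send_signal(signal.SIGINT)   # same signal as `kill -INT`
        try:
            proc.wait(timeout=stop_timeout)
        except subprocess.TimeoutExpired:
            # The process ignored SIGINT; force it down so the loop continues.
            proc.kill()
            proc.wait()
        exit_codes.append(proc.returncode)
    # Final restart (step 7): leave the member running for observation.
    return exit_codes, subprocess.Popen(cmd)
```

With the command from the report this could be called as bounce(["java", "-jar", "demo-spring-boot-geode-server.jar", "-DserverName=demo-server1", "-DserverPort=40440"]). The returned process handle should be terminated by the caller once the Pulse view has been checked.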
