ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ivan Rakov <ivan.glu...@gmail.com>
Subject Re: Extended logging for rebalance performance analysis
Date Wed, 06 May 2020 16:50:36 GMT
Hi,

IGNITE_WRITE_REBALANCE_PARTITION_DISTRIBUTION_THRESHOLD - threshold
> duration rebalance of cache group after which partitions distribution is
> output, set in milliseconds, default value is 10 minutes.

 Does it mean that if the rebalancing process took less than 10 minutes,
only a short version of the message (with supplier statistics) will show up?

In general, I have no objections.


On Mon, May 4, 2020 at 10:38 AM ткаленко кирилл <tkalkirill@yandex.ru>
wrote:

> Hi, Igniters!
>
> I'd like to share a new small feature in AI [1].
>
> Current rebalance logging does not allow you to quickly answer following
> questions:
> 1)How long was the balance(divided by supplier)?
> 2)How many records and bytes per supplier were rebalanced?
> 3)How many times did rebalance restart?
> 4)Which partitions were rebalanced and from which nodes did they receive
> them?
> 5)When did rebalance for all cache groups end?
>
> What you can see in logs now:
>
> 1)Starting rebalance with order of cache groups.
> Rebalancing scheduled [order=[ignite-sys-cache, grp1, grp0],
> top=AffinityTopologyVersion [topVer=2, minorTopVer=0], force=false,
> evt=NODE_JOINED, node=c2146a04-dc23-4bc9-870d-dfbb55c00001]
>
> 2)Start rebalance of cache group from a specific supplier, specifying
> partition ids and mode - historical or full.
> Starting rebalance routine [ignite-sys-cache,
> topVer=AffinityTopologyVersion [topVer=2, minorTopVer=0],
> supplier=8c525892-703b-4fc4-b28b-b2f139700000, fullPartitions=[0-99],
> histPartitions=[]]
>
> 3)Getting partial or complete partitions of cache group.
> Completed rebalancing [grp=ignite-sys-cache,
> supplier=8c525892-703b-4fc4-b28b-b2f139700000,
> topVer=AffinityTopologyVersion [topVer=5, minorTopVer=0], progress=1/2]
> Completed (final) rebalancing [grp=ignite-sys-cache,
> supplier=c2146a04-dc23-4bc9-870d-dfbb55c00001,
> topVer=AffinityTopologyVersion [topVer=5, minorTopVer=0], progress=2/2]
>
> 4)End rebalance of cache group.
> Completed rebalance future: RebalanceFuture [grp=CacheGroupContext
> [grp=ignite-sys-cache], topVer=AffinityTopologyVersion [topVer=2,
> minorTopVer=0], rebalanceId=1, routines=1, receivedBytes=1200,
> receivedKeys=0, partitionsLeft=0, startTime=1588519707607, endTime=-1,
> lastCancelledTime=-1]
>
> Rebalance statistics:
>
> To speed up rebalance analysis, statistics will be output for each cache
> group and total for all cache groups.
> If duration rebalance for cache group is greater than threshold value,
> partition distribution is output.
> Statistics will you to analyze duration of the balance for each supplier
> to understand which of them has been transmitting data for longest time.
>
> System properties are used to output statistics:
>
> IGNITE_QUIET - to output statistics, value must be false;
> IGNITE_WRITE_REBALANCE_PARTITION_DISTRIBUTION_THRESHOLD - threshold
> duration rebalance of cache group after which partitions distribution is
> output, set in milliseconds, default value is 10 minutes.
>
> Statistics examples:
>
> Successful full and historical rebalance of group cache, without
> partitions distribution.
> Rebalance information per cache group (successful rebalance): [id=3181548,
> name=grp1, startTime=2020-04-13 10:55:16,117, finishTime=2020-04-13
> 10:55:16,127, d=10 ms, restarted=0] Supplier statistics: [nodeId=0, p=5,
> d=10 ms] [nodeId=1, p=5, d=10 ms] Aliases: p - partitions, e - entries, b -
> bytes, d - duration, h - historical, nodeId mapping
> (nodeId=id,consistentId) [0=rebalancing.RebalanceStatisticsTest1]
> [1=rebalancing.RebalanceStatisticsTest0]
> Rebalance information per cache group (successful rebalance): [id=3181547,
> name=grp0, startTime=2020-04-13 15:01:44,000, finishTime=2020-04-13
> 15:01:44,116, d=116 ms, restarted=0] Supplier statistics: [nodeId=0, hp=10,
> he=300, hb=30267, d=116 ms] Aliases: p - partitions, e - entries, b -
> bytes, d - duration, h - historical, nodeId mapping
> (nodeId=id,consistentId) [0=rebalancing.RebalanceStatisticsTest0]
>
> Successful full and historical rebalance of group cache, with partitions
> distribution.
> Rebalance information per cache group (successful rebalance): [id=3181548,
> name=grp1, startTime=2020-04-13 10:55:16,117, finishTime=2020-04-13
> 10:55:16,127, d=10 ms, restarted=0] Supplier statistics: [nodeId=0, p=5,
> d=10 ms] [nodeId=1, p=5, d=10 ms] Aliases: p - partitions, e - entries, b -
> bytes, d - duration, h - historical, nodeId mapping
> (nodeId=id,consistentId) [0=rebalancing.RebalanceStatisticsTest1]
> [1=rebalancing.RebalanceStatisticsTest0] Rebalance duration was greater
> than 5 ms, printing detailed information about partitions distribution
> (threshold can be changed by setting number of milliseconds into
> IGNITE_WRITE_REBALANCE_PARTITION_DISTRIBUTION_THRESHOLD) 0 =
> [0,bu,su],[1,bu],[2,pr,su] 1 = [0,bu,su],[1,bu],[2,pr,su] 2 =
> [0,bu,su],[1,bu],[2,pr,su] 3 = [0,bu,su],[1,bu],[2,pr,su] 4 =
> [0,bu,su],[1,bu],[2,pr,su] 5 = [0,bu,su],[1,bu],[2,pr,su] 6 =
> [0,bu,su],[1,bu],[2,pr,su] 7 = [0,bu,su],[1,bu],[2,pr,su] 8 =
> [0,bu,su],[1,bu],[2,pr,su] 9 = [0,bu,su],[1,bu],[2,pr,su] Aliases: pr -
> primary, bu - backup, su - supplier node, h - historical, nodeId mapping
> (nodeId=id,consistentId) [0=rebalancing.RebalanceStatisticsTest1]
> [1=rebalancing.RebalanceStatisticsTest2]
> [2=rebalancing.RebalanceStatisticsTest0]
> Rebalance information per cache group (successful rebalance): [id=3181547,
> name=grp0, startTime=2020-04-13 15:01:44,000, finishTime=2020-04-13
> 15:01:44,116, d=116 ms, restarted=0] Supplier statistics: [nodeId=0, hp=10,
> he=300, hb=30267, d=116 ms] Aliases: p - partitions, e - entries, b -
> bytes, d - duration, h - historical, nodeId mapping
> (nodeId=id,consistentId) [0=rebalancing.RebalanceStatisticsTest0] Rebalance
> duration was greater than 100 ms, printing detailed information about
> partitions distribution (threshold can be changed by setting number of
> milliseconds into IGNITE_WRITE_REBALANCE_PARTITION_DISTRIBUTION_THRESHOLD)
> 0 = [0,pr,su],[1,bu],[2,bu] h 1 = [0,bu,su],[1,bu],[2,pr] h 2 =
> [0,pr,su],[1,bu],[2,bu] h 3 = [0,bu,su],[1,bu],[2,pr] h 4 =
> [0,pr,su],[1,bu],[2,bu] h 5 = [0,pr,su],[1,bu],[2,bu] h 6 =
> [0,bu,su],[1,bu],[2,pr] h 7 = [0,pr,su],[1,bu],[2,bu] h 8 =
> [0,pr,su],[1,bu],[2,bu] h 9 = [0,pr,su],[1,bu],[2,bu] h Aliases: pr -
> primary, bu - backup, su - supplier node, h - historical, nodeId mapping
> (nodeId=id,consistentId) [0=rebalancing.RebalanceStatisticsTest0]
> [1=rebalancing.RebalanceStatisticsTest2]
> [2=rebalancing.RebalanceStatisticsTest1]
>
> Interrupted rebalance of group cache.
> Rebalance information per cache group (interrupted rebalance):
> [id=644280849, name=default2, startTime=2020-04-13 14:55:24,969,
> finishTime=2020-04-13 14:55:24,969, d=0 ms, restarted=0]
>
> Total full and historical rebalance for all cache groups.
> Rebalance total information (including successful and not rebalances):
> [startTime=2020-04-13 10:55:18,726, finishTime=2020-04-13 10:55:18,780,
> d=54 ms] Supplier statistics: [nodeId=0, p=60, e=250, b=25000, d=54 ms]
> [nodeId=1, p=60, e=250, b=24945, d=54 ms] Aliases: p - partitions, e -
> entries, b - bytes, d - duration, h - historical, nodeId mapping
> (nodeId=id,consistentId) [0=rebalancing.RebalanceStatisticsTest1]
> [1=rebalancing.RebalanceStatisticsTest0]
> Rebalance total information (including successful and not rebalances):
> [startTime=2020-04-13 15:01:43,822, finishTime=2020-04-13 15:01:44,116,
> d=294 ms] Supplier statistics: [nodeId=0, hp=20, he=500, hb=50445, d=294
> ms] Aliases: p - partitions, e - entries, b - bytes, d - duration, h -
> historical, nodeId mapping (nodeId=id,consistentId)
> [0=rebalancing.RebalanceStatisticsTest0]
>
> [1] - https://issues.apache.org/jira/browse/IGNITE-12080

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message