flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-10252) Handle oversized metric messages
Date Fri, 02 Nov 2018 08:22:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-10252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672749#comment-16672749 ]

ASF GitHub Bot commented on FLINK-10252:
----------------------------------------

zentol commented on a change in pull request #6850: [FLINK-10252] Handle oversized metric messages
URL: https://github.com/apache/flink/pull/6850#discussion_r230297487
 
 

 ##########
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/metrics/dump/MetricDumpSerialization.java
 ##########
 @@ -124,55 +160,135 @@ public MetricSerializationResult serialize(
 			Map<Counter, Tuple2<QueryScopeInfo, String>> counters,
 			Map<Gauge<?>, Tuple2<QueryScopeInfo, String>> gauges,
 			Map<Histogram, Tuple2<QueryScopeInfo, String>> histograms,
-			Map<Meter, Tuple2<QueryScopeInfo, String>> meters) {
+			Map<Meter, Tuple2<QueryScopeInfo, String>> meters,
+			long maximumFramesize) {
+
+			boolean markUnserializedMetrics = false;
 
-			buffer.clear();
 +			Map<Counter, Tuple2<QueryScopeInfo, String>> unserializedCounters = new HashMap<>();
 +			Map<Gauge<?>, Tuple2<QueryScopeInfo, String>> unserializedGauges = new HashMap<>();
 +			Map<Histogram, Tuple2<QueryScopeInfo, String>> unserializedHistograms = new HashMap<>();
 +			Map<Meter, Tuple2<QueryScopeInfo, String>> unserializedMeters = new HashMap<>();
 
+			countersBuffer.clear();
 			int numCounters = 0;
 			for (Map.Entry<Counter, Tuple2<QueryScopeInfo, String>> entry : counters.entrySet()) {
+				if (markUnserializedMetrics) {
+					unserializedCounters.put(entry.getKey(), entry.getValue());
+					continue;
+				}
+
 				try {
-					serializeCounter(buffer, entry.getValue().f0, entry.getValue().f1, entry.getKey());
+					serializeCounter(countersBuffer, entry.getValue().f0, entry.getValue().f1, entry.getKey());
 					numCounters++;
+					if (countersBuffer.length() > maximumFramesize) {
+						LOG.warn("The serialized counter metric is larger than the maximum frame size, " +
+							" so maybe not all metrics would be reported.");
+						markUnserializedMetrics = true;
+						//clear all, because we can not revoke the latest metrics which caused overflow
+						unserializedCounters.put(entry.getKey(), entry.getValue());
+						countersBuffer.clear();
+						numCounters = 0;
+					}
 				} catch (Exception e) {
 					LOG.debug("Failed to serialize counter.", e);
 				}
 			}
 
+			gaugesBuffer.clear();
 			int numGauges = 0;
 			for (Map.Entry<Gauge<?>, Tuple2<QueryScopeInfo, String>> entry : gauges.entrySet()) {
+				if (markUnserializedMetrics) {
+					unserializedGauges.put(entry.getKey(), entry.getValue());
+					continue;
+				}
+
 				try {
-					serializeGauge(buffer, entry.getValue().f0, entry.getValue().f1, entry.getKey());
+					serializeGauge(gaugesBuffer, entry.getValue().f0, entry.getValue().f1, entry.getKey());
 					numGauges++;
+					if (gaugesBuffer.length() + countersBuffer.length() > maximumFramesize) {
+						LOG.warn("The serialized gauge metric is larger than the maximum frame size, " +
+							" so maybe not all metrics would be reported.");
+						markUnserializedMetrics = true;
+						unserializedGauges.put(entry.getKey(), entry.getValue());
+						gaugesBuffer.clear();
+						numGauges = 0;
+					}
 				} catch (Exception e) {
 					LOG.debug("Failed to serialize gauge.", e);
 				}
 			}
 
+			histogramsBuffer.clear();
 			int numHistograms = 0;
 			for (Map.Entry<Histogram, Tuple2<QueryScopeInfo, String>> entry : histograms.entrySet()) {
+				if (markUnserializedMetrics) {
+					unserializedHistograms.put(entry.getKey(), entry.getValue());
+					continue;
+				}
+
 				try {
-					serializeHistogram(buffer, entry.getValue().f0, entry.getValue().f1, entry.getKey());
+					serializeHistogram(histogramsBuffer, entry.getValue().f0, entry.getValue().f1, entry.getKey());
 					numHistograms++;
 +					if (histogramsBuffer.length() + gaugesBuffer.length() + countersBuffer.length() > maximumFramesize) {
 +						LOG.warn("The serialized histogram metric is larger than the maximum frame size, " +
 +							" so maybe not all metrics would be reported.");
+						markUnserializedMetrics = true;
+						unserializedHistograms.put(entry.getKey(), entry.getValue());
+						histogramsBuffer.clear();
+						numHistograms = 0;
+					}
 				} catch (Exception e) {
 					LOG.debug("Failed to serialize histogram.", e);
 				}
 			}
 
+			metersBuffer.clear();
 			int numMeters = 0;
 			for (Map.Entry<Meter, Tuple2<QueryScopeInfo, String>> entry : meters.entrySet()) {
+				if (markUnserializedMetrics) {
+					unserializedMeters.put(entry.getKey(), entry.getValue());
+					continue;
+				}
+
 				try {
-					serializeMeter(buffer, entry.getValue().f0, entry.getValue().f1, entry.getKey());
+					serializeMeter(metersBuffer, entry.getValue().f0, entry.getValue().f1, entry.getKey());
 					numMeters++;
+					if (metersBuffer.length() + histogramsBuffer.length() + gaugesBuffer.length() +
 
 Review comment:
   Let's say that our max frameSize is 100 bytes. Let's also say that counters, meters and gauges take up 1 byte each, and histograms take up 98.
   
   In the order that you're serializing metrics (counters -> gauges -> histograms -> meters), counters, gauges and histograms will be serialized and fit (1 + 1 + 98 <= 100). Then the meters come along with their additional 1 byte and are now dropped from the report.
   
   Given that the only known case of this problem arising is a ridiculous amount of latency histograms, they should be excluded first, instead of meters. As it stands, histograms are the least important yet most expensive metrics, so let's drop those first instead of rather essential metrics like `numRecordsInPerSecond`.
   
   Now, this does _not_ mean that you should now change the order again (like you had in a previous version), but that the MQS should first try to drop histograms to fit the report into the dump.
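
A minimal sketch of the budgeting rule suggested above, kept deliberately separate from the PR's diff: the class and method names below are invented for illustration and are not part of MetricDumpSerialization.

    // Sketch only: standalone illustration of "drop histograms first";
    // all names here are assumptions, not the actual Flink code.
    final class FrameBudgetSketch {

    	/**
    	 * Returns true if the histogram section should be dropped so that the
    	 * remaining sections (counters, gauges, meters) fit into one Akka frame.
    	 */
    	static boolean shouldDropHistograms(
    			long countersBytes,
    			long gaugesBytes,
    			long histogramsBytes,
    			long metersBytes,
    			long maximumFramesize) {
    		long total = countersBytes + gaugesBytes + histogramsBytes + metersBytes;
    		long withoutHistograms = total - histogramsBytes;
    		// Only sacrifice histograms if the dump overflows and dropping them
    		// actually brings the rest of the report back under the limit.
    		return total > maximumFramesize && withoutHistograms <= maximumFramesize;
    	}

    	public static void main(String[] args) {
    		// The example from the comment: counters, gauges and meters take 1 byte
    		// each, histograms take 98, and the frame limit is 100. Dropping the
    		// histograms (not the meters) is what lets everything else fit.
    		System.out.println(shouldDropHistograms(1, 1, 98, 1, 100)); // true
    	}
    }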
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Handle oversized metric messages
> -------------------------------
>
>                 Key: FLINK-10252
>                 URL: https://issues.apache.org/jira/browse/FLINK-10252
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Metrics
>    Affects Versions: 1.5.3, 1.6.0, 1.7.0
>            Reporter: Till Rohrmann
>            Assignee: vinoyang
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.5.6, 1.6.3, 1.7.0
>
>
> Since the {{MetricQueryService}} is implemented as an Akka actor, it can only send messages
> of a smaller size than the current {{akka.framesize}}. We should check similarly to FLINK-10251
> whether the payload exceeds the maximum framesize and fail fast if it does.
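
A minimal sketch of the fail-fast check the description calls for, assuming the serialized dump size and the configured akka.framesize are both available as plain byte counts; the class and method names are illustrative, not actual Flink API.

    // Sketch only; not the actual MetricQueryService code.
    final class FramesizeCheckSketch {

    	/** Fail fast when the serialized metric dump would not fit into one Akka frame. */
    	static void checkFrameSize(long serializedSizeInBytes, long maximumFramesize) {
    		if (serializedSizeInBytes > maximumFramesize) {
    			throw new IllegalStateException(
    				"Metric dump of " + serializedSizeInBytes + " bytes exceeds the configured "
    					+ "akka.framesize of " + maximumFramesize + " bytes.");
    		}
    	}
    }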



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
