spark-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
Date Tue, 11 Dec 2018 12:23:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16717001#comment-16717001 ]

ASF GitHub Bot commented on SPARK-26193:
----------------------------------------

xuanyuanking commented on a change in pull request #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQL
URL: https://github.com/apache/spark/pull/23207#discussion_r240587659
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/shuffle/ShuffleWriteProcessor.scala
 ##########
 @@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.shuffle
+
+import org.apache.spark.{Partition, ShuffleDependency, SparkEnv, TaskContext}
+import org.apache.spark.internal.Logging
+import org.apache.spark.rdd.RDD
+import org.apache.spark.scheduler.MapStatus
+
+/**
+ * The interface for customizing the shuffle write process. The driver creates a
+ * ShuffleWriteProcessor and puts it into [[ShuffleDependency]], and executors use it in each
+ * ShuffleMapTask.
+ */
+private[spark] class ShuffleWriteProcessor extends Serializable with Logging {
+
+  /**
+   * Create a [[ShuffleWriteMetricsReporter]] from the task context. Because the reporter is
+   * invoked per row, its performance needs careful consideration.
+   */
+  protected def createMetricsReporter(context: TaskContext): ShuffleWriteMetricsReporter = {
+    context.taskMetrics().shuffleWriteMetrics
+  }
+
+  /**
+   * The write process for a particular partition. It controls the life cycle of the
+   * [[ShuffleWriter]] obtained from [[ShuffleManager]], triggers the RDD computation, and
+   * finally returns the [[MapStatus]] for this task.
+   */
+  def writeProcess(
 
 Review comment:
   Copy, will change this to `write`.
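
   For readers following the thread, the customization point described in the doc comments above can be sketched roughly as follows. This is a minimal, Spark-free illustration, not the code in the pull request: `MetricsReporter`, `TaskLevelReporter`, `ForwardingReporter`, `WriteProcessor` and `SqlWriteProcessor` are hypothetical stand-ins, and the forwarding behaviour is an assumption about how a SQL-side reporter might keep the task-level metrics in sync.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical stand-in for a shuffle write metrics reporter (not Spark's real trait).
trait MetricsReporter {
  def incRecordsWritten(n: Long): Unit
}

// Stand-in for the task-level metrics that a default processor would return.
final class TaskLevelReporter extends MetricsReporter {
  var recordsWritten: Long = 0L
  override def incRecordsWritten(n: Long): Unit = recordsWritten += n
}

// Assumed SQL-side reporter: bumps its own counter, then forwards to the task-level one.
final class ForwardingReporter(underlying: MetricsReporter, sqlCounter: AtomicLong)
    extends MetricsReporter {
  override def incRecordsWritten(n: Long): Unit = {
    sqlCounter.addAndGet(n)
    underlying.incRecordsWritten(n)
  }
}

// Stand-in processor: serializable so it could be shipped from driver to executors.
class WriteProcessor extends Serializable {
  // Hook that subclasses override; the default reuses the task-level reporter.
  protected def createMetricsReporter(taskReporter: MetricsReporter): MetricsReporter =
    taskReporter

  // Called once per record, so the reporter must stay cheap.
  def write(records: Iterator[String], taskReporter: MetricsReporter): Int = {
    val reporter = createMetricsReporter(taskReporter)
    var count = 0
    records.foreach { _ =>
      reporter.incRecordsWritten(1L)
      count += 1
    }
    count
  }
}

// Hypothetical SQL-specific processor that swaps in the forwarding reporter.
final class SqlWriteProcessor(sqlCounter: AtomicLong) extends WriteProcessor {
  override protected def createMetricsReporter(taskReporter: MetricsReporter): MetricsReporter =
    new ForwardingReporter(taskReporter, sqlCounter)
}
```

   With this shape, a caller that needs extra metrics only overrides `createMetricsReporter`; the write loop itself is unchanged, and the per-record cost is one extra counter update.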



> Implement shuffle write metrics in SQL
> --------------------------------------
>
>                 Key: SPARK-26193
>                 URL: https://issues.apache.org/jira/browse/SPARK-26193
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Xiao Li
>            Assignee: Yuanjian Li
>            Priority: Major
>             Fix For: 3.0.0
>
>

