flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-10977) Add FlatAggregate operator to unbounded streaming Table API
Date Wed, 12 Dec 2018 11:39:02 GMT

    [ https://issues.apache.org/jira/browse/FLINK-10977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16718842#comment-16718842 ]

ASF GitHub Bot commented on FLINK-10977:
----------------------------------------

dianfu commented on a change in pull request #7209: [FLINK-10977][table] Add UnBounded FlatAggregate operator to streaming Table API
URL: https://github.com/apache/flink/pull/7209#discussion_r240955939
 
 

 ##########
 File path: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/codegen/TableAggregationCodeGenerator.scala
 ##########
 @@ -0,0 +1,287 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.codegen
+
+import org.apache.calcite.rex.RexLiteral
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.table.api.TableConfig
+import org.apache.flink.table.api.dataview._
+import org.apache.flink.table.codegen.Indenter.toISC
+import org.apache.flink.table.functions.UserDefinedAggregateFunction
+import org.apache.flink.table.runtime.aggregate.GeneratedTableAggregations
+import org.apache.flink.table.runtime.types.CRow
+import org.apache.flink.table.utils.RetractableCollector
+import org.apache.flink.types.Row
+import org.apache.flink.util.Collector
+import org.apache.flink.table.codegen.TableAggregationCodeGenerator._
+import org.apache.flink.table.plan.schema.RowSchema
+
+/**
+  * A base code generator for generating [[GeneratedTableAggregations]].
+  *
+  * @param config                 configuration that determines runtime behavior
+  * @param nullableInput          input(s) can be null.
+  * @param input                  type information about the input of the Function
+  * @param constants              constant expressions that act like a second input in the
+  *                               parameter indices.
+  * @param name                   Class name of the function.
+  *                               Does not need to be unique but has to be a valid Java class
+  *                               identifier.
+  * @param physicalInputTypes     Physical input row types
+  * @param tableAggOutputTypes    Output types of TableAggregateFunction
+  * @param outputSchema           The type of the rows emitted by TableAggregate operator
+  * @param aggregates             All aggregate functions
+  * @param aggFields              Indexes of the input fields for all aggregate functions
+  * @param aggMapping             The mapping of aggregates to output fields
+  * @param isDistinctAggs         The flag array indicating whether it is distinct aggregate.
+  * @param isStateBackedDataViews a flag to indicate if distinct filter uses state backend.
+  * @param partialResults         A flag defining whether final or partial results (accumulators)
+  *                               are set
+  *                               to the output row.
+  * @param fwdMapping             The mapping of input fields to output fields
+  * @param mergeMapping           An optional mapping to specify the accumulators to merge. If not
+  *                               set, we
+  *                               assume that both rows have the accumulators at the same position.
+  * @param outputArity            The number of fields in the output row.
+  * @param needRetract            a flag to indicate if the aggregate needs the retract method
+  * @param needEmitWithRetract    a flag to indicate if the aggregate needs to output retractions
+  *                               when update
+  * @param needMerge              a flag to indicate if the aggregate needs the merge method
+  * @param needReset              a flag to indicate if the aggregate needs the resetAccumulator
+  *                               method
+  * @param accConfig              Data view specification for accumulators
+  */
+class TableAggregationCodeGenerator(
+    config: TableConfig,
+    nullableInput: Boolean,
+    input: TypeInformation[_ <: Any],
+    constants: Option[Seq[RexLiteral]],
+    name: String,
+    physicalInputTypes: Seq[TypeInformation[_]],
+    tableAggOutputTypes: TypeInformation[_],
+    outputSchema: RowSchema,
+    aggregates: Array[UserDefinedAggregateFunction[_ <: Any, _ <: Any]],
+    aggFields: Array[Array[Int]],
+    aggMapping: Array[Int],
+    isDistinctAggs: Array[Boolean],
+    isStateBackedDataViews: Boolean,
+    partialResults: Boolean,
+    fwdMapping: Array[Int],
+    mergeMapping: Option[Array[Int]],
+    outputArity: Int,
+    needRetract: Boolean,
+    needEmitWithRetract: Boolean,
+    needMerge: Boolean,
+    needReset: Boolean,
+    accConfig: Option[Array[Seq[DataViewSpec[_]]]])
+  extends AggregationBaseCodeGenerator(
+    config,
+    nullableInput,
+    input,
+    constants,
+    name,
+    physicalInputTypes,
+    aggregates,
+    aggFields,
+    aggMapping,
+    isDistinctAggs,
+    isStateBackedDataViews,
+    partialResults,
+    fwdMapping,
+    mergeMapping,
+    outputArity,
+    needRetract,
+    needMerge,
+    needReset,
+    accConfig) {
+
+
+  def genEmit: String = {
+
+    val sig: String =
+      j"""
+         |  public final void emit(
+         |    $ROW accs,
+         |    $COLLECTOR<$CROW> collector) throws Exception """.stripMargin
+
+    val emitMethodName = if (needEmitWithRetract) "emitValueWithRetract" else "emitValue"
+    val emit: String = {
+      for (i <- aggs.indices) yield {
+        val emitAcc =
+          j"""
+             |      ${genAccDataViewFieldSetter(s"acc$i", i)}
+             |      ${aggs(i)}.$emitMethodName(acc$i
+             |        ${if (!parametersCode(i).isEmpty) "," else ""}
+             |        ${CONVERT_COLLECTOR_VARIABLE_TERM});
+               """.stripMargin
+        j"""
+           |    ${accTypes(i)} acc$i = (${accTypes(i)}) accs.getField($i);
+           |    $CONVERT_COLLECTOR_VARIABLE_TERM.$COLLECTOR_VARIABLE_TERM = collector;
+           |    $emitAcc
+               """.stripMargin
+      }
+    }.mkString("\n")
+
+    j"""$sig {
+       |$emit
+       |  }""".stripMargin
+  }
+
+  def genRecordToRow: String = {
+    // gen access expr
+
+    val functionGenerator = new FunctionCodeGenerator(
+      config,
+      false,
+      tableAggOutputTypes,
+      None,
+      None,
+      None)
+
+    functionGenerator.outRecordTerm = s"$CONVERTER_ROW_RESULT_TERM"
+    val inputAccessExprs = functionGenerator.generateFieldAccessExprs
+
+    // gen result expr
+    val conversion = functionGenerator.generateResultExpression(
+      inputAccessExprs,
+      outputSchema.typeInfo,
+      outputSchema.fieldNames)
+    conversion.code
+  }
+
+  /**
+    * Generates a [[org.apache.flink.table.runtime.aggregate.GeneratedAggregations]] that can be
+    * passed to a Java compiler.
+    *
+    * @param name                   Class name of the function.
+    *                               Does not need to be unique but has to be a valid Java class
+    *                               identifier.
+    * @param physicalInputTypes     Physical input row types
+    * @param aggregates             All aggregate functions
+    * @param aggFields              Indexes of the input fields for all aggregate functions
+    * @param aggMapping             The mapping of aggregates to output fields
+    * @param isDistinctAggs         The flag array indicating whether it is distinct aggregate.
+    * @param isStateBackedDataViews a flag to indicate if distinct filter uses state backend.
+    * @param partialResults         A flag defining whether final or partial results (accumulators)
+    *                               are set
+    *                               to the output row.
+    * @param fwdMapping             The mapping of input fields to output fields
+    * @param mergeMapping           An optional mapping to specify the accumulators to merge. If
+    *                               not set, we
+    *                               assume that both rows have the accumulators at the same
+    *                               position.
+    * @param outputArity            The number of fields in the output row.
+    * @param needRetract            a flag to indicate if the aggregate needs the retract method
+    * @param needMerge              a flag to indicate if the aggregate needs the merge method
+    * @param needReset              a flag to indicate if the aggregate needs the resetAccumulator
+    *                               method
+    * @param accConfig              Data view specification for accumulators
+    * @return A GeneratedAggregationsFunction
+    */
+  def generateTableAggregations: GeneratedTableAggregationsFunction = {
+
+    init()
+    val aggFuncCode = Seq(
+      genAccumulate,
+      genRetract,
+      genCreateAccumulators,
+      genMergeAccumulatorsPair,
+      genEmit).mkString("\n")
+
+    val generatedAggregationsClass = classOf[GeneratedTableAggregations].getCanonicalName
+    val aggOutputTypeName = tableAggOutputTypes.getTypeClass.getCanonicalName
+    val funcCode =
+      j"""
+         |public final class $funcName extends $generatedAggregationsClass {
+         |
+         |  private $CONVERT_COLLECTOR_CLASS_TERM $CONVERT_COLLECTOR_VARIABLE_TERM;
+         |  ${reuseMemberCode()}
+         |  $genMergeList
+         |  public $funcName() throws Exception {
+         |    ${reuseInitCode()}
+         |    $CONVERT_COLLECTOR_VARIABLE_TERM = new $CONVERT_COLLECTOR_CLASS_TERM();
+         |  }
+         |  ${reuseConstructorCode(funcName)}
+         |
+         |  public final void open(
+         |    org.apache.flink.api.common.functions.RuntimeContext $contextTerm) throws Exception {
+         |    ${reuseOpenCode()}
+         |  }
+         |
+         |  $aggFuncCode
+         |
+         |  public final void cleanup() throws Exception {
+         |    ${reuseCleanupCode()}
+         |  }
+         |
+         |  public final void close() throws Exception {
+         |    ${reuseCloseCode()}
+         |  }
+         |
+         |  public class $CONVERT_COLLECTOR_CLASS_TERM extends $RETRACTABLE_COLLECTOR {
+         |
+         |      public $COLLECTOR<$CROW> $COLLECTOR_VARIABLE_TERM;
+         |      public $CROW $CONVERTER_CROW_RESULT_TERM = new $CROW();
 
 Review comment:
   The row and crow fields can be private.
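
   A minimal sketch of what this suggestion could look like in the generated Java, not the code from the PR. `Row`, `CRow`, `Collector`, and `ConvertCollector` below are simplified, hypothetical stand-ins rather than Flink's classes, and the sketch assumes the two result fields are only touched inside the collector class. The collector field stays public because the generated `emit` method assigns it from outside.

```java
import java.util.ArrayList;
import java.util.List;

public class PrivateFieldsSketch {

    // Hypothetical stand-ins for Flink's Row, CRow and Collector; not the real classes.
    static final class Row {
        final Object[] fields;
        Row(int arity) { this.fields = new Object[arity]; }
    }

    static final class CRow {
        Row row;
        boolean change;
    }

    interface Collector<T> {
        void collect(T record);
    }

    // Simplified analogue of the generated convert collector class.
    static final class ConvertCollector {
        // Assigned by the enclosing emit() code, so it remains public.
        public Collector<CRow> out;

        // Assumed to be used only inside this class, so they can be private.
        private final CRow cRowResult = new CRow();
        private final Row rowResult = new Row(1);

        void collect(Object value) {
            // Wrap the emitted value in the reused row, then forward it as an insert.
            rowResult.fields[0] = value;
            cRowResult.row = rowResult;
            cRowResult.change = true;
            out.collect(cRowResult);
        }
    }

    public static void main(String[] args) {
        List<Object> seen = new ArrayList<>();
        ConvertCollector collector = new ConvertCollector();
        collector.out = record -> seen.add(record.row.fields[0]);
        collector.collect(42);
        System.out.println(seen);
    }
}
```

   Reusing one `Row` and one `CRow` instance per collector avoids an allocation per emitted record; keeping them private makes it clearer that no other generated code is meant to mutate them.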

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Add FlatAggregate operator to unbounded streaming Table API
> -----------------------------------------------------------
>
>                 Key: FLINK-10977
>                 URL: https://issues.apache.org/jira/browse/FLINK-10977
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table API & SQL
>            Reporter: sunjincheng
>            Assignee: Hequn Cheng
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.8.0
>
>
> Add FlatAggregate operator to streaming Table API as described in [Google doc|https://docs.google.com/document/d/1tnpxg31EQz2-MEzSotwFzqatsB4rNLz0I-l_vPa5H4Q/edit#heading=h.q23rny2iglsr].
> The usage:
> {code:java}
> val res = tab
> .groupBy('a) // leave out groupBy-clause to define global table aggregates
> .flatAgg(fun: TableAggregateFunction) // output has columns 'a, 'b, 'c
> .select('a, 'c){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
