flink-issues mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [flink] wuchong commented on a change in pull request #8276: [FLINK-12314] [docs-zh] Translate the "Type Serialization" page into …
Date Sun, 05 May 2019 09:34:06 GMT
wuchong commented on a change in pull request #8276: [FLINK-12314] [docs-zh] Translate the
"Type Serialization" page into …
URL: https://github.com/apache/flink/pull/8276#discussion_r281012311

 File path: docs/dev/types_serialization.zh.md
 @@ -24,180 +24,164 @@ specific language governing permissions and limitations
 under the License.
-Apache Flink handles data types and serialization in a unique way, containing its own type descriptors,
-generic type extraction, and type serialization framework. This document describes the concepts
and the rationale behind them.
+Apache Flink 以其独特的方式来处理数据类型以及序列化,这种方式包括它自身的类型描述符、泛型类型提取以及类型序列化框架。
 * This will be replaced by the TOC
-## Type handling in Flink
+## Flink 中的类型处理
-Flink tries to infer a lot of information about the data types that are exchanged and stored
during the distributed computation.
-Think about it like a database that infers the schema of tables. In most cases, Flink infers
all necessary information seamlessly
-by itself. Having the type information allows Flink to do some cool things:
+Flink 对分布式计算中发生的数据交换以及排序,试图推断有关数据类型的大量信息。
+可以把它想象成一个推断表结构的数据库。在大多数情况下,Flink 可以依赖自身透明的推断出所有需要的类型信息。
+掌握这些类型信息可以帮助 Flink 实现很多意想不到的特性:
-* Using POJOs types and grouping / joining / aggregating them by referring to field names
(like `dataSet.keyBy("username")`).
-  The type information allows Flink to check (for typos and type compatibility) early rather
than failing later at runtime.
+* 对于使用 POJOs 类型的数据,可以通过指定字段名(比如 `dataSet.keyBy("username")`
)进行 grouping 、joining、aggregating 操作。
+  类型信息可以帮助 Flink 在运行前做一些拼写错误以及类型兼容方面的检查,而不是等到运行时才暴露这些问题。
-* The more Flink knows about data types, the better the serialization and data layout schemes are.
-  That is quite important for the memory usage paradigm in Flink (work on serialized data
inside/outside the heap where ever possible
-  and make serialization very cheap).
+* Flink 对数据类型了解的越多,序列化和数据布局方案就越好。
+  这对 Flink 中的内存使用范式尤为重要(可以尽可能处理堆上或者堆外的序列化数据并且使序列化操作很廉价)。
-* Finally, it also spares users in the majority of cases from worrying about serialization
frameworks and having to register types.
+* 最后,它还使用户在大多数情况下免于担心序列化框架以及类型注册。
-In general, the information about data types is needed during the *pre-flight phase* - that
is, when the program's calls on `DataStream`
-and `DataSet` are made, and before any call to `execute()`, `print()`, `count()`, or `collect()`.
+通常在应用*运行之前的阶段 (pre-flight phase)*,需要数据的类型信息 -
也就是在程序对 `DataStream` 或者
+`DataSet` 的操作调用之后,在 `execute()`、`print()`、`count()`、`collect()` 调用之前。
-## Most Frequent Issues
+## 最常见问题
-The most frequent issues where users need to interact with Flink's data type handling are:
+用户需要与 Flink 数据类型处理进行交互的最常见问题是:
-* **Registering subtypes:** If the function signatures describe only the supertypes, but
they actually use subtypes of those during execution,
-  it may increase performance a lot to make Flink aware of these subtypes.
-  For that, call `.registerType(clazz)` on the `StreamExecutionEnvironment` or `ExecutionEnvironment`
for each subtype.
+* **注册子类型** 如果函数签只包含超类型,但它们实际上在执行期间使用那些类型的子类型,则使
Flink 感知这些子类型可能会大大提高性能。
 Review comment:
   * **注册子类型** 如果函数签名只包含超类型,但它们实际上在执行期间使用那些类型的子类型,则使
Flink 感知这些子类型可能会大大提高性能。

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:

With regards,
Apache Git Services
