flink-issues mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [flink] wuchong commented on a change in pull request #8276: [FLINK-12314] [docs-zh] Translate the "Type Serialization" page into …
Date Sun, 05 May 2019 09:34:06 GMT
URL: https://github.com/apache/flink/pull/8276#discussion_r281012375
 
 

 ##########
 File path: docs/dev/types_serialization.zh.md
 ##########
 @@ -24,180 +24,164 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-Apache Flink handles data types and serialization in a unique way, containing its own type descriptors,
-generic type extraction, and type serialization framework. This document describes the concepts and the rationale behind them.
+Apache Flink handles data types and serialization in its own unique way, one that includes its own type descriptors, generic type extraction, and a type serialization framework.
+This document describes these concepts and the rationale behind them.
 
 * This will be replaced by the TOC
 {:toc}
 
 
-## Type handling in Flink
+## Type Handling in Flink
 
-Flink tries to infer a lot of information about the data types that are exchanged and stored during the distributed computation.
-Think about it like a database that infers the schema of tables. In most cases, Flink infers all necessary information seamlessly
-by itself. Having the type information allows Flink to do some cool things:
+Flink tries to infer a lot of information about the data types that are exchanged and stored during distributed computation.
+Think of it like a database that infers the schema of its tables. In most cases, Flink can seamlessly infer all of the necessary type information by itself.
+Having this type information lets Flink do some very useful things:
 
-* Using POJOs types and grouping / joining / aggregating them by referring to field names (like `dataSet.keyBy("username")`).
-  The type information allows Flink to check (for typos and type compatibility) early rather than failing later at runtime.
+* For data that uses POJO types, grouping / joining / aggregating can be done by referring to field names (such as `dataSet.keyBy("username")`).
+  The type information lets Flink check for typos and type compatibility before the job runs, rather than failing later at runtime.
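To make the bullet above concrete, here is a minimal sketch of keying a stream by a POJO field name (illustrative only; `Event` and `PojoKeyByExample` are hypothetical names, and the example assumes a Flink 1.x dependency on the classpath):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PojoKeyByExample {
    // A Flink POJO: public no-argument constructor plus public fields
    // (or getters/setters) for every field.
    public static class Event {
        public String username;
        public long clicks;
        public Event() {}
        public Event(String username, long clicks) {
            this.username = username;
            this.clicks = clicks;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.fromElements(new Event("alice", 3L), new Event("alice", 2L))
           // The field name is validated during the pre-flight phase;
           // a typo here fails before the job ever runs.
           .keyBy("username")
           .sum("clicks")
           .print();
        env.execute("pojo-keyby-example");
    }
}
```

A misspelled field name such as `.keyBy("usrname")` would be rejected when the program is composed, not deep inside a running job.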
 
-* The more Flink knows about data types, the better the serialization and data layout schemes are.
-  That is quite important for the memory usage paradigm in Flink (work on serialized data inside/outside the heap where ever possible
-  and make serialization very cheap).
+* The more Flink knows about data types, the better its serialization and data layout schemes are.
+  This is especially important for Flink's memory usage paradigm (working on serialized data on or off the heap wherever possible, and keeping serialization very cheap).
 
-* Finally, it also spares users in the majority of cases from worrying about serialization frameworks and having to register types.
+* Finally, in the majority of cases it also spares users from having to worry about serialization frameworks and type registration.
 
-In general, the information about data types is needed during the *pre-flight phase* - that is, when the program's calls on `DataStream`
-and `DataSet` are made, and before any call to `execute()`, `print()`, `count()`, or `collect()`.
+In general, type information is needed during the *pre-flight phase* - that is, after the program's calls on `DataStream` or
+`DataSet` are made, and before any call to `execute()`, `print()`, `count()`, or `collect()`.
 
 
-## Most Frequent Issues
+## Most Frequent Issues
 
-The most frequent issues where users need to interact with Flink's data type handling are:
+The most frequent issues where users need to interact with Flink's data type handling are:
 
-* **Registering subtypes:** If the function signatures describe only the supertypes, but they actually use subtypes of those during execution,
-  it may increase performance a lot to make Flink aware of these subtypes.
-  For that, call `.registerType(clazz)` on the `StreamExecutionEnvironment` or `ExecutionEnvironment` for each subtype.
+* **Registering subtypes:** If the function signatures describe only the supertypes, but subtypes of those are actually used during execution,
+  making Flink aware of these subtypes may improve performance considerably.
+  To do so, call `.registerType(clazz)` on the `StreamExecutionEnvironment` or `ExecutionEnvironment` for each subtype.
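The registration described above might look like this (a sketch; the `Shape`/`Circle`/`Square` hierarchy is hypothetical and stands in for a user's own types):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RegisterSubtypes {
    // Hypothetical type hierarchy, used only for illustration.
    public static class Shape {}
    public static class Circle extends Shape {}
    public static class Square extends Shape {}

    public static StreamExecutionEnvironment configure() {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Function signatures mention only Shape; registering the concrete
        // subtypes up front lets Flink choose efficient serializers for them
        // instead of falling back to generic serialization.
        env.registerType(Circle.class);
        env.registerType(Square.class);
        return env;
    }
}
```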
 
-* **Registering custom serializers:** Flink falls back to [Kryo](https://github.com/EsotericSoftware/kryo) for the types that it does not handle transparently
-  by itself. Not all types are seamlessly handled by Kryo (and thus by Flink). For example, many Google Guava collection types do not work well
-  by default. The solution is to register additional serializers for the types that cause problems.
-  Call `.getConfig().addDefaultKryoSerializer(clazz, serializer)` on the `StreamExecutionEnvironment` or `ExecutionEnvironment`.
-  Additional Kryo serializers are available in many libraries. See [Custom Serializers]({{ site.baseurl }}/dev/custom_serializers.html) for more details on working with custom serializers.
+* **Registering custom serializers:** Flink falls back to [Kryo](https://github.com/EsotericSoftware/kryo) for the types that it cannot handle transparently by itself.
+  Not all types are handled seamlessly by Kryo (and thus by Flink). For example, many Google Guava collection types do not work well by default.
+  The solution is to register additional serializers for the types that cause problems: call `.getConfig().addDefaultKryoSerializer(clazz, serializer)`
+  on the `StreamExecutionEnvironment` or `ExecutionEnvironment` to register a Kryo serializer. Additional Kryo serializers are available in many libraries.
+  See [Custom Serializers]({{ site.baseurl }}/zh/dev/custom_serializers.html) for more details on working with custom serializers.
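A minimal sketch of such a registration, using Kryo's bundled `JavaSerializer` as the fallback (the `Problematic` class is hypothetical and stands in for a type Kryo cannot handle out of the box):

```java
import com.esotericsoftware.kryo.serializers.JavaSerializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CustomKryoSerializerSetup {
    // Hypothetical problematic type; stands in for e.g. a Guava collection
    // that Kryo does not serialize correctly by default.
    public static class Problematic implements java.io.Serializable {}

    public static StreamExecutionEnvironment configure() {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Tell Kryo to use plain Java serialization for this one type.
        env.getConfig().addDefaultKryoSerializer(Problematic.class, JavaSerializer.class);
        return env;
    }
}
```

In practice you would substitute a purpose-built serializer (for example, one from a library such as `chill`) rather than `JavaSerializer`.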
 
-* **Adding Type Hints:** Sometimes, when Flink cannot infer the generic types despite all tricks, a user must pass a *type hint*. That is generally
-  only necessary in the Java API. The [Type Hints Section](#type-hints-in-the-java-api) describes that in more detail.
+* **Adding type hints:** Sometimes, when Flink cannot infer the generic types despite all of its tricks, a user must pass a *type hint*. This is generally
+  only necessary in the Java API. The [type hints section](#java-api-中的类型提示) describes this in more detail.
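The typical case is a lambda whose generic output type is erased; a hedged sketch (names are illustrative, and `Types.TUPLE` is the hint used here):

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class TypeHintExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<Tuple2<String, Integer>> counts = env
            .fromElements("a", "b", "a")
            // Erasure hides the lambda's Tuple2 type arguments, so Flink
            // cannot infer the output type without a hint:
            .map(value -> Tuple2.of(value, 1))
            .returns(Types.TUPLE(Types.STRING, Types.INT));
        counts.print();
        env.execute("type-hint-example");
    }
}
```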
 
-* **Manually creating a `TypeInformation`:** This may be necessary for some API calls where it is not possible for Flink to infer
-  the data types due to Java's generic type erasure. See [Creating a TypeInformation or TypeSerializer](#creating-a-typeinformation-or-typeserializer)
-  for details.
+* **Manually creating a `TypeInformation`:** This may be necessary for some API calls, because Java's generic type erasure makes it impossible for Flink to infer the data types.
+  See [Creating a TypeInformation or TypeSerializer](#创建-typeinformation-或者-typeserializer) for details.
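Both ways of creating a `TypeInformation` by hand can be sketched as follows (class and method names are illustrative):

```java
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple2;

public class ManualTypeInfo {
    // For non-generic classes, the class object alone is enough:
    static TypeInformation<String> stringInfo() {
        return TypeInformation.of(String.class);
    }

    // For generic classes, an anonymous TypeHint subclass captures the
    // type arguments so they survive Java's erasure:
    static TypeInformation<Tuple2<String, Integer>> tupleInfo() {
        return TypeInformation.of(new TypeHint<Tuple2<String, Integer>>() {});
    }
}
```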
 
+## Flink's TypeInformation Class
 
-## Flink's TypeInformation class
+The class {% gh_link /flink-core/src/main/java/org/apache/flink/api/common/typeinfo/TypeInformation.java "TypeInformation" %}
+is the base class for all type descriptors. It expresses basic properties of the type, can generate serializers, and, in special cases, can generate comparators for the type.
+(*请注意,Flink 中的比较器不仅仅是定义顺序 - 它们是处理键的的基础工具*)
 
 Review comment:
   ```suggestion
   (*请注意,Flink 中的比较器不仅仅是定义顺序 - 它们是处理键的基础工具*)
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
