flink-issues mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [flink] wuchong commented on a change in pull request #8341: [FLINK-11633][docs-zh] Translate "Working with state" into Chinese
Date Sun, 05 May 2019 08:26:18 GMT
wuchong commented on a change in pull request #8341: [FLINK-11633][docs-zh] Translate "Working with state" into Chinese
URL: https://github.com/apache/flink/pull/8341#discussion_r281008571
 
 

 ##########
 File path: docs/dev/stream/state/state.zh.md
 ##########
 @@ -22,122 +22,87 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-This document explains how to use Flink's state abstractions when developing an application.
-
-* ToC
+本文档主要介绍如何在 Flink 作业中使用状态。
+* 目录
 {:toc}
 
 ## Keyed State and Operator State
 
-There are two basic kinds of state in Flink: `Keyed State` and `Operator State`.
+Flink 中有两种基本的状态:`Keyed State` 和 `Operator State`。
 
 ### Keyed State
 
-*Keyed State* is always relative to keys and can only be used in functions and operators on a `KeyedStream`.
+*Keyed State* 通常和 key 相关,只能在 `KeyedStream` 的函数和算子中使用。
 
-You can think of Keyed State as Operator State that has been partitioned,
-or sharded, with exactly one state-partition per key.
-Each keyed-state is logically bound to a unique
-composite of <parallel-operator-instance, key>, and since each key
-"belongs" to exactly one parallel instance of a keyed operator, we can
-think of this simply as <operator, key>.
+你可以把 Keyed State 看作分区或者分片的 Operator State,而且每个 key 仅对应一个状态分区。
+逻辑上每个 keyed-state 和 <算子并发实例, key> 相绑定,由于每个 key 仅"属于"算子的一个并发实例,因此可以简化为 <算子, key>。
 
-Keyed State is further organized into so-called *Key Groups*. Key Groups are the
-atomic unit by which Flink can redistribute Keyed State;
-there are exactly as many Key Groups as the defined maximum parallelism.
-During execution each parallel instance of a keyed operator works with the keys
-for one or more Key Groups.
+Keyed State 会按照 *Key Group* 进行管理。Key Group 是 Flink 分发 Keyed State 的最小单元;
+Key Group 的数目等于作业的最大并发数。在执行过程中,keyed operator 的每个并发实例会处理一个或多个 Key Group 中的 key。
 
 ### Operator State
 
-With *Operator State* (or *non-keyed state*), each operator state is
-bound to one parallel operator instance.
-The [Kafka Connector]({{ site.baseurl }}/dev/connectors/kafka.html) is a good motivating example for the use of Operator State
-in Flink. Each parallel instance of the Kafka consumer maintains a map
-of topic partitions and offsets as its Operator State.
+对于 *Operator State* (或者 *non-keyed state*) 来说,每个 operator state 和一个并发实例进行绑定。
+[Kafka Connector]({{ site.baseurl }}/zh/dev/connectors/kafka.html) 是 Flink 中使用 operator state 的一个很好的示例。
+每个 Kafka 消费者的并发在 Operator State 中维护一个 topic partition 到 offset 的映射关系。
 
-The Operator State interfaces support redistributing state among
-parallel operator instances when the parallelism is changed. There can be different schemes for doing this redistribution.
+Operator State 的接口支持在作业并发改变时在各并发实例之间重新分发状态,重新分发可以有多种不同的策略。
 
 ## Raw and Managed State
 
-*Keyed State* and *Operator State* exist in two forms: *managed* and *raw*.
+*Keyed State* 和 *Operator State* 分别有两种存在形式:*managed* 和 *raw*。
 
-*Managed State* is represented in data structures controlled by the Flink runtime, such as internal hash tables, or RocksDB.
-Examples are "ValueState", "ListState", etc. Flink's runtime encodes
-the states and writes them into the checkpoints.
+*Managed State* 由 Flink runtime 中的数据结构所控制,比如内部的 hash table 或者 RocksDB,
+例如 "ValueState"、"ListState" 等。Flink runtime 会对这些状态进行编码并写入 checkpoint。
 
-*Raw State* is state that operators keep in their own data structures. When checkpointed, they only write a sequence of bytes into
-the checkpoint. Flink knows nothing about the state's data structures and sees only the raw bytes.
+*Raw State* 则保存在算子自己的数据结构中。checkpoint 的时候,Flink 并不知晓状态的具体数据结构,仅仅将一串字节序列写入 checkpoint。
 
-All datastream functions can use managed state, but the raw state interfaces can only be used when implementing operators.
-Using managed state (rather than raw state) is recommended, since with
-managed state Flink is able to automatically redistribute state when the parallelism is
-changed, and also do better memory management.
+所有 datastream 的函数都可以使用 managed state,但是 raw state 则只能在实现算子的时候使用。
+由于 Flink 可以在修改并发时自动地重新分发状态数据,并且能够更好地管理内存,因此建议使用 managed state(而不是 raw state)。
 
-<span class="label label-danger">Attention</span> If your managed state needs custom serialization logic, please see 
-the [corresponding guide](custom_serialization.html) in order to ensure future compatibility. Flink's default serializers 
-don't need special treatment.
+<span class="label label-danger">注意</span> 如果你的 managed state 需要定制化的序列化逻辑,
+为了保证后续的兼容性请参考[相应指南](custom_serialization.html);使用 Flink 默认提供的序列化器则不需要特殊处理。
 
 ## Using Managed Keyed State
 
-The managed keyed state interface provides access to different types of state that are all scoped to
-the key of the current input element. This means that this type of state can only be used
-on a `KeyedStream`, which can be created via `stream.keyBy(…)`.
+managed keyed state 接口提供对不同类型状态的访问,这些状态都作用于当前输入数据的 key。换句话说,这些状态仅可在 `KeyedStream`
+上使用,可以通过 `stream.keyBy(...)` 得到 `KeyedStream`。
+
+接下来,我们会介绍不同类型的状态,然后介绍如何使用它们。所有支持的状态类型如下所示:
 
-Now, we will first look at the different types of state available and then we will see
-how they can be used in a program. The available state primitives are:
+* `ValueState<T>`: 保存一个可以更新和检索的值(如上所述,每个值都作用于当前输入数据的 key,因此算子接收到的每个 key 都可能对应一个值)。
+这个值可以通过 `update(T)` 进行更新,通过 `T value()` 进行检索。
 
-* `ValueState<T>`: This keeps a value that can be updated and
-retrieved (scoped to key of the input element as mentioned above, so there will possibly be one value
-for each key that the operation sees). The value can be set using `update(T)` and retrieved using
-`T value()`.
 
-* `ListState<T>`: This keeps a list of elements. You can append elements and retrieve an `Iterable`
-over all currently stored elements. Elements are added using `add(T)` or `addAll(List<T>)`, the Iterable can
-be retrieved using `Iterable<T> get()`. You can also override the existing list with `update(List<T>)`
+* `ListState<T>`: 保存一个元素的列表。可以往这个列表中追加数据,并检索当前存储的所有元素。可以通过
+ `add(T)` 或者 `addAll(List<T>)` 添加元素,通过 `Iterable<T> get()` 获得整个列表。还可以通过 `update(List<T>)` 覆盖当前的列表。
 
-* `ReducingState<T>`: This keeps a single value that represents the aggregation of all values
-added to the state. The interface is similar to `ListState` but elements added using
-`add(T)` are reduced to an aggregate using a specified `ReduceFunction`.
+* `ReducingState<T>`: 保存一个单值,表示添加到状态的所有值的聚合。接口与 `ListState` 类似,
+但使用 `add(T)` 增加的元素会用提供的 `ReduceFunction` 进行聚合。
 
-* `AggregatingState<IN, OUT>`: This keeps a single value that represents the aggregation of all values
-added to the state. Contrary to `ReducingState`, the aggregate type may be different from the type
-of elements that are added to the state. The interface is the same as for `ListState` but elements
-added using `add(IN)` are aggregated using a specified `AggregateFunction`.
+* `AggregatingState<IN, OUT>`: 保留一个单值,表示添加到状态的所有值的聚合。和 `ReducingState` 不同的是,聚合类型可能与添加到状态的元素的类型不同。
+接口与 `ListState` 类似,但使用 `add(IN)` 添加的元素会用指定的 `AggregateFunction` 进行聚合。
 
-* `FoldingState<T, ACC>`: This keeps a single value that represents the aggregation of all values
-added to the state. Contrary to `ReducingState`, the aggregate type may be different from the type
-of elements that are added to the state. The interface is similar to `ListState` but elements
-added using `add(T)` are folded into an aggregate using a specified `FoldFunction`.
+* `FoldingState<T, ACC>`: 保留一个单值,表示添加到状态的所有值的聚合。与 `ReducingState` 相反,聚合类型可能与添加到状态的元素类型不同。
+接口与 `ListState` 类似,但使用`add(T)`添加的元素使用指定的 `FoldFunction` 折叠成聚合。
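As an aside for readers of this thread: the hunk above describes Key Groups as the atomic unit by which Flink redistributes keyed state. The mechanics can be sketched in a few lines of Python. This is a simplification under stated assumptions, not Flink's actual implementation (which lives in `KeyGroupRangeAssignment` and hashes keys with a murmur hash); the function names below and the use of Python's built-in `hash` are purely illustrative.

```python
# Illustrative sketch (NOT Flink's real code) of key-group-based
# redistribution of keyed state. All names here are made up; Python's
# hash() stands in for Flink's murmur-hash-based key hashing.

def key_group_for(key, max_parallelism):
    # Each key belongs to exactly one of max_parallelism key groups,
    # so the number of key groups equals the configured max parallelism.
    return hash(key) % max_parallelism

def operator_index_for(key_group, parallelism, max_parallelism):
    # Key groups are assigned to operator instances in contiguous ranges;
    # rescaling moves whole key groups, never individual keys.
    return key_group * parallelism // max_parallelism

def owner_of(key, parallelism, max_parallelism=128):
    # Which parallel operator instance currently holds state for this key.
    return operator_index_for(key_group_for(key, max_parallelism),
                              parallelism, max_parallelism)
```

When the parallelism changes (say from 2 to 4), only the key-group-to-instance mapping is recomputed; a key's key group never changes, which is why keyed state can be redistributed atomically at key-group granularity.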
 
 Review comment:
   ```suggestion
   接口与 `ListState` 类似,但使用`add(T)`添加的元素会用指定的 `FoldFunction` 折叠成聚合值。
   ```
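For readers following along, the contracts of the state primitives discussed in this hunk can be made concrete with tiny in-memory Python models. These are explanatory toys under stated assumptions, not Flink's Java API; the class and method names only loosely mirror the documentation.

```python
# Toy in-memory models (assumptions, not Flink's API) of the state
# primitives described in the hunk above.

class ValueState:
    # Holds a single value per key: update(v) sets it, value() reads it.
    def __init__(self):
        self._v = None
    def update(self, v):
        self._v = v
    def value(self):
        return self._v

class ListState:
    # Holds a list of elements; update() overwrites the whole list.
    def __init__(self):
        self._items = []
    def add(self, v):
        self._items.append(v)
    def add_all(self, vs):
        self._items.extend(vs)
    def update(self, vs):
        self._items = list(vs)
    def get(self):
        return iter(self._items)

class ReducingState:
    # add(v) immediately reduces v into a single running aggregate
    # using the supplied reduce function (same type in and out).
    def __init__(self, reduce_fn):
        self._fn = reduce_fn
        self._acc = None
    def add(self, v):
        self._acc = v if self._acc is None else self._fn(self._acc, v)
    def get(self):
        return self._acc

class FoldingState:
    # add(v) folds v into an accumulator whose type may differ from the
    # element type (this is the primitive the review suggestion concerns).
    def __init__(self, initial_acc, fold_fn):
        self._acc = initial_acc
        self._fn = fold_fn
    def add(self, v):
        self._acc = self._fn(self._acc, v)
    def get(self):
        return self._acc
```

The key distinction the docs draw: `ListState` keeps every element, while the reducing/folding variants collapse elements into one value at `add()` time, which keeps their checkpointed footprint constant.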

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
