In general, if you can keep state in Flink, you get better throughput/latency/consistency and have one less system to worry about (external k/v store). State outside means that the Flink processes can be slimmer and need fewer resources and as such recover a bit faster. There are use cases for that as well.

Storing the model in OperatorState is a good idea, if you can. On the roadmap is to migrate the operator state to managed memory as well, so that should take care of the GC issues.

We are just adding functionality to make the Key/Value operator state usable in CoMap/CoFlatMap as well (currently it only works in windows and in Map/FlatMap/Filter functions over the KeyedStream).
Until the, you should be able to use a simple Java HashMap and use the "Checkpointed" interface to get it persistent.


On Sun, Nov 8, 2015 at 10:11 AM, Welly Tambunan <if05041@gmail.com> wrote:
Thanks for the answer. 

Currently the approach that i'm using right now is creating a base/marker interface to stream different type of message to the same operator. Not sure about the performance hit about this compare to the CoFlatMap function. 

Basically this one is providing query cache, so i'm thinking instead of using in memory cache like redis, ignite etc, i can just use operator state for this one. 

I just want to gauge do i need to use memory cache or operator state would be just fine. 

However i'm concern about the Gen 2 Garbage Collection for caching our own state without using operator state. Is there any clarification on that one ? 

On Sat, Nov 7, 2015 at 12:38 AM, Anwar Rizal <anrizal05@gmail.com> wrote:

Let me understand your case better here. You have a stream of model and stream of data. To process the data, you will need a way to access your model from the subsequent stream operations (map, filter, flatmap, ..).
I'm not sure in which case Operator State is a good choice, but I think you can also live without.

val modelStream = .... // get the model stream
val dataStream   = 

modelStream.broadcast.connect(dataStream). coFlatMap(  ) Then you can keep the latest model in a CoFlatMapRichFunction, not necessarily as Operator State, although maybe OperatorState is a good choice too.

Does it make sense to you ?


On Fri, Nov 6, 2015 at 10:21 AM, Welly Tambunan <if05041@gmail.com> wrote:
Hi All, 

We have a high density data that required a downsample. However this downsample model is very flexible based on the client device and user interaction. So it will be wasteful to precompute and store to db. 

So we want to use Apache Flink to do downsampling and cache the result for subsequent query. 

We are considering using Flink Operator state for that one. 

Is that the right approach to use that for memory cache ? Or if that preferable using memory cache like redis etc. 

Any comments will be appreciated.