flink-user mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: Apache Flink Operator State as Query Cache
Date Wed, 11 Nov 2015 09:49:23 GMT

In general, if you can keep state in Flink, you get better
throughput/latency/consistency and have one less system to worry about
(the external k/v store). Keeping state outside Flink means that the Flink
processes can be slimmer, need fewer resources, and as such recover a bit
faster. There are use cases for that as well.

Storing the model in OperatorState is a good idea, if you can. On the
roadmap is to migrate the operator state to managed memory as well, so that
should take care of the GC issues.

We are just adding functionality to make the Key/Value operator state
usable in CoMap/CoFlatMap as well (currently it only works in windows and
in Map/FlatMap/Filter functions over the KeyedStream).
Until then, you should be able to use a simple Java HashMap and implement
the "Checkpointed" interface to make it persistent.
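
A minimal sketch of the HashMap-plus-"Checkpointed" approach described above.
It rests on several assumptions: the Checkpointed interface below is a local
stand-in modelled on Flink's interface of the same name, redeclared so the
example compiles without Flink on the classpath, and its exact signature may
differ between Flink versions. DownsampleCache, its String keys, and its
Double values are hypothetical names chosen for illustration. In a real job
the class would also be a Flink function (for example a map or flatMap
function), and Flink would call the two checkpoint methods itself.

```java
import java.io.Serializable;
import java.util.HashMap;

// Stand-in for Flink's Checkpointed interface (an assumption: redeclared
// here so the sketch is self-contained; check the real signature for the
// Flink version in use).
interface Checkpointed<T extends Serializable> {
    T snapshotState(long checkpointId, long checkpointTimestamp) throws Exception;
    void restoreState(T state) throws Exception;
}

// Hypothetical query cache that keeps downsampled results in a plain HashMap.
class DownsampleCache implements Checkpointed<HashMap<String, Double>> {
    private HashMap<String, Double> cache = new HashMap<>();

    void put(String queryKey, double value) {
        cache.put(queryKey, value);
    }

    // Returns null when the key has not been cached.
    Double get(String queryKey) {
        return cache.get(queryKey);
    }

    @Override
    public HashMap<String, Double> snapshotState(long checkpointId, long checkpointTimestamp) {
        // Hand out a copy so that later updates do not change the snapshot.
        return new HashMap<>(cache);
    }

    @Override
    public void restoreState(HashMap<String, Double> state) {
        // Replace the in-memory cache with the contents of the snapshot.
        cache = new HashMap<>(state);
    }
}

class DownsampleCacheDemo {
    public static void main(String[] args) {
        DownsampleCache original = new DownsampleCache();
        original.put("device-1:60s", 42.5);

        // Simulate a checkpoint followed by a failure and recovery.
        HashMap<String, Double> snapshot = original.snapshotState(1L, 0L);
        DownsampleCache recovered = new DownsampleCache();
        recovered.restoreState(snapshot);

        System.out.println(recovered.get("device-1:60s"));
    }
}
```

The copy in snapshotState is a deliberate choice in this sketch: it keeps the
snapshot stable if the cache is updated while the snapshot is still being
written.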


On Sun, Nov 8, 2015 at 10:11 AM, Welly Tambunan <if05041@gmail.com> wrote:

> Thanks for the answer.
> Currently the approach that I'm using is to create a base/marker
> interface so that different types of messages can be streamed to the same
> operator. I'm not sure how the performance of this compares to the
> CoFlatMap function.
> Basically this one is providing a query cache, so I'm thinking that instead
> of using an in-memory cache like Redis, Ignite, etc., I can just use
> operator state for this.
> I just want to gauge whether I need an in-memory cache or whether operator
> state would be just fine.
> However, I'm concerned about Gen 2 garbage collection when caching our own
> state without using operator state. Is there any clarification on that?
> On Sat, Nov 7, 2015 at 12:38 AM, Anwar Rizal <anrizal05@gmail.com> wrote:
>> Let me understand your case better here. You have a stream of models and
>> a stream of data. To process the data, you will need a way to access your
>> model from the subsequent stream operations (map, filter, flatmap, ..).
>> I'm not sure in which case Operator State is a good choice, but I think
>> you can also live without it.
>> val modelStream = .... // get the model stream
>> val dataStream   =
>> modelStream.broadcast.connect(dataStream).coFlatMap(  )
>> Then you can keep the latest model in a RichCoFlatMapFunction, not
>> necessarily as Operator State, although maybe OperatorState is a good
>> choice too.
>> Does it make sense to you ?
>> Anwar
>> On Fri, Nov 6, 2015 at 10:21 AM, Welly Tambunan <if05041@gmail.com>
>> wrote:
>>> Hi All,
>>> We have high-density data that requires downsampling. However, the
>>> downsampling model is very flexible, based on the client device and user
>>> interaction, so it would be wasteful to precompute and store it in a db.
>>> So we want to use Apache Flink to do the downsampling and cache the
>>> result for subsequent queries.
>>> We are considering using Flink operator state for that.
>>> Is that the right approach for an in-memory cache? Or is it preferable
>>> to use an in-memory cache like Redis, etc.?
>>> Any comments will be appreciated.
>>> Cheers
>>> --
>>> Welly Tambunan
>>> Triplelands
>>> http://weltam.wordpress.com
>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
> --
> Welly Tambunan
> Triplelands
> http://weltam.wordpress.com
> http://www.triplelands.com <http://www.triplelands.com/blog/>
