kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nicolas Fouché <nfou...@onfocus.io>
Subject Re: Kafka Streams: got bit by WindowedSerializer (only window.start is serialized)
Date Mon, 16 Jan 2017 22:52:33 GMT
In the case of KAFKA-4468, it's more about state stores. But still, keys
would not be backward compatible. What is the "official" policy about this
kind of change ?

2017-01-16 23:47 GMT+01:00 Nicolas Fouché <nfouche@onfocus.io>:

> Hi Eno,
> I thought it would be impossible to put this in Kafka because of backward
> incompatibility with the existing windowed keys, no ?
> In my case, I had to recreate a new output topic, reset the topology, and
> and reprocess all my data.
>
> 2017-01-16 23:05 GMT+01:00 Eno Thereska <eno.thereska@gmail.com>:
>
>> Nicolas,
>>
>> I'm checking with Bill who originally was interested in KAFKA-4468. If he
>> isn't actively working on it, why don't you give it a go and create a pull
>> request (PR) for it? That way your contribution is properly acknowledged
>> etc. We can help you through with that.
>>
>> Thanks
>> Eno
>> > On 16 Jan 2017, at 18:46, Nicolas Fouché <nfouche@onfocus.io> wrote:
>> >
>> > My current implementation:
>> > https://gist.github.com/nfo/eaf350afb5667a3516593da4d48e757a . I just
>> > appended the window `end` at the end of the byte array.
>> > Comments and suggestions are welcome !
>> >
>> >
>> > 2017-01-16 15:48 GMT+01:00 Nicolas Fouché <nfouche@onfocus.io>:
>> >
>> >> Hi Damian,
>> >>
>> >> I recall now that I copied the `WindowedSerde` class [1] from Confluent
>> >> examples by Confluent, which uses the internal `WindowedSerializer`
>> class.
>> >> Better write my own Serde them. You're right, I should not rely on
>> >> internal classes, especially for data written outside Kafka Streams
>> >> topologies.
>> >>
>> >> Thanks for the insights on KAFKA-4468.
>> >>
>> >> https://github.com/confluentinc/examples/blob/
>> >> 89db45c6890cf757b8e18565bdf7bc23f119a2ff/kafka-streams/src/
>> >> main/java/io/confluent/examples/streams/utils/WindowedSerde.java
>> >>
>> >> Nicolas.
>> >>
>> >> 2017-01-16 12:31 GMT+01:00 Damian Guy <damian.guy@gmail.com>:
>> >>
>> >>> Hi Nicolas,
>> >>>
>> >>> I guess you are using the Processor API for your topology? The
>> >>> WindowedSerializer is an internal class that is used as part of the
>> DSL.
>> >>> In
>> >>> the DSL a topic will be created for each window operation, so we don't
>> >>> need
>> >>> the end time as it can be calculated from the window size.
>> >>> However, there is an open jira for this:
>> >>> https://issues.apache.org/jira/browse/KAFKA-4468
>> >>>
>> >>> Thanks,
>> >>> Damian
>> >>>
>> >>> On Mon, 16 Jan 2017 at 11:18 Nicolas Fouché <nfouche@onfocus.io>
>> wrote:
>> >>>
>> >>>> Hi,
>> >>>>
>> >>>> In the same topology, I generate aggregates with 1-day windows and
>> >>> 1-week
>> >>>> windows and write them in one single topic. On Mondays, these windows
>> >>> have
>> >>>> the same start time. The effect: these aggregates overrides each
>> other.
>> >>>>
>> >>>> That happens because WindowedSerializer [1] only serializes the
>> window
>> >>>> start time. I'm a bit surprised, a window has by definition a start
>> and
>> >>> an
>> >>>> end. I suppose one wanted save on key sizes ? And/or one would
>> consider
>> >>>> that topics should not contain aggregates with different
>> granularities ?
>> >>>>
>> >>>> I have two choices then, either create as many output topics as
I
>> have
>> >>>> granularities, or create my own serializer which also includes the
>> >>> window
>> >>>> end time. What would the community recommend ?
>> >>>>
>> >>>> Getting back to the core problem:
>> >>>> I could understand that it's not "right" to store different
>> >>> granularities
>> >>>> in one topic, and I thought it would save resources (less topic
to
>> >>> manage
>> >>>> by Kafka). But, I'm really not sure about this default serializer:
it
>> >>> does
>> >>>> not serialize all instance variables of the `Window` class, and
more
>> >>>> generally does comply to the definition of a window.
>> >>>>
>> >>>> [1]
>> >>>>
>> >>>> https://github.com/apache/kafka/blob/0.10.1/streams/src/main
>> >>> /java/org/apache/kafka/streams/kstream/internals/WindowedSer
>> ializer.java
>> >>>>
>> >>>> Thanks.
>> >>>> Nicolas
>> >>>>
>> >>>
>> >>
>> >>
>>
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message