kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matthias J. Sax" <matth...@confluent.io>
Subject Re: Handling out of order messages without KTables
Date Thu, 06 Oct 2016 21:33:44 GMT
Hash: SHA512

It is not global in this sense.

Thus, you need to ensure that records updating the same product, go to
the same instance. You can ensure this, by given all records of the
same product the same key and "groupByKey" before processing the data.

- -Matthias

On 10/6/16 10:55 AM, Ali Akhtar wrote:
> Thank you, State Store seems promising. But, is it distributed, or
> limited to the particular instance of my application?
> I.e if there are 3 messages, setting product 1's price to $1, $3,
> and $5, and all 3 of them go to a different instance of my
> application, will they be able to correctly identify the latest
> received message using State Store?
> On Thu, Oct 6, 2016 at 10:48 PM, Matthias J. Sax
> <matthias@confluent.io> wrote:
> What do you mean by "message keys are random" -- do you
> effectively have no keys and want all messages to be processed as
> if they all have the same key?
> To access record TS in general, you need to use Processor API. The 
> given ProcessorContext object given by Processor#init() always
> return the timestamp of the currently processed on #timestamp().
> Thus, you can attach a state store to your processor and compare
> the timestamps of the current record with the timestamp of the one
> in your store.
> -Matthias
> On 10/6/16 8:52 AM, Ali Akhtar wrote:
>>>> Heya,
>>>> I have some Kafka producers, which are listening to webhook
>>>> events, and for each webhook event, they post its payload to
>>>> a Kafka topic.
>>>> Each payload contains a timestamp from the webhook source.
>>>> This timestamp is the source of truth about which events
>>>> happened first, which happened last, etc.
>>>> I need to ensure that the last arrival of a particular type
>>>> of message wins.
>>>> E.g, if there are 5 messages, saying the price of a product
>>>> with id 1, was set to $1, then $3, then something else, etc,
>>>> before finally being set to $10, then I need to make sure
>>>> that the final price for that product is $10.
>>>> These messages can be out of order, and I need to determine
>>>> the latest arrival based on the timestamp from the webhook
>>>> source. (Its atm in a string format which can be parsed)
>>>> Since KTable looks like it uses message keys to determine
>>>> what happens - and in this case, the message keys are random,
>>>> and the timestamp contained in the value of the message is
>>>> what determines the order of the events - any pointers on
>>>> what the best way to do this is?
>>>> I'm using kafka streaming, latest version, Java.
Comment: GPGTools - https://gpgtools.org


View raw message