samza-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Yi Pan <nickpa...@gmail.com>
Subject Re: Detecting "done" on a bounded input dataset
Date Wed, 14 Oct 2015 18:25:11 GMT
Hi, Kishore,

First I want some clarification on your use case.
1) Scenario 1: you still want the Samza jobs continuously running, while
simply want to detect the end of a certain stream. On detection, do you
need to unsubscribe from the stream? Or you are still OK receiving more
messages from the stream?
2) Scenario 2: you want the Samza jobs to shutdown when detecting the end
of a certain stream.

Which scenario are you targeting?

Thanks!

-Yi

On Wed, Oct 14, 2015 at 9:33 AM, Kishore N C <kishorenc@gmail.com> wrote:

> Hi,
>
> Our data processing pipeline consists of a set of Samza jobs, that form a
> DAG. Sometimes, we have to throw finite datasets into the Kafka topic that
> acts as the entry point to the pipeline. Given that different Samza jobs in
> the DAG could have varying latencies in terms of processing the records (or
> could even temporarily fails or be stuck), how do I detect that my assembly
> of jobs have finished processing all records? It's not as simple as
> tallying the input and output record counts, as some jobs could be
> filtering data, and others could be grouping records etc.
>
> Thanks,
>
> Kishore.
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message