samza-dev mailing list archives

From Yi Pan <nickpa...@gmail.com>
Subject Re: Thoughts and observations on Samza
Date Mon, 06 Jul 2015 17:45:57 GMT
Hi, Gianmarco,

{quote}
However, I think the fundamental operation that Samza, Copycat, and Kafka
consumers should agree upon is "how can I specify in a simple and
transparent way which partitions I want to consume, and how?".
{quote}

I agree that some basic partition distribution mechanisms can be common, and
those common usage patterns should be provided / solved at the Kafka level. I
would argue, however, that pluggable client-side logic is still needed for
the following two reasons:
1. On the broker side, the system has no view of client-side
resources/state (host affinity of local state is a good example). When
the partition distribution/assignment needs to take client-side
resources/state into consideration, we need client-side logic.
2. When we run Samza as a service, there might be additional resource/quota
related policies that require an application-level decision, where the
information needed for the decision is not visible at the Kafka level. In
that case, pluggable client-side logic is useful.
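To make point 1 concrete, here is a minimal sketch in Java (16+, for the record type) of what a pluggable client-side assignor could look like. PartitionAssignor, HostAffinityAssignor, TopicPartition, and the lastStateHost map are hypothetical names for illustration, not existing Kafka or Samza APIs. The assignor places a partition on a consumer running on the host that already holds that partition's local state and falls back to round-robin otherwise. The broker could not make this choice on its own, because it never sees lastStateHost.

```java
import java.util.*;

// Hypothetical sketch: PartitionAssignor and HostAffinityAssignor are
// illustrative names, not Kafka or Samza APIs. Requires Java 16+ (records).
public class HostAffinityAssignment {

    /** A partition of a topic (illustrative stand-in for a real client type). */
    record TopicPartition(String topic, int partition) {}

    /** Pluggable client-side assignment logic. */
    interface PartitionAssignor {
        /** consumerHosts maps consumer id -> host; returns consumer id -> partitions. */
        Map<String, List<TopicPartition>> assign(List<TopicPartition> partitions,
                                                 Map<String, String> consumerHosts);
    }

    /**
     * Places each partition on a consumer running on the host that already holds
     * the partition's local state; otherwise assigns it round-robin.
     * lastStateHost is client-side knowledge the broker never sees.
     */
    static class HostAffinityAssignor implements PartitionAssignor {
        private final Map<TopicPartition, String> lastStateHost;

        HostAffinityAssignor(Map<TopicPartition, String> lastStateHost) {
            this.lastStateHost = lastStateHost;
        }

        @Override
        public Map<String, List<TopicPartition>> assign(List<TopicPartition> partitions,
                                                        Map<String, String> consumerHosts) {
            Map<String, List<TopicPartition>> result = new TreeMap<>();
            if (consumerHosts.isEmpty()) {
                return result; // no consumers: nothing can be assigned
            }
            List<String> consumers = new ArrayList<>(consumerHosts.keySet());
            Collections.sort(consumers); // deterministic order
            for (String c : consumers) {
                result.put(c, new ArrayList<>());
            }
            int next = 0; // round-robin cursor for partitions without a local-state match
            for (TopicPartition tp : partitions) {
                String stateHost = lastStateHost.get(tp);
                String chosen = null;
                for (String c : consumers) {
                    if (stateHost != null && stateHost.equals(consumerHosts.get(c))) {
                        chosen = c; // first consumer co-located with the state
                        break;
                    }
                }
                if (chosen == null) {
                    chosen = consumers.get(next % consumers.size());
                    next++;
                }
                result.get(chosen).add(tp);
            }
            return result;
        }
    }

    public static void main(String[] args) {
        TopicPartition p0 = new TopicPartition("page-views", 0);
        TopicPartition p1 = new TopicPartition("page-views", 1);
        TopicPartition p2 = new TopicPartition("page-views", 2);
        // p1's local state was last materialized on host-2.
        PartitionAssignor assignor = new HostAffinityAssignor(Map.of(p1, "host-2"));
        Map<String, String> consumerHosts = Map.of("task-a", "host-1", "task-b", "host-2");
        System.out.println(assignor.assign(List.of(p0, p1, p2), consumerHosts));
    }
}
```

In the example, p1 lands on task-b because its state lives on host-2, while p0 and p2 are spread round-robin. A real assignor would also need to balance load, since this sketch sends every stateful partition on a host to the first consumer on that host.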

Thanks!

On Fri, Jul 3, 2015 at 1:40 AM, Gianmarco De Francisci Morales <gdfm@apache.org> wrote:

> Hi Jay,
>
> Thanks for your answer.
>
>
> > However a few things have changed since that original design:
> > 1. We now have the additional use cases of copycat and Samza
> > 2. We now realize that the assignment strategies don't actually necessarily
> > ensure each partition is assigned to only one consumer--there are really
> > valid use cases for broadcast or multiple replica assignment schemes--so we
> > can't actually make a hard assertion on the server.
> >
> > So it may make sense to revisit this. I don't think it is necessarily a
> > massive change, and it would give more flexibility for the variety of cases.
> >
> > -Jay
>
>
> I totally agree; the 1-partition-1-task mapping is too restrictive.
> However, I think the fundamental operation that Samza, Copycat, and Kafka
> consumers should agree upon is "how can I specify in a simple and
> transparent way which partitions I want to consume, and how?".
> This means providing a mapping from partitions to consumer tasks, possibly
> in a transparent way so as to allow for optimizations in placement,
> co-partitioning, etc...
> This issue could once again generate a lot of duplicate work, and I
> think it should be solved at the Kafka level.
> Given that Copycat and normal consumers are already inside Kafka, I think
> having Samza there as well would simplify things a lot.
> The result is that Kafka would be a complete package for handling streams:
> - Messaging, partitioning, and fault tolerance (Kafka core)
> - Ingestion (Copycat)
> - Lightweight processing (Samza)
> - Coupling with other systems (Kafka consumers)
>
> Cheers,
>
> --
> Gianmarco
>
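Jay's point that an assignment need not be exclusive can be sketched in Java as a partition-to-tasks mapping with a replica count. ReplicatedAssignment, assign, and isExclusive are hypothetical names, not Kafka or Samza APIs. With replicas = 1 each partition maps to exactly one task; with replicas equal to the number of tasks every task receives every partition (broadcast). isExclusive is the kind of invariant a broker could only assert in the first case.

```java
import java.util.*;

// Hypothetical sketch, not a Kafka or Samza API.
public class ReplicatedAssignment {

    /**
     * Maps each partition to `replicas` distinct tasks, starting at task (p mod n)
     * and wrapping around. replicas == 1 gives one task per partition;
     * replicas == tasks.size() is a broadcast assignment.
     */
    static Map<Integer, List<String>> assign(int numPartitions, List<String> tasks, int replicas) {
        if (tasks.isEmpty()) {
            throw new IllegalArgumentException("no tasks");
        }
        if (replicas < 1 || replicas > tasks.size()) {
            throw new IllegalArgumentException("replicas must be in [1, " + tasks.size() + "]");
        }
        int n = tasks.size();
        Map<Integer, List<String>> result = new TreeMap<>();
        for (int p = 0; p < numPartitions; p++) {
            List<String> owners = new ArrayList<>();
            for (int i = 0; i < replicas; i++) {
                owners.add(tasks.get((p + i) % n)); // distinct because replicas <= n
            }
            result.put(p, owners);
        }
        return result;
    }

    /** The "each partition has at most one consumer" invariant a broker might want to assert. */
    static boolean isExclusive(Map<Integer, List<String>> assignment) {
        return assignment.values().stream().allMatch(owners -> owners.size() <= 1);
    }

    public static void main(String[] args) {
        List<String> tasks = List.of("task-0", "task-1", "task-2");
        Map<Integer, List<String>> exclusive = assign(4, tasks, 1);
        Map<Integer, List<String>> broadcast = assign(4, tasks, tasks.size());
        System.out.println(exclusive + " exclusive=" + isExclusive(exclusive));
        System.out.println(broadcast + " exclusive=" + isExclusive(broadcast));
    }
}
```

Both mappings are valid partition-to-task assignments, but only the first passes isExclusive, which is why a hard server-side check would rule out legitimate broadcast or replicated schemes.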
