samza-dev mailing list archives

From Gianmarco De Francisci Morales <g...@apache.org>
Subject Re: Thoughts and observations on Samza
Date Fri, 03 Jul 2015 08:40:10 GMT
Hi Jay,

Thanks for your answer.


> However a few things have changed since that original design:
> 1. We now have the additional use cases of copycat and Samza
> 2. We now realize that the assignment strategies don't actually necessarily
> ensure each partition is assigned to only one consumer--there are really
> valid use cases for broadcast or multiple replica assignment schemes--so we
> can't actually make a hard assertion on the server.
>
> So it may make sense to revisit this. I don't think it is necessarily a
> massive change, and it would give more flexibility for the variety of cases.
>
> -Jay


I totally agree: the 1-partition-1-task mapping is too restrictive.
However, I think the fundamental operation that Samza, Copycat, and Kafka
consumers should agree upon is "how can I specify in a simple and
transparent way which partitions I want to consume, and how?".
This means providing a mapping from partitions to consumer tasks, possibly
in a transparent way so as to allow for optimizations in placement,
co-partitioning, etc.
This issue could once again generate a lot of duplicate work,
and I think it should be solved at the Kafka level.
Given that Copycat and normal consumers are already inside Kafka, I think
having Samza there as well would simplify things a lot.
The result is that Kafka would be a complete package for handling streams:
- Messaging, partitioning, and fault tolerance (Kafka core)
- Ingestion (Copycat)
- Lightweight processing (Samza)
- Coupling with other systems (Kafka consumers)
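
To make the partition-to-task mapping discussed above a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the function name `assign_partitions`, the strategy names ("exclusive", "broadcast", "replicated"), and the `replicas` parameter are all hypothetical and are not part of any Kafka, Copycat, or Samza API. It shows how a single mapping function could cover both the 1-partition-1-task case and the broadcast or multiple-replica schemes Jay mentions.

```python
from typing import Dict, List


def assign_partitions(
    partitions: List[int],
    tasks: List[str],
    strategy: str = "exclusive",
    replicas: int = 2,
) -> Dict[str, List[int]]:
    """Return a task -> partitions mapping.

    Hypothetical strategies (illustrative names, not a real Kafka API):
      - "exclusive":  each partition goes to exactly one task (round-robin)
      - "broadcast":  every task consumes every partition
      - "replicated": each partition goes to `replicas` distinct tasks
    """
    if not tasks:
        raise ValueError("at least one task is required")

    mapping: Dict[str, List[int]] = {task: [] for task in tasks}

    for index, partition in enumerate(partitions):
        if strategy == "exclusive":
            owners = [tasks[index % len(tasks)]]
        elif strategy == "broadcast":
            owners = list(tasks)
        elif strategy == "replicated":
            # Never assign more replicas than there are tasks.
            count = min(replicas, len(tasks))
            owners = [tasks[(index + offset) % len(tasks)] for offset in range(count)]
        else:
            raise ValueError(f"unknown strategy: {strategy}")

        for task in owners:
            mapping[task].append(partition)

    return mapping


if __name__ == "__main__":
    for name in ("exclusive", "broadcast", "replicated"):
        print(name, assign_partitions([0, 1, 2], ["t0", "t1", "t2"], strategy=name))
```

The point of the sketch is that the mapping is explicit data, so a server or coordinator could inspect it for placement or co-partitioning rather than hard-coding the one-consumer-per-partition assumption.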

Cheers,

--
Gianmarco
