samza-dev mailing list archives

From Jake Maes <jma...@apache.org>
Subject [DISCUSS] SAMZA-1041 Multi-stage feature for Samza
Date Tue, 13 Dec 2016 21:54:30 GMT

Hey folks,

A while ago I created SAMZA-1041
<https://issues.apache.org/jira/browse/SAMZA-1041> to add a multistage
feature to Samza. The goal is to let users deploy a set of processors as a
unit, with the intermediate topics created automatically. There are a
number of use cases, including the
repartitioner-main pattern and multistage HDFS jobs. Ultimately this will
make it easier for users to deploy a DAG of Samza processors and reduce the
common configuration pitfalls.

We've created a basic prototype and are ready to get started with this
feature. A design document is coming soon, but in the meantime I've started
a couple of discussions in the JIRA comments to get some early feedback.

Discussion 1 asks for general feedback on the utility of this feature and
for any ideas to improve it.

Discussion 2 is about the integration with the Fluent API feature, which
also deals with data pipelines from a logical perspective. The goal is to
make the distinction and contract between these features clear.

Thanks in advance for the feedback!

-Jake
