kafka-users mailing list archives

From Pavel Molchanov <pavel.molcha...@infodesk.com>
Subject Re: Event Sourcing question
Date Wed, 08 May 2019 22:43:25 GMT
Ryanne,

Thank you for the advice. That's exactly what we have now: different topics
for different processing steps.

However, I would like to build an extensible architecture with multiple
processing steps.

We will introduce other transformers later on. I was thinking that if all
of them send events on the same bus, it will be easier to replay the event
sequence later.

Say I add 4 more steps to the pipeline later on. Should I create 4 more
topics? Or should all of the step transformers listen to the same bus and
pick up the events they need?
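
To make this concrete, here is a rough Kafka Streams sketch of the
topic-per-step layout, with placeholder topic names (documents.received,
documents.transformed, documents.enriched) and no-op transform logic. It is
only meant to show that adding another step means adding one more
stream()/to() pair and one more topic:

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class DocumentPipeline {

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "document-pipeline");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();

            // One topic per step: each processor reads the previous step's
            // output topic and writes its own output topic.
            KStream<String, String> received = builder.stream("documents.received");
            received.mapValues(DocumentPipeline::transform)   // placeholder step
                    .to("documents.transformed");

            KStream<String, String> transformed = builder.stream("documents.transformed");
            transformed.mapValues(DocumentPipeline::enrich)   // placeholder next step
                       .to("documents.enriched");

            new KafkaStreams(builder.build(), props).start();
        }

        // Hypothetical step implementations: in practice each would fetch the
        // document from the URL in the event, process it, upload the result,
        // and return a new event pointing at the processed copy.
        private static String transform(String event) { return event; }
        private static String enrich(String event)    { return event; }
    }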

*Pavel Molchanov*
Director of Software Development
InfoDesk
www.infodesk.com

1 Bridge Street | Suite 105 | Irvington | New York | 10551 | Office: +1 (914) 332-5940



On Wed, May 8, 2019 at 3:45 PM Ryanne Dolan <ryannedolan@gmail.com> wrote:

> Pavel, one thing I'd recommend: don't jam multiple event types into a
> single topic. You are better served with multiple topics, each with a
> single schema and event type. In your case, you might have a received topic
> and a transformed topic, with an app consuming received and producing
> transformed.
>
> If your transformer process consumes, produces, and commits in the right
> order, your app can crash and restart without skipping records. Consider
> using Kafka Streams for this purpose, as it takes care of the semantics you
> need to do this correctly.
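
For reference, a minimal sketch of that consume / transform / produce /
commit ordering with the plain clients (topic names, serializers, and the
transform itself are placeholders). Offsets are committed only after the
output has been flushed, so a crash re-processes records instead of skipping
them (at-least-once):

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class Transformer {

        public static void main(String[] args) {
            Properties cc = new Properties();
            cc.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            cc.put(ConsumerConfig.GROUP_ID_CONFIG, "transformer");
            cc.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");  // commit manually, last
            cc.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            cc.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            Properties pc = new Properties();
            pc.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            pc.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            pc.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cc);
                 KafkaProducer<String, String> producer = new KafkaProducer<>(pc)) {

                consumer.subscribe(Collections.singletonList("documents.received"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        String result = transform(record.value());   // placeholder transform
                        producer.send(new ProducerRecord<>("documents.transformed",
                                                           record.key(), result));
                    }
                    producer.flush();        // make sure output is acknowledged...
                    consumer.commitSync();   // ...before the input offsets are committed
                }
            }
        }

        // Hypothetical transform; the real one would download, convert, and re-upload.
        private static String transform(String event) { return event; }
    }
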
>
> Ryanne
>
> On Wed, May 8, 2019 at 12:06 PM Pavel Molchanov <
> pavel.molchanov@infodesk.com> wrote:
>
> > I have an architectural question.
> >
> > I am planning to create a data transformation pipeline for documents.
> > Each component will send processing events to the Kafka 'events' topic.
> >
> > It will have the following steps:
> >
> > 1) Upload data to the repository (S3 or other storage). Get a public URL
> > for the uploaded document. Create a 'received' event with the document URL
> > and send the event to the Kafka 'events' topic.
> >
> > 2) The Transformer process will be listening to the Kafka 'events' topic.
> > It will react to the 'received' event in the 'events' topic, download the
> > document, transform it, push the transformed document to the repository
> > (S3 or other storage), create a 'transformed' event, and send it to the
> > same 'events' topic.
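
For illustration, a rough sketch of step 1's producer side, assuming the
event is a small JSON string keyed by a document id (the id, URL, and topic
name are placeholders, and the upload to S3 is assumed to have already
happened):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class ReceivedEventProducer {

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.ACKS_CONFIG, "all");   // wait for full acknowledgement

            // Assume the document was already uploaded and we have its public URL.
            String documentId = "doc-123";
            String documentUrl = "https://example-bucket.s3.amazonaws.com/doc-123.pdf";
            String event = "{\"type\":\"received\",\"documentId\":\"" + documentId
                    + "\",\"url\":\"" + documentUrl + "\"}";

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Keying by document id keeps all events for one document in
                // order on a single partition.
                producer.send(new ProducerRecord<>("events", documentId, event));
                producer.flush();
            }
        }
    }
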
> >
> > The Transformer process can break in the middle (exception, crash, etc.).
> > Upon startup, the Transformer process needs to check the 'events' topic
> > for documents that were received but not transformed.
> >
> > Should it read all events from the 'events' topic? Should it join
> > 'received' and 'transformed' events somehow to understand what was
> > received but not transformed?
> >
> > I don't have a clear idea of how it should behave.
> >
> > Please help.
> >
> > *Pavel Molchanov*
> >
>
