samza-dev mailing list archives

From Rob Martin <rob.mart...@gmail.com>
Subject Re: Samza for text processing
Date Sun, 28 Apr 2019 17:09:12 GMT
Thanks for the reply. We are currently deciding between Kafka Streams and
Samza. Which do you think would be more appropriate?

Also, for files over 1 MB, would you increase the default Kafka message size
limit, break the document into chunks, or pass a reference in the message?
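
To make the chunking option concrete, here is a minimal sketch that assumes
nothing about the real pipeline. The Chunk type, the split_document and
reassemble helpers, and the 900,000-byte budget are all hypothetical names and
values chosen for illustration; the budget is simply a number below Kafka's
roughly 1 MB default message size limit. No Kafka client is used here, only
the splitting and reassembly logic.

```python
import uuid
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical per-chunk payload budget, chosen to stay below Kafka's
# roughly 1 MB default message size limit with room for headers.
MAX_CHUNK_BYTES = 900_000


@dataclass(frozen=True)
class Chunk:
    """One piece of a document, carrying what is needed to reassemble it."""
    doc_id: str   # shared by all chunks of the same document
    index: int    # zero-based position of this chunk
    total: int    # number of chunks the document was split into
    payload: bytes


def split_document(data: bytes,
                   doc_id: Optional[str] = None,
                   max_bytes: int = MAX_CHUNK_BYTES) -> List[Chunk]:
    """Split raw document bytes into chunks of at most max_bytes each."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    doc_id = doc_id or uuid.uuid4().hex
    # An empty document still yields one (empty) chunk so it is not lost.
    pieces = [data[i:i + max_bytes] for i in range(0, len(data), max_bytes)] or [b""]
    return [Chunk(doc_id, i, len(pieces), p) for i, p in enumerate(pieces)]


def reassemble(chunks: Iterable[Chunk]) -> bytes:
    """Rebuild a document from its chunks, in whatever order they arrived."""
    ordered = sorted(chunks, key=lambda c: c.index)
    if not ordered:
        raise ValueError("no chunks given")
    if len({c.doc_id for c in ordered}) != 1:
        raise ValueError("chunks belong to different documents")
    total = ordered[0].total
    if [c.index for c in ordered] != list(range(total)):
        raise ValueError("missing or duplicate chunks")
    return b"".join(c.payload for c in ordered)
```

If something like this were used, sending every chunk with doc_id as the
message key would keep all chunks of a document in one partition and in order.
The trade-off is that the consumer has to buffer chunks until the last one
arrives, which adds state to an otherwise stateless job; passing a reference
to the stored document avoids that.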

Thanks again



On Sun, 28 Apr 2019, 16:20 Jagadish Venkatraman, <jagadish1989@gmail.com>
wrote:

> Hi Rob,
>
> Yes, your use-case is a good fit. You can use Samza for fault-tolerant
> stream processing.
>
> We have document (e.g. member profiles, articles/blogs) standardization
> use-cases at LinkedIn powered by Samza.
>
> Please let us know should you have further questions!
>
> On Sun, Apr 28, 2019 at 7:09 AM Rob Martin <rob.martin0@gmail.com> wrote:
>
> > I'm looking at creating a distributed streaming pipeline for processing
> > text documents (e.g. cleaning, NER and machine learning). Documents will
> > generally be under 1 MB and processing will be stateless. I was aiming to
> > feed documents from various sources and additional data into Kafka to be
> > streamed to the processing pipeline in Samza. Would this be an appropriate
> > use case for Samza?
> >
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
>
