samza-dev mailing list archives

From Jagadish Venkatraman <jagadish1...@gmail.com>
Subject Re: Comparison between Samza and Kafka Streams
Date Fri, 24 Nov 2017 16:47:30 GMT
Thanks for the feedback Giridhar!

We'll add a comparison with KStreams there as well.

Roughly, the two are similar: the design of Samza certainly influenced
what went into Kafka Streams. However, here are some key differences:

- Native support for non-Kafka sources and sinks: Samza has open-source
native connectors for various systems like ElasticSearch, AWS Kinesis,
Azure EventHubs, and HDFS. This saves cost if you don't want to maintain
a second copy of the data just to import it into Kafka.

- Async-mode: At LinkedIn, we have observed that jobs are bottlenecked by
remote I/O. For this reason, we built native async-processing into Samza.
As far as I can remember, Samza is the only stream processor that
supports this feature (as of early 2017).

- Stability at LinkedIn: We run Samza in production at LinkedIn, and it's
battle-tested at scale, powering all of our near-realtime processing
use-cases. On YARN, Samza supports durable local state and host-affinity
for instant state recovery. We have improved on this by adding
incremental checkpointing.

- Single API and SQL for streaming and batch processing: Samza can run the
same code on both batch and streaming sources. We just added SQL support
in the open-source.
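
To illustrate the async-mode point above, here is a minimal Python sketch of
why overlapping remote calls helps when a job is bottlenecked by remote I/O.
It does not use Samza's API: `remote_lookup`, `REMOTE_LATENCY_S`, and
`max_in_flight` are hypothetical names chosen for illustration, and the
remote call is simulated with a sleep.

```python
import asyncio
import time

REMOTE_LATENCY_S = 0.05  # assumed latency of one simulated remote call


async def remote_lookup(key):
    # Stand-in for a remote I/O call (e.g. a database or REST lookup).
    await asyncio.sleep(REMOTE_LATENCY_S)
    return f"{key}-enriched"


async def process_sequentially(messages):
    # One message at a time: each remote call must finish before the
    # next message is picked up.
    results = []
    for message in messages:
        results.append(await remote_lookup(message))
    return results


async def process_concurrently(messages, max_in_flight=4):
    # Up to max_in_flight remote calls are outstanding at once, so the
    # waits overlap instead of adding up.
    semaphore = asyncio.Semaphore(max_in_flight)

    async def handle(message):
        async with semaphore:
            return await remote_lookup(message)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(handle(m) for m in messages))


def timed(coroutine_fn, messages):
    # Run a coroutine function to completion and report elapsed seconds.
    start = time.perf_counter()
    results = asyncio.run(coroutine_fn(messages))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    msgs = [f"m{i}" for i in range(8)]
    _, sequential_s = timed(process_sequentially, msgs)
    _, concurrent_s = timed(process_concurrently, msgs)
    print(f"sequential: {sequential_s:.2f}s, concurrent: {concurrent_s:.2f}s")
```

With eight messages and four calls in flight, the concurrent version should
wait for roughly two rounds of latency rather than eight, while producing the
same results in the same order.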

PS: Some of this discussion is based on Kartik's and Yi's earlier responses
in 2016.

Yi's earlier response:
http://mail-archives.apache.org/mod_mbox/samza-dev/201608.mbox/%3CCAFvExu1KghxR1dN7Awwr70k3b4aMmfBVLhKFjFd2smsUAt3rDg%40mail.gmail.com%3E

Kartik's earlier response:
http://mail-archives.apache.org/mod_mbox/samza-dev/201605.mbox/%3CCACsAj_XZZBohSz7Cf9%3DLO5MDOn2vEzfMrDF6Te%3DwrpeMEab1dQ%40mail.gmail.com%3E

On Thu, Nov 23, 2017 at 10:15 PM, Giridhar Addepalli <giridhar1202@gmail.com> wrote:

> Hi,
>
> Thank you for providing a comparison between Samza and Spark Streaming,
> Mupd8, and Storm.
> It looks like there is a new player in the field: Kafka Streams (
> https://docs.confluent.io/current/streams/index.html).
>
> It would be good to have a comparison between Samza and Kafka Streams as
> well.
>
> From a high level, it looks like "Samza when used as a library" is similar
> to "Kafka Streams".
>
> Thanks,
> Giridhar.
>



-- 
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University
