samza-dev mailing list archives

From Yi Pan <>
Subject Re: Kafka 0.9 as part of Samza 0.10?
Date Thu, 31 Mar 2016 17:26:09 GMT
Hi, Nick,

Let me try to answer inline:

On Thu, Mar 31, 2016 at 12:49 AM, nick xander <>

> * Do you guys experience issues with Kafka when it is used with log
> compaction for Samza's stateful management?

The critical log-compaction issue in Kafka that we care about is the case
where message compression and log compaction are *both* used on the same
topic. Currently, we forcibly turn off compression for changelog topics,
so it is not a problem for Samza's KV-stores. It is still a problem for
checkpoint topics if the Kafka producer is configured to use message
compression.
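
As a rough illustration of the checkpoint-topic concern, here is a minimal
sketch of producer settings with compression explicitly disabled, plus a
check for whether a given set of settings enables it. `compression.type`
and `bootstrap.servers` are standard Kafka producer property names; the
class name, method names, and broker address are hypothetical placeholders,
and this is not how Samza itself applies the setting.

```java
import java.util.Properties;

public class ChangelogProducerSettings {
    // Builds producer settings with compression explicitly disabled.
    // The bootstrap address is supplied by the caller (placeholder in main).
    public static Properties uncompressedProducerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("compression.type", "none");
        return props;
    }

    // Returns true when the settings enable any compression codec.
    // A missing value is treated as "none", the Kafka producer default.
    public static boolean usesCompression(Properties props) {
        String type = props.getProperty("compression.type", "none");
        return !"none".equalsIgnoreCase(type);
    }

    public static void main(String[] args) {
        Properties props = uncompressedProducerProps("localhost:9092");
        System.out.println("compression enabled: " + usesCompression(props));
    }
}
```

A check like `usesCompression` could be run against the producer settings
used for a compacted topic before deploying a job.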

> * What is the avg number of keys per partition that you have observed in
> Kafka's log compacted topic for stateful management, total number of
> partitions, replication factor and number of Kafka brokers?

This number varies *a lot*, depending on how big your KV-store is. For
example, we have seen around 5-10GB of RocksDB KV-stores being stored in
changelog topics at LinkedIn. That causes a long bootstrap time when the
container is restarted on a different host. Hence, we included the
host-affinity feature in Samza 0.10, which cut down the bootstrap time for
that particular job by 20x.
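
To illustrate why bootstrap time grows with the size of the store, here is
a minimal sketch of restoring a KV-store by replaying a changelog from the
beginning. It is a simplified model under stated assumptions, not Samza's
actual restore code: the class, the `Record` type, and the in-memory map
are hypothetical stand-ins. Later writes to a key overwrite earlier ones,
and a null value is treated as a delete, as with a log-compacted topic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ChangelogRestoreSketch {
    // One changelog record; a null value marks a delete (tombstone).
    public static final class Record {
        final String key;
        final String value;

        public Record(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Replays records in offset order into an empty store.
    // Every record must be read, so restore time scales with log size.
    public static Map<String, String> restore(List<Record> changelog) {
        Map<String, String> store = new HashMap<>();
        for (Record r : changelog) {
            if (r.value == null) {
                store.remove(r.key);       // tombstone: drop the key
            } else {
                store.put(r.key, r.value); // latest write wins
            }
        }
        return store;
    }

    public static void main(String[] args) {
        List<Record> log = new ArrayList<>();
        log.add(new Record("a", "1"));
        log.add(new Record("b", "2"));
        log.add(new Record("a", "3"));
        log.add(new Record("b", null));
        System.out.println(restore(log)); // prints {a=3}
    }
}
```

When a container comes back on the same host and can reuse the store
already on local disk, most of this replay is avoided, which is the
saving that host-affinity is aiming for.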

> * Will the Kafka 0.9 upgrade be included as part of Samza 0.10.1 as it
> seems critical if Samza is used for stateful management? And what is the
> timeline for Samza 0.10.1 that you are expecting?

We are planning to release Samza 0.10.1 very soon and are working on
pending code reviews and validations now. Depending on the test/validation
cycles, we hope to have a Samza 0.10.1 release candidate ready in a month
or so. The Kafka 0.9 upgrade will likely not be in Samza 0.10.1, due to
the tight release timeline this time.

> * What is the recommendation between the usage of Samza vs Kafka Connect?
> Should we use Samza for stateful management and Kafka Connect for other
> stateless streaming solutions?

Kafka Connect is mainly an ingest/output connector to/from Kafka and does
not do much stateful processing. Samza actually does both ingest/output
and stateful processing. If there are input data sources that Samza does
not have a SystemConsumer implementation for yet, you can definitely use
Kafka Connect for ingestion and Samza for stateful processing.

Hope the above answered your questions.



> Thanks,
> Nick
