samza-dev mailing list archives

From Yi Pan <nickpa...@gmail.com>
Subject Re: Kafka 0.9 as part of Samza 0.10?
Date Thu, 31 Mar 2016 17:26:09 GMT
Hi, Nick,

Let me try to answer in-between the lines:

On Thu, Mar 31, 2016 at 12:49 AM, nick xander <nickxander123@gmail.com>
wrote:

>
> * Do you guys experience issues with Kafka when it is used with log
> compaction for Samza's stateful management?
>

The critical issue with log compaction in Kafka that we care about is the
case where message compression and log compaction are *both* used on the
same topic. Currently, we forcibly turn off compression for changelog
topics, so this is not a problem for Samza's KV-stores. It is still a
problem for checkpoint topics if the Kafka producer is configured to use
message compression.
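A minimal sketch of how one might flag the risky combination in a job's configuration before deploying. It assumes Kafka-based checkpointing selected via `task.checkpoint.system`, and that producer settings are passed through as `systems.<system>.producer.*` with Kafka's `compression.type` (default `none`); the helper name `checkpoint_topic_compression` is hypothetical, and the exact keys should be checked against the configuration table for your Samza and Kafka versions.

```python
from typing import Dict, Optional


def checkpoint_topic_compression(config: Dict[str, str]) -> Optional[str]:
    """Hypothetical helper: return the compression codec used by the producer
    of the checkpoint system, or None if compression is off or no Kafka
    checkpoint system is configured.

    Assumes keys of the form task.checkpoint.system and
    systems.<system>.producer.compression.type.
    """
    system = config.get("task.checkpoint.system")
    if system is None:
        return None  # no checkpoint system configured, nothing to check
    codec = config.get(f"systems.{system}.producer.compression.type", "none")
    return None if codec.lower() == "none" else codec


# Illustrative job config: Kafka checkpoints with a gzip-compressing producer.
job_config = {
    "task.checkpoint.factory": "org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory",
    "task.checkpoint.system": "kafka",
    "systems.kafka.producer.compression.type": "gzip",
}
codec = checkpoint_topic_compression(job_config)
if codec:
    print(f"warning: checkpoint topic is compacted and written with {codec} compression")
```

A check like this only inspects configuration; it does not tell you whether a given broker version is affected, so treat it as a pre-deployment lint rather than a fix.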


> * What is the average number of keys per partition that you have observed
> in Kafka's log-compacted topics for stateful management, and what are the
> total number of partitions, the replication factor and the number of Kafka
> brokers?
>

This number varies *a lot*, depending on how big your KV-store is. For
example, at LinkedIn we have seen RocksDB KV-stores of around 5-10GB backed
by changelog topics. A store that size causes a long bootstrap time when the
container is restarted on a different host. Hence, we included the
host-affinity feature in Samza 0.10, which cut down the bootstrap time for
that particular job by 20x.
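To see why store size drives bootstrap time, here is a back-of-the-envelope sketch: restoring a store means replaying its changelog, so the time is roughly bytes replayed divided by replay throughput. The 50 MB/s rate and the 5% tail are hypothetical numbers, not measurements; the 5% is picked only so the ratio lines up with the 20x figure above. The sketch assumes that with host affinity the container lands on the host that still holds its local store and replays only the changelog written since that store was last flushed.

```python
GB = 1024 ** 3
MB = 1024 ** 2


def restore_seconds(bytes_to_replay: int, replay_bytes_per_sec: float) -> float:
    """Time to replay a changelog of the given size at a constant rate."""
    if replay_bytes_per_sec <= 0:
        raise ValueError("replay rate must be positive")
    return bytes_to_replay / replay_bytes_per_sec


# Hypothetical: a 10 GB RocksDB store restored at an assumed 50 MB/s.
store_bytes = 10 * GB
rate = 50 * MB

# New host: the whole changelog must be replayed.
full = restore_seconds(store_bytes, rate)

# Same host (host affinity): assume only 5% of the data is newer than the
# local store's last flush and needs replaying.
partial = restore_seconds(int(0.05 * store_bytes), rate)

print(f"full restore: {full:.0f} s, host-affinity restore: {partial:.0f} s")
```

The point of the arithmetic is that full restores scale linearly with store size, while a host-affinity restore scales with how much changed since the last flush.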


> * Will the Kafka 0.9 upgrade be included as part of Samza 0.10.1, as it
> seems critical if Samza is used for stateful management? And what is the
> timeline you are expecting for Samza 0.10.1?
>

We are planning to release Samza 0.10.1 very soon and are working on
pending code reviews and validations now. Depending on the test/validation
cycles, we hope to have a Samza 0.10.1 release candidate ready in a month or
so. The Kafka 0.9 upgrade will likely not be in Samza 0.10.1, due to the
tight release timeline this time.


> * What is the recommendation on using Samza vs Kafka Connect?
> Should we use Samza for stateful management and Kafka Connect for other
> stateless streaming solutions?
>

Kafka Connect is mainly a connector for ingesting data into and exporting
data out of Kafka; it does not do much stateful processing. Samza does both
ingest/output and stateful processing. If there are input data sources for
which Samza does not have a SystemConsumer implementation yet, you can
definitely use Kafka Connect for ingestion and Samza for stateful processing.

Hope the above answered your questions.

Thanks!

-Yi



> Thanks,
> Nick
>
