samza-dev mailing list archives

From Richard Lee <rd...@tivo.com>
Subject Re: Kafka partition key
Date Thu, 26 Mar 2015 16:28:23 GMT
Is there a typo below? Are all of these actually in the same topic, just different partitions?
Partitioning, AFAIK, is mainly done for parallelism & throughput reasons. What is the
reason for partitioning your dataset by ‘columns’?

https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowdoIchoosethenumberofpartitionsforatopic?

Richard

> On Mar 26, 2015, at 8:22 AM, Shekar Tippur <ctippur@gmail.com> wrote:
>
> Hello,
>
> Want to confirm a basic understanding of Kafka.
> If I have a dataset that needs to be partitioned by 4 columns, then the
> progression is
>
> {topic1:partition_key1} -> {Group by samza on partition_key1}
> ->
> {topic2:partition_key2} -> {Group by samza on partition_key2}
> ->
> {topic3:partition_key3} -> {Group by samza on partition_key3}
> ->
> {topic4:partition_key4} -> {Group by samza on partition_key4}
>
> Can you please confirm if my understanding is right?
>
> - Shekar
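
For reference, the progression in the quoted message can be sketched roughly as below. This is only an illustration of key-based partitioning, not Kafka or Samza code: the names partition_for and repartition, the partition count, and the sample records are all hypothetical, and the CRC32 hash is a stand-in rather than the hash Kafka's partitioner actually uses. The point it shows is that records sharing a key land in the same partition, so grouping by a different column means re-keying the records and writing them out again.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count, chosen for illustration


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition index.

    Assumption: CRC32 is a stand-in for a stable hash; it is NOT the
    hash function used by Kafka's own partitioner.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def repartition(records, column, num_partitions=NUM_PARTITIONS):
    """Re-key records by `column` and bucket them by partition index.

    This mimics one stage of the progression: read records, pick a new
    partition key, and write them to an output keyed on that column.
    """
    partitions = defaultdict(list)
    for record in records:
        index = partition_for(str(record[column]), num_partitions)
        partitions[index].append(record)
    return dict(partitions)


# Hypothetical sample data with two of the "columns" to group by.
records = [
    {"user": "alice", "country": "US"},
    {"user": "bob", "country": "IN"},
    {"user": "alice", "country": "IN"},
]

by_user = repartition(records, "user")        # stage 1: keyed on "user"
by_country = repartition(records, "country")  # stage 2: re-keyed on "country"
```

Within by_user, both "alice" records sit in one partition, so a per-partition group-by on user sees all of them. The same holds for by_country after re-keying. Whether each stage needs its own topic, as opposed to a different key, depends on the job and is the open question in this thread.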

