samza-dev mailing list archives

From Dotan Patrich <dot...@fortscale.com>
Subject Re: Kafka partition key
Date Thu, 26 Mar 2015 15:31:17 GMT
Hi Shekar,

Each Kafka partition is identified by just a number (its index within the
topic), so you need to specify what partitioner strategy to use when mapping
your event key to a partition number.
You can take the 4 columns you have in the event and map them to a partition
number; the partitioner in that case would be a function of the form:
(a, b, c, d) -> (int)
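To make the (a, b, c, d) -> (int) idea concrete, here is a minimal plain-Java sketch. It assumes a hash-then-modulo strategy over the four column values. The class and method names (CompositeKeyPartitioner, partitionFor) and the sample column values are hypothetical, and this is not Kafka's own Partitioner interface, just the mapping function itself:

```java
import java.util.Objects;

// Hypothetical helper: maps a four-column composite key to a partition number.
// Plain-Java sketch of the (a, b, c, d) -> int idea, not Kafka's Partitioner API.
public class CompositeKeyPartitioner {

    private final int numPartitions;

    public CompositeKeyPartitioner(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.numPartitions = numPartitions;
    }

    // Deterministic: the same four values always yield the same partition.
    public int partitionFor(String a, String b, String c, String d) {
        int hash = Objects.hash(a, b, c, d);
        // floorMod keeps the result in [0, numPartitions) even for negative hashes.
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        CompositeKeyPartitioner p = new CompositeKeyPartitioner(8);
        int first = p.partitionFor("us", "web", "2015-03-26", "login");
        int again = p.partitionFor("us", "web", "2015-03-26", "login");
        System.out.println(first == again); // prints "true"
    }
}
```

The key property is determinism: whatever hash you pick, equal keys must always land on the same partition, and the result must stay within the topic's partition count.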

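Once you partition your data across the topic's partitions, each partition
will hold a subset of the dataset in which all events with the same key are
kept together, so whoever consumes a partition sees every event for the keys
it holds. That is basically what an SQL "group by" statement would have done.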
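A small simulation of the "group by" analogy, as a sketch under stated assumptions: events are hypothetical four-column String arrays, the partitioner is the same hash-then-modulo idea, and PartitionGroupBy / groupByPerPartition are illustrative names only. It routes events to partitions and then counts per key within each partition; because a key never spans two partitions, the per-partition counts together match a global GROUP BY key, COUNT(*):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Sketch: route events to partitions by composite key, then group inside
// each partition. No key is split across partitions, so each partition's
// counts are complete for the keys it holds.
public class PartitionGroupBy {

    // Hash-then-modulo mapping of (a, b, c, d) to [0, numPartitions).
    static int partitionFor(String[] cols, int numPartitions) {
        return Math.floorMod(Objects.hash((Object[]) cols), numPartitions);
    }

    // Returns, for each partition, a map of composite key -> event count.
    static List<Map<String, Integer>> groupByPerPartition(List<String[]> events, int numPartitions) {
        List<Map<String, Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new HashMap<>());
        }
        for (String[] cols : events) {
            String key = String.join("|", cols);
            int p = partitionFor(cols, numPartitions);
            // Count locally, as a per-partition consumer (e.g. one Samza task) would.
            partitions.get(p).merge(key, 1, Integer::sum);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String[]> events = List.of(
            new String[] {"us", "web", "2015-03-26", "login"},
            new String[] {"us", "web", "2015-03-26", "login"},
            new String[] {"eu", "app", "2015-03-26", "click"});
        List<Map<String, Integer>> partitions = groupByPerPartition(events, 4);
        for (int i = 0; i < partitions.size(); i++) {
            System.out.println("partition " + i + ": " + partitions.get(i));
        }
    }
}
```

Note that one partition can hold many keys; partitioning guarantees co-location of each key, not one key per partition.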

Hope that helps,

Dotan




On Thu, Mar 26, 2015 at 5:22 PM, Shekar Tippur <ctippur@gmail.com> wrote:

> Hello,
>
> Want to confirm a basic understanding of Kafka.
> If I have a dataset that needs to be partitioned by 4 columns, then the
> progression is
>
> {topic1:partition_key1} -> {Group by samza on partition_key1}
> ->
> {topic2:partition_key2} -> {Group by samza on partition_key2}
> ->
> {topic3:partition_key3} -> {Group by samza on partition_key3}
> ->
> {topic4:partition_key4} -> {Group by samza on partition_key4}
>
> Can you please confirm if my understanding is right?
>
> - Shekar
>
