samza-dev mailing list archives

From Milinda Pathirage <mpath...@umail.iu.edu>
Subject Re: Kafka partition key
Date Thu, 26 Mar 2015 18:36:34 GMT
Hi Shekar,

Please refer to [1]. You can set a custom partitioner through the producer
config. You will have to implement your own partitioner based on your
application and partitioning strategy.
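
As a rough sketch of what the partitioning logic could look like for the
three fields in the question (user_id, ethnicity, location): the class and
method names below are hypothetical, not part of Kafka or Samza. To use it
with the 0.8 producer, this logic would presumably sit inside a class that
implements the partitioner interface shown in [1] and is named in the
producer's partitioner.class setting; check the exact interface signature
against the Kafka version in use.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration only; not a Kafka or Samza class.
final class CompositeKeyPartitioner {

    // Build one key from the three fields. The NUL separator is an assumption:
    // it keeps ("ab", "c") and ("a", "bc") from producing the same key.
    static String compositeKey(String userId, String ethnicity, String location) {
        return userId + "\u0000" + ethnicity + "\u0000" + location;
    }

    // Map a key to a partition in [0, numPartitions).
    // Masking the sign bit avoids a negative result from the modulo.
    static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return (hash & 0x7fffffff) % numPartitions;
    }
}
```

With 10 partitions, every message carrying the same (user_id, ethnicity,
location) combination maps to the same partition, but a single partition
will generally hold many different combinations, not just one group.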

Thanks
Milinda


[1] https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+Producer+Example

On Thu, Mar 26, 2015 at 2:25 PM, Shekar Tippur <ctippur@gmail.com> wrote:

> So if I have a feed with
>
> {user_id:12345,
> ethnicity: asian,
> location: "cerritos, ca",
> Height:"5.9",
> weight: "150 lbs"}
>
> I am referring to https://kafka.apache.org/081/ops.html#topic-config
>
> How do I map the 3 columns - (user_id, ethnicity, and location) to a
> partition id. If I map it this way and say create 10 partitions, each
> partition will contain a subset of data grouped by these columns - right?
>
> - Shekar
>
>
>
>
> On Thu, Mar 26, 2015 at 9:38 AM, Roger Hoover <roger.hoover@gmail.com>
> wrote:
>
> > Hi Richard,
> >
> > You can also partition by a key like "user_id" so that all messages for a
> > given user would end up in the same partition.  This can be useful for
> > calculating user-specific aggregations or doing a distributed join where
> > the local state is also partitioned on user_id.
> >
> > Cheers,
> >
> > Roger
> >
> > On Thu, Mar 26, 2015 at 9:28 AM, Richard Lee <rdlee@tivo.com> wrote:
> >
> > > Is there a typo below?  Are all of these actually in the same topic,
> > > just different partitions?  Partitioning, AFAIK, is mainly done for
> > > parallelism & throughput reasons.  What is the reason for partitioning
> > > your dataset by ‘columns’?
> > >
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowdoIchoosethenumberofpartitionsforatopic?
> > >
> > > Richard
> > >
> > > > On Mar 26, 2015, at 8:22 AM, Shekar Tippur <ctippur@gmail.com> wrote:
> > > >
> > > > Hello,
> > > >
> > > > Want to confirm a basic understanding of Kafka.
> > > > > If I have a dataset that needs to be partitioned by 4 columns, then
> > > > > the progression is
> > > >
> > > > {topic1:partition_key1} -> {Group by samza on partition_key1}
> > > > ->
> > > > {topic2:partition_key2} -> {Group by samza on partition_key2}
> > > > ->
> > > > {topic3:partition_key3} -> {Group by samza on partition_key3}
> > > > ->
> > > > {topic4:partition_key4} -> {Group by samza on partition_key4}
> > > >
> > > > Can you please confirm if my understanding is right?
> > > >
> > > > - Shekar
> > >
> > >
> > >
> >
>



-- 
Milinda Pathirage

PhD Student | Research Assistant
School of Informatics and Computing | Data to Insight Center
Indiana University

twitter: milindalakmal
skype: milinda.pathirage
blog: http://milinda.pathirage.org
