kafka-users mailing list archives

From Hyounmin Wang <hyunmi...@gmail.com>
Subject Re: Kafka Beginners planning problem.
Date Wed, 06 Jul 2016 01:28:08 GMT
Hi David

Thank you for your comments. My concern with that idea is that having only
one topic will slow a lot of things down. I expect to have at least 6-7
physical consumers, so I think I can safely use more than one topic
(perhaps a separate topic per operation?).

Also, with your approach, wouldn't that create something like 100 million
partitions? As far as I know, each partition is backed by files on disk, so
that many partitions would slow the entire system down. (Am I even correct
on this?)

It's all a matter of making sure user A's activity does not block user B.
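
For reference, a minimal sketch of the keyed-partitioning idea David
describes below: a single topic with a fixed number of partitions, where each
user ID is hashed to one partition. The partition count of 12 and the helper
names are assumptions for illustration, and CRC32 is only a stand-in hash
(Kafka's default partitioner uses murmur2 on the key bytes). The point is
that the number of partitions stays fixed no matter how many users there are,
while each user's messages stay in order within one partition.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 12  # assumption: illustrative value, not from the thread


def partition_for(user_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a user ID to a partition index.

    CRC32 is a stand-in hash here; Kafka's default partitioner uses murmur2.
    """
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions


def route(events, num_partitions: int = NUM_PARTITIONS):
    """Append each (user_id, payload) event to its partition log in arrival order."""
    logs = defaultdict(list)
    for user_id, payload in events:
        logs[partition_for(user_id, num_partitions)].append((user_id, payload))
    return logs


events = [
    ("user-a", "login"),
    ("user-b", "login"),
    ("user-a", "upload"),
    ("user-a", "logout"),
]
logs = route(events)
```

All events for the same user land in the same partition log, in the order
they were sent, and millions of distinct user IDs still map onto at most
NUM_PARTITIONS logs.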

Thank you for your answers!


On Wed, Jul 6, 2016 at 12:24 AM, David Newberger <
david.newberger@wandcorp.com> wrote:

> Hi,
>
> I think the recommended approach to this would be to have a single topic
> and partition it by userId. This will give you locality and order by user.
> If you think about it this would give you a better ordering guarantee than
> if you had one topic per user. It's also a lot more efficient. If you are
> using Kafka as a log or messaging system you really should not need
> millions of topics or partitions. If I'm misunderstanding the use case
> please let me know.
>
> Cheers,
>
> David Newberger
>
> -----Original Message-----
> From: Hyounmin Wang [mailto:hyunmin90@gmail.com]
> Sent: Tuesday, July 5, 2016 1:50 AM
> To: users@kafka.apache.org
> Subject: Kafka Beginners planning problem.
>
> Hi there!
>
> I'm a new grad engineer and pretty new to the Kafka world.
>
> I'm trying to replace RabbitMQ with Apache Kafka and, while planning, I
> ran into several conceptual problems.
>
> First, we use RabbitMQ with a queue-per-user policy, meaning each user
> has their own queue. This suits our needs because each queue holds the jobs
> to be done for that particular user, and if that user causes a problem, it
> never affects other users because the queues are separate. (A problem here
> means the following: messages in the queue are dispatched to the user via
> HTTP requests. If the user refuses to receive a message (server down,
> perhaps?), the message goes back into a retry queue, so no messages are
> lost unless the queue itself goes down.)
>
> Now, Kafka is fault tolerant and failure safe because it writes to disk,
> and that's exactly why I am trying to bring Kafka into our architecture.
>
> But there are problems with my plan.
>
> First, I was thinking of creating one topic per user, meaning each
> user would have their own topic. (What problems would this cause? My
> maximum estimate is around 1-5 million topics.)
>
> Second, if I decide to go with topics based on operation, partitioned by
> a hash of the user ID, and one user is currently not consuming messages,
> will all the other users in that partition have to wait? What
> would be the best way to structure this?
>
> So, in conclusion: 1-5 million users. We do not want one user
> blocking the processing of a large number of other users. A topic per user
> would solve this issue, but it seems there might be a problem with ZooKeeper
> if that many topics are created. (Is this true?)
>
> What would be the best way to structure this, considering scalability?
>
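
One possible way to keep a failing user from blocking others that share a
partition, sketched under assumptions: it mirrors the retry-queue behaviour
described for the existing RabbitMQ setup, and nobody in this thread proposed
it. The function names, the in-memory retry queue, and the limit of three
attempts are all hypothetical; in a real deployment the retry queue would
more likely be a separate topic.

```python
from collections import deque


def drain(partition, deliver, max_attempts=3):
    """Process records in order; park failures on a retry queue instead of blocking.

    `partition` is an iterable of (user_id, payload) pairs and `deliver` is a
    callable returning True on success. Both are hypothetical stand-ins.
    """
    retry = deque()
    delivered, dead = [], []

    # First pass: a failed delivery is set aside so later records keep moving.
    for user_id, payload in partition:
        if deliver(user_id, payload):
            delivered.append((user_id, payload))
        else:
            retry.append((user_id, payload, 1))  # one attempt made so far

    # Retry pass: try again until success or the attempt limit is reached.
    while retry:
        user_id, payload, attempts = retry.popleft()
        if deliver(user_id, payload):
            delivered.append((user_id, payload))
        elif attempts + 1 >= max_attempts:
            dead.append((user_id, payload))  # give up after max_attempts tries
        else:
            retry.append((user_id, payload, attempts + 1))

    return delivered, dead
```

The trade-off is that a retried message can be delivered after later
messages for the same user, so strict per-user ordering is given up for the
user whose deliveries fail.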
