samza-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Riccomini <criccom...@linkedin.com>
Subject Re: Can Samza scale up stream tasks at runtime?
Date Mon, 14 Oct 2013 20:23:46 GMT
Hey Yong,

No problem. It is conceivable to add the feature you're talking about (or even full-blown
auto-scaling), but we haven't felt a pressing need for it right now.

The reason that we don't have this feature is that Samza's partitions are statically assigned
to a container. For example, if you run two containers, and four partitions, container 0 gets
partitions 0 and 2, and container 1 gets partitions 1 and 3. If you were to then add a third
container, the partition assignments change, since the new container needs to take control
of some partitions.

Right now, Samza doesn't support dynamically relocating partitions in a container. This was
a conscious decision to reduce operational complexity (to avoid requiring ZooKeeper in containers).

Without dynamic partition reassignment, the Samza ApplicationMaster (AM) has to kill a container,
then start it again with new partitions. Thus, to add a new container to a running job, the
AM would first kill all containers, re-assign partitions so that the new container gets some
partitions, and then start all the containers up again. This is nearly the same operation
as just restarting the job. We opted to force a job restart because of this.

Furthermore, by restarting the Samza job, you get the added benefit that a job restart requires
a config change to update the yarn.container.count config value, which means you get all the
nice things that come with updating configuration (seeing who made the change, seeing when
the change was made, seeing the current value, etc). This stuff is really useful in a production
environment when multiple parties are involved.

Cheers,
Chris

From: Yong Yuan <yycslab@gmail.com<mailto:yycslab@gmail.com>>
Reply-To: "yycslab@gmail.com<mailto:yycslab@gmail.com>" <yycslab@gmail.com<mailto:yycslab@gmail.com>>
Date: Monday, October 14, 2013 1:12 PM
To: Chris Riccomini <criccomini@linkedin.com<mailto:criccomini@linkedin.com>>
Subject: Re: Can Samza scale up stream tasks at runtime?

Thanks, Chris!


On Mon, Oct 14, 2013 at 8:54 AM, Chris Riccomini <criccomini@linkedin.com<mailto:criccomini@linkedin.com>>
wrote:
Hey Yong,

Samza does not support dynamic scaling. You'd have to bump up your
yarn.container.count (assuming you're using YARN) and re-start the job.

Cheers,
Chris

On 10/13/13 10:42 AM, "Yong Yuan" <yycslab@gmail.com<mailto:yycslab@gmail.com>>
wrote:

>Hi, folks,
>
>Does Samza support scaling up processing at runtime? That is, if I add new
>Kafka brokers or increase the number of partitions for a topic while my
>Samza job is running, will Samza allocate more stream tasks accordingly?
>Or
>do I have to relaunch the tasks?
>
>Thanks,



Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message