samza-dev mailing list archives

From Chris Riccomini <criccom...@apache.org>
Subject Re: Maximum number of jobs
Date Wed, 15 Apr 2015 17:35:01 GMT
Hey Jeremy,

Samza will be fine, but at this scale you need to start worrying about
Kafka and YARN. 1 million jobs will likely start to put pressure on YARN's
RM due to memory usage and CPU usage for the scheduler. With 1 million
jobs, assuming 1 container each, you'll have over 1 million connections to
Kafka, which means you'll need enough brokers to handle those connections.
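
A rough back-of-envelope sketch of the connection arithmetic above, in Python. The function name and every parameter value here are hypothetical; in particular, the per-broker connection ceiling is a placeholder to be replaced with a figure measured on your own brokers, not a Kafka default. Treating each container as a single connection gives only a lower bound, since a container may connect to more than one broker.

```python
import math

def estimate_kafka_load(num_jobs, containers_per_job=1,
                        connections_per_container=1,
                        max_connections_per_broker=10_000):
    """Return (total_connections, brokers_needed) as a lower-bound estimate.

    All defaults are illustrative assumptions, not Kafka or Samza settings.
    """
    # Every container of every job holds at least one Kafka connection.
    total = num_jobs * containers_per_job * connections_per_container
    # Round up: a partially filled broker is still a whole broker.
    brokers = math.ceil(total / max_connections_per_broker)
    return total, brokers

total, brokers = estimate_kafka_load(1_000_000)
print(total, brokers)  # 1000000 100
```

Under these assumed numbers, 1 million single-container jobs would need at least 100 brokers just to hold the connections, before accounting for throughput or the load on YARN.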

Can you describe your use case in more detail? Running 1 million jobs seems
like it might be a misuse of this technology.

Cheers,
Chris

On Wed, Apr 15, 2015 at 10:24 AM, jeremy p <athomewithagroovebox@gmail.com>
wrote:

> What's the maximum number of Samza jobs I can run simultaneously on a
> single cluster?  Let's say these jobs are very lightweight -- they require
> little memory or processing power.  However, I need a lot of them -- let's
> say I need to have 1,000,000 running at any given time.  Is this reasonable
> or even possible?
>
