samza-dev mailing list archives

From Dotan Patrich <>
Subject Re: Samza threads issues
Date Wed, 29 Oct 2014 06:59:33 GMT
Thanks Chris,
We will test our product using SerialGC to see how it behaves.

One concern I have is the Kafka topic sizes. Assuming "stop-the-world" GC
pauses will be more noticeable with SerialGC than with the parallel GC,
should we increase the Kafka topic sizes to accommodate the data that
arrives during these pauses?
On a broader note, what are the best practices for measuring and setting
the right size for Kafka topics? Can anyone share their experience with
that?
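
A rough way to reason about the sizing question above is to compare the data that arrives while a consumer is stalled with the amount the topic retains per partition. The sketch below is only back-of-the-envelope arithmetic: the ingest rate, stall duration, and retained size are hypothetical numbers, not measurements from this thread, and the function names are made up for illustration.

```python
def backlog_bytes(ingest_bytes_per_sec: float, stall_seconds: float) -> float:
    """Bytes that pile up in one partition while its consumer is stalled."""
    return ingest_bytes_per_sec * stall_seconds


def retention_headroom(retention_bytes: float,
                       ingest_bytes_per_sec: float,
                       stall_seconds: float) -> float:
    """How many times over the retained size covers the stall backlog."""
    return retention_bytes / backlog_bytes(ingest_bytes_per_sec, stall_seconds)


# Hypothetical inputs, for illustration only.
ingest = 5 * 1024 * 1024   # assumed 5 MiB/s written to one partition
pause = 10.0               # assumed worst-case consumer stall, in seconds
retention = 1024 ** 3      # assumed 1 GiB retained per partition

print("backlog during stall (bytes):", backlog_bytes(ingest, pause))
print("retention headroom (x):", retention_headroom(retention, ingest, pause))
```

If the headroom is far above 1, a pause of that length only shows up as temporary consumer lag; data is at risk only when the backlog approaches what the topic retains.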


On Tue, Oct 28, 2014 at 5:53 PM, Chris Riccomini <> wrote:

> Hey Dotan,
> We run all of our jobs using SerialGC by default. For a few of our
> higher-throughput jobs, we've had better luck with parallel GC or G1, but
> in general, serial works fine.
> Cheers,
> Chris
> On 10/28/14 8:34 AM, "Dotan Patrich" <> wrote:
> >Hi All,
> >
> >I encountered some issues caused by having too many threads for a user on
> >Linux (CentOS). Investigating further, it turned out that the JVM spawns
> >over 31 GC threads per process. With about 18 Samza processes running on
> >the machine, we soon got close to the limit of 1000 threads per user.
> >I was thinking of running the Samza JVM with SerialGC instead of the
> >parallel GC to avoid having so many threads in the environment. In
> >addition, in theory this seems a better fit for situations where we prefer
> >throughput over latency in a single-core environment (which is roughly
> >what our Samza tasks are assigned).
> >
> >Before doing so, I would really appreciate your insights. Has anyone
> >encountered this issue before? Is changing the GC to serial a good
> >solution?
> >
> >Thanks,
> >Dotan
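
The thread budget described in the original message can be checked with simple arithmetic. The process count (18), GC threads per process (31), and per-user limit (1000) come from the message; the number of non-GC threads per JVM is an assumed placeholder, and treating SerialGC as adding no dedicated GC worker threads is a simplifying assumption. The function name is hypothetical.

```python
def total_threads(processes: int,
                  gc_threads_per_process: int,
                  other_threads_per_process: int) -> int:
    """Total threads owned by one user across all JVM processes."""
    return processes * (gc_threads_per_process + other_threads_per_process)


PROCESSES = 18        # from the message
PARALLEL_GC = 31      # GC threads per process, from the message
USER_LIMIT = 1000     # approximate per-user thread limit, from the message
OTHER = 20            # assumed non-GC threads per JVM (placeholder)

with_parallel = total_threads(PROCESSES, PARALLEL_GC, OTHER)
with_serial = total_threads(PROCESSES, 0, OTHER)  # assumes no GC workers

print("GC threads alone:", PROCESSES * PARALLEL_GC)
print("parallel GC total:", with_parallel, "of", USER_LIMIT)
print("serial GC total:", with_serial, "of", USER_LIMIT)
```

Under these assumptions the GC threads alone account for more than half of the per-user limit, which is consistent with the problem described.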
