samza-dev mailing list archives

From Karthik Sriram <amaron...@gmail.com>
Subject Re: Collocating Samza(YARN) and Kafka/ZK clusters
Date Sat, 14 Feb 2015 03:17:10 GMT
Hey Jay/Chris,
  Thanks for your valuable input. I tried tuning the replica sync timeout,
the ZK session timeout, etc. That kept the cluster stable a little longer,
but not by much. I then found that I was probably hitting
https://issues.apache.org/jira/browse/KAFKA-1382.
We have been using Kafka 0.8.1.1 for our central logging system for a while
now without seeing many issues. It seems that we hit this issue only with
our smaller samza-kafka cluster, which we have been pushing to its limits in
our testing phase. Or it's possible that collocating ZK/YARN and Kafka
triggers this issue more easily. We upgraded to Kafka 0.8.2 for the
samza-kafka cluster and everything works like a charm now. Thanks a lot for
your input, it gave me direction!
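
For anyone reading the archive later, the lag-based failure detection Jay
describes further down the thread can be pictured roughly as below. This is
only a sketch in Python, not Kafka's actual code: the function name is made
up, and the default thresholds are placeholder values chosen for
illustration. The two parameters are meant to mirror the 0.8.x broker
settings replica.lag.max.messages and replica.lag.time.max.ms.

```python
def replica_in_sync(messages_behind, ms_since_last_fetch,
                    max_lag_messages=4000, max_lag_time_ms=10000):
    """Illustrative ISR membership check (hypothetical helper, not Kafka code).

    A follower is treated as out of sync if it is too many messages behind
    the leader, or if it has not fetched from the leader recently enough.
    """
    if messages_behind > max_lag_messages:
        return False  # fell more than N messages behind
    if ms_since_last_fetch > max_lag_time_ms:
        return False  # follower looks stuck or dead
    return True


# A follower that is briefly 5000 messages behind during a produce burst
# drops out under the placeholder limit, but stays in with a higher one.
burst_default = replica_in_sync(5000, 200)
burst_tuned = replica_in_sync(5000, 200, max_lag_messages=10000)
```

Under this picture, raising the message threshold only helps when followers
fall behind briefly during bursts; it would not help with a broker-side bug
such as the one in the JIRA above.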

Thanks a lot,
Karthik

On Tue, Feb 10, 2015 at 9:51 AM, Chris Riccomini <criccomini@apache.org>
wrote:

> Hey Karthik,
>
> I've never tried running ZK on the same machines as Kafka/Samza.
>
> Co-locating Kafka/Samza worked pretty well for us until we started using
> Samza's state management facilities. At this point, Samza's state stores
> started messing with the OS page cache in a way that impacted the Kafka
> brokers' performance. Kafka doesn't really have a cache; it just uses page
> cache. So, when the page cache is being used for other things (e.g. RocksDB
> bytes), it causes Kafka to go to disk more often, which increases latency
> amongst consumers.
>
> If you're not running state with your Samza jobs, then it doesn't seem like
> the jobs should impact Kafka, unless you're over-provisioning the machines,
> and saturating the CPU or network.
>
> In general, it's probably a best practice not to run the jobs on the same
> machines as the brokers.
>
> Cheers,
> Chris
>
> On Mon, Feb 9, 2015 at 9:20 PM, Vijay Gill <vijay.gill@gmail.com> wrote:
>
> > Is there a substantial variance in performance caused by high CPU load
> > and cache churn? I've seen this sort of inadequate perf isolation wreak
> > havoc on high QPS systems.
> >
> > On Mon Feb 09 2015 at 4:55:28 PM Jay Kreps <jay.kreps@gmail.com> wrote:
> >
> > > It may or may not be due to colocating Kafka and Samza, but you are
> > > probably tripping the failure detection in Kafka, which considers a
> > > replica out of sync if it falls more than N messages behind. Can you
> > > try tuning this setting as described here:
> > > https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowtoreducechurnsinISR?WhendoesabrokerleavetheISR?
> > >
> > > -Jay
> > >
> > > On Mon, Feb 9, 2015 at 4:35 PM, Karthik Sriram <amaron.me@gmail.com>
> > > wrote:
> > >
> > > > Hey all,
> > > > I'm trying to run Samza on a 5-node (YARN/Kafka/ZK) cluster on AWS,
> > > > with each box running all 3 processes. I have been facing very weird
> > > > performance issues with Kafka when it is run this way. Kafka seems to
> > > > get unbalanced very often, with replicas frequently going out of
> > > > sync. This results in lost messages when producing to this cluster.
> > > > I initially suspected it was a scale issue (70k-80k qps of incoming
> > > > messages, ~120k qps peak) and reduced write throughput by sampling
> > > > just 10% of the messages, but I still noticed the same issues. The
> > > > weird part is that this doesn't happen every time I run, but it
> > > > happens on many runs.
> > > >
> > > > We have been using a much larger Kafka cluster for a long time with
> > > > great performance and have never seen such issues before. Then I saw
> > > > https://engineering.linkedin.com/samza/operating-apache-samza-scale,
> > > > which mentions that LinkedIn also faced some issues when collocating
> > > > Samza and Kafka.
> > > >
> > > > Can someone throw some light on this? Is collocating Samza and Kafka
> > > > a strict no, or is it more likely a Kafka/machine tuning issue? Any
> > > > help is appreciated!
> > > >
> > > > Kafka version : 0.8.1.1
> > > > Samza version: 0.8
> > > >
> > > > Thanks a lot for your time,
> > > > Karthik
> > > >
> > >
> >
>
