kafka-users mailing list archives

From Andrey Yegorov <andrey.yego...@gmail.com>
Subject Re: High Latency in Kafka
Date Tue, 10 Feb 2015 20:52:30 GMT
I am not familiar with Logstash, but with a custom log-replay tool (used to
replay messages logged locally when, e.g., Kafka was unavailable, and useful
in some other scenarios) I've seen it reach 30,000 messages/sec with an
average message size of 4.5 KB, all under regular production load on Kafka
(6 brokers). At this rate, sending 30 GB of logs should take about 4 minutes.
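As a quick sanity check on that arithmetic (a minimal sketch; it assumes binary KB/GB units, and the numbers are the ones quoted above, not fresh measurements):

```java
public class ThroughputEstimate {
    // Estimate how long pushing `totalGB` of data takes at a given
    // message rate and average message size.
    static double estimateSeconds(double msgsPerSec, double avgMsgKB, double totalGB) {
        double bytesPerSec = msgsPerSec * avgMsgKB * 1024;   // ~135 MB/s here
        return totalGB * 1024 * 1024 * 1024 / bytesPerSec;
    }

    public static void main(String[] args) {
        double s = estimateSeconds(30_000, 4.5, 30);
        System.out.println(Math.round(s) + " s, about " + Math.round(s / 60) + " min");
    }
}
```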

The tool has:
- one thread to read messages and put them into a queue;
- 5 (configurable) threads that read messages from the queue and send them to
kafka, with one producer per thread.
I am using the new producer from Kafka 0.8.2-beta with async send.
I remember I had to tune some parameters for the Kafka producer: increased
buffer sizes, among other things.
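The thread layout above can be sketched roughly as follows. This is a minimal, broker-free sketch: the actual producer.send() call is stubbed out as a counter so the example runs standalone, and the config keys shown (buffer.memory, batch.size, linger.ms) are illustrative of the kind of tuning I mean, not the exact values we used.

```java
import java.util.List;
import java.util.Properties;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

public class ReplayPipeline {
    // Sentinel telling sender threads to stop.
    private static final String POISON = "\u0000POISON";

    // Producer settings of the kind mentioned above (key names follow the
    // 0.8.2 new-producer config; the values here are placeholders).
    static Properties producerConfig() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "broker1:9092"); // placeholder address
        p.put("buffer.memory", "67108864");         // enlarged buffer (64 MB)
        p.put("batch.size", "131072");              // bigger batches (128 KB)
        p.put("linger.ms", "5");                    // allow batching
        return p;
    }

    // One reader thread fills the queue; `senderThreads` workers drain it.
    // The real tool would create one KafkaProducer per sender thread (from
    // producerConfig()) and call producer.send(record) asynchronously where
    // this sketch just increments a counter.
    static long runPipeline(List<String> messages, int senderThreads) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(10_000);
        AtomicLong sent = new AtomicLong();

        Thread reader = new Thread(() -> {
            try {
                for (String m : messages) queue.put(m);
                for (int i = 0; i < senderThreads; i++) queue.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread[] senders = new Thread[senderThreads];
        for (int i = 0; i < senderThreads; i++) {
            senders[i] = new Thread(() -> {
                try {
                    while (true) {
                        String m = queue.take();
                        if (m.equals(POISON)) break;
                        // producer.send(new ProducerRecord<>("logs", m), cb);
                        sent.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            senders[i].start();
        }
        reader.start();
        try {
            reader.join();
            for (Thread t : senders) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sent.get();
    }

    public static void main(String[] args) {
        List<String> msgs = java.util.Collections.nCopies(100_000, "log line");
        System.out.println("sent=" + runPipeline(msgs, 5));
    }
}
```

The bounded queue gives backpressure: if the senders fall behind, the reader blocks instead of exhausting memory.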

HTH.



----------
Andrey Yegorov

On Tue, Feb 10, 2015 at 5:54 AM, Vineet Mishra <clearmidoubt@gmail.com>
wrote:

> Hi Gwen,
>
> Well, I went through this link while trying to set up my Logstash Kafka
> handler:
>
> https://github.com/joekiller/logstash-kafka
>
> I could achieve what I was looking for, but performance suffers badly when
> writing a large file of several GBs.
> I guess there should be some way to parallelise the existing running
> process.
>
> Thanks!
>
> On Sun, Feb 8, 2015 at 8:06 PM, Gwen Shapira <gshapira@cloudera.com>
> wrote:
>
> > I'm wondering how much of the time is spent by Logstash reading and
> > processing the log vs. time spent sending data to Kafka. Also, I'm not
> > familiar with Logstash internals; perhaps it can be tuned to send the
> > data to Kafka in larger batches?
> >
> > At the moment it's difficult to tell where the slowdown is. More
> > information about the breakdown of time would help.
> >
> > You can try Flume's SpoolingDirectory source with a Kafka Channel or Sink
> > and see if you get improved performance out of other tools.
> >
> >
> > Gwen
> >
> > On Sun, Feb 8, 2015 at 12:06 AM, Vineet Mishra <clearmidoubt@gmail.com>
> > wrote:
> >
> > > Hi All,
> > >
> > > I have some log files of around 30 GB. I am trying to process these
> > > logs as events by pushing them to Kafka, and I can clearly see that the
> > > throughput achieved while publishing these events to Kafka is quite
> > > slow.
> > >
> > > As mentioned, for the single 30 GB log file, Logstash has been
> > > continuously emitting to Kafka for more than 2 days but has still
> > > processed only 60% of the log data. I was looking for a way to make
> > > publishing events to Kafka more efficient, because at this rate of data
> > > ingestion I don't think it will be a good option going forward.
> > >
> > > Looking for advice on improving performance.
> > >
> > > Thanks!
> > >
> >
>
