samza-dev mailing list archives

From Roger Hoover <roger.hoo...@gmail.com>
Subject Re: Newbie questions after completing "Hello Samza" about performance and project setup
Date Thu, 09 Apr 2015 15:46:32 GMT
Hi Warren,

Yes, I think Hello Samza is the template project to work from. I believe
the slow message rate you are seeing is because the job is subscribed to
the Wikipedia IRC stream, which may only generate a few events per
second.

That said, some of the example configuration in the Hello Samza demo is
not tuned for performance.

In general, enabling compression can help a lot for jobs that are I/O
bound. Enabling LZ4 on JSON data, for example, can shrink it by around 10x.
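To get a feel for why compression pays off on JSON, here is a small self-contained sketch. The JDK has no built-in LZ4 codec, so it substitutes DEFLATE (the algorithm behind gzip, via java.util.zip.Deflater); LZ4 generally trades some compression ratio for speed, so its numbers will differ. The record layout and field names are hypothetical sample data, and the result is an illustration, not a benchmark.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class CompressionRatioSketch {
    // Builds a batch of small, similar JSON records, loosely like a stream of edit events.
    // Field names and values are made up for illustration.
    static String sampleJsonBatch(int records) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < records; i++) {
            sb.append("{\"channel\":\"#en.wikipedia\",\"user\":\"user")
              .append(i % 50)
              .append("\",\"is-minor\":false,\"is-bot-edit\":false,\"diff-bytes\":")
              .append(i % 300)
              .append("}\n");
        }
        return sb.toString();
    }

    // Compresses the input with DEFLATE (standing in for LZ4) and returns the compressed bytes.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        // Drain compressed output until the deflater reports it is done.
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = sampleJsonBatch(1000).getBytes(StandardCharsets.UTF_8);
        byte[] compressed = deflate(raw);
        double ratio = (double) raw.length / compressed.length;
        System.out.printf("raw=%d bytes, compressed=%d bytes, ratio=%.1fx%n",
                raw.length, compressed.length, ratio);
    }
}
```

Running main prints the raw and compressed sizes. Repetitive records like these compress very well; real traffic with more varied content will compress less.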

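On the consumer side, setting task.consumer.batch.size might help. It makes
each task try to consume batches of that many messages from each input
stream, rather than going round-robin across all input streams one message
at a time.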

On the producer side, you might want to play around with these settings.

systems.kafka.producer.compression.type
systems.kafka.producer.batch.size
systems.kafka.producer.linger.ms
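As a sketch of where these settings go, the snippet below collects them as java.util.Properties and prints them in the key=value form used by a Samza job's .properties file (Properties.store also emits a comment line and a timestamp). The values are illustrative placeholders, not recommendations, and the keys assume the Kafka system is named "kafka" in the job config, as it is in hello-samza. Per the Kafka producer docs linked above, batch.size is in bytes and linger.ms is in milliseconds.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.Properties;

public class TuningConfigSketch {
    // Returns the tuning keys discussed above with placeholder values.
    // Assumes the Kafka system is registered as "kafka" in the job config.
    static Properties tuningOverrides() {
        Properties p = new Properties();
        // Consumer side: try to read batches of up to 100 messages per input stream.
        p.setProperty("task.consumer.batch.size", "100");
        // Producer side: systems.kafka.producer.* keys are handed to the Kafka producer.
        p.setProperty("systems.kafka.producer.compression.type", "lz4");
        // Maximum size, in bytes, of a batch of records for one partition.
        p.setProperty("systems.kafka.producer.batch.size", "65536");
        // Wait up to 5 ms for more records before sending a partially filled batch.
        p.setProperty("systems.kafka.producer.linger.ms", "5");
        return p;
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        tuningOverrides().store(out, "illustrative tuning overrides");
        System.out.print(out);
    }
}
```

Raising linger.ms trades a few milliseconds of latency for fuller batches, and the producer compresses whole batches, so larger batches usually also compress better.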

http://samza.apache.org/learn/documentation/0.9/jobs/configuration-table.html
http://kafka.apache.org/documentation.html#newproducerconfigs

Cheers,

Roger

On Thu, Apr 9, 2015 at 1:14 AM, Warren Henning <warren.henning@gmail.com>
wrote:

> Hi,
>
> I ran the commands in http://samza.apache.org/startup/hello-samza/0.9/
> successfully. Fascinating stuff!
>
> I was running all the processes on my (fairly recent model) MacBook Pro.
> One aspect I've heard about Kafka and Samza is performance -- handling
> thousands of messages a second. E.g.,
>
> http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
> talks about doing millions of writes a second. The console emitted new
> messages far more slowly than that -- maybe on the order of 1-2 a
> second. I ran the commands and everything exactly as listed on the
> tutorial page.
>
> Of course a laptop is vastly different from a production setup -- what kind
> of assumptions can you make about performance of Samza jobs in development
> mode? I realize it depends on what you're doing -- it's just very different
> from what I was expecting.
>
> Also, I'm not really sure about the best way to get started with writing my
> own Samza jobs. Is there a project template to work off of? Is the Hello
> Samza project it? Maybe import the Maven POM into a favorite IDE and rip
> out the Wikipedia-related classes? As someone who has written Java before
> but doesn't write it every day, it wasn't immediately clear to me.
>
> Apologies if these are addressed in blog posts/FAQs/documentation and I
> failed to research them adequately.
>
> Thanks!
>
> Warren
>
