samza-dev mailing list archives

From Chris Riccomini <>
Subject Re: Special Bay Area HUG: Tajo and Samza
Date Fri, 18 Oct 2013 18:01:09 GMT
Hey Garry,


Locality: A few things to note here.

1. We run one broker per host, as you suggest (18 nodes = 18 brokers).
2. Samza does not explicitly try to do any co-location right now. Any
locality that we get is purely luck.
3. YARN allows you to make resource requests for a specific host/rack.
This is the feature we would like to use to provide better locality.

We haven't done any meaningful evaluation of the locality we're getting
(or would get) right now, though.
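
For illustration only (Samza does not do this today, per point 2): a minimal
sketch of how a per-container host preference could be derived, assuming we
know which broker host leads each partition. The resulting host would be the
hint passed along with a YARN resource request. All names here
(preferred_hosts, locality_fraction, the dict layouts) are hypothetical and
not part of Samza, Kafka, or YARN.

```python
from collections import Counter

# Hypothetical data layout, assumed for this sketch:
#   assignment: container id -> list of (topic, partition) it consumes
#   leaders:    (topic, partition) -> host of the broker leading that partition
#   placement:  container id -> host the container actually landed on

def preferred_hosts(assignment, leaders):
    """Pick, per container, the host that leads most of its partitions."""
    hints = {}
    for container, partitions in assignment.items():
        counts = Counter(leaders[tp] for tp in partitions if tp in leaders)
        # No known leader for any partition means no host preference.
        hints[container] = counts.most_common(1)[0][0] if counts else None
    return hints

def locality_fraction(placement, assignment, leaders):
    """Fraction of partitions read by a container on the leader's own host."""
    total = 0
    local = 0
    for container, partitions in assignment.items():
        for tp in partitions:
            total += 1
            if tp in leaders and leaders[tp] == placement.get(container):
                local += 1
    return local / total if total else 0.0
```

The second function is the kind of measurement a locality evaluation would
start from: compare the fraction under today's "pure luck" placement with
the fraction when the host hints are honored.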

Operations: Yes, we have a pretty cool set of Samza jobs that Jakob wrote
to do some metrics/monitoring stuff. He can probably talk more about it
than I can. We're planning on putting up a blog post in the near future
about it.

More broadly, we have a pretty well defined service container at LinkedIn.
These services are called via RPC. Every time an RPC request is made, the
service logs information about the request: who sent the request, what
method was called, how long it took to process, etc. In addition, we
have all WARN/ERROR log events flowing through Kafka (via Kafka's
Log4j appender). There is a brief mention of this in:

As you can imagine, there are a ton of things you can do with this data. :)
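
As one small example of the kind of thing meant here, a sketch that rolls
request events up into per-method call counts and mean latency. The event
fields ("caller", "method", "duration_ms") are assumed for illustration and
are not the actual LinkedIn log schema; a real job would read the events
from a Kafka topic rather than a list.

```python
from collections import defaultdict

def summarize_requests(events):
    """Aggregate RPC request events into per-method call count and mean latency.

    Each event is a dict with hypothetical keys "caller", "method",
    and "duration_ms" (assumed field names, not a real schema).
    """
    totals = defaultdict(lambda: {"calls": 0, "total_ms": 0.0})
    for event in events:
        entry = totals[event["method"]]
        entry["calls"] += 1
        entry["total_ms"] += event["duration_ms"]
    return {
        method: {
            "calls": entry["calls"],
            "mean_ms": entry["total_ms"] / entry["calls"],
        }
        for method, entry in totals.items()
    }
```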


On 10/18/13 4:44 AM, "Garry Turkington" <> wrote:

>Hi Chris,
>Nice presentation -- 2 questions:
>1. I had wondered about the references to Kafka broker colocation I'd
>seen around the place.  So for example in the 18-node sized cluster you
>mention you'd have 18 Kafka brokers running there, 1 per host?  Do you
>actually get any sort of data locality benefits from this, is there a way
>to ensure that the Samza container on host x is processing the partitions
>of each topic on the collocated Kafka broker?  Or am I missing the intent?
>2. Interested at your mention of using something like Samza for
>processing of monitoring and metric type data, it's something we've been
>talking about internally.  Anything been published on what you are doing
>in that space?
>-----Original Message-----
>From: Chris Riccomini []
>Sent: 17 October 2013 21:54
>Subject: Re: Special Bay Area HUG: Tajo and Samza
>Hey Guys,
>On a related note, my talk from the YARN meet up at LinkedIn is now
>If you're not too familiar with Samza, this is a great place to start.
>Also, feedback welcome on presentation content, style, etc.
>On 10/17/13 11:08 AM, "Jakob Homan" <> wrote:
>>Hey everybody-
>>   Join us at LinkedIn Nov. 5 for a special HUG dedicated to two new
>>awesome Incubator projects, Tajo, a low-latency SQL query engine atop
>>YARN and Samza.
