samza-dev mailing list archives

From Chris Riccomini <criccom...@linkedin.com.INVALID>
Subject Re: DirectMemory buffers
Date Tue, 16 Sep 2014 15:17:03 GMT
Hey Steve,

I'd be very interested in hearing what you discover.

Most performance-related knowledge that I have is about tuning Kafka to go
fast. :)

As far as implementation goes, I think you'll need to implement a
SystemConsumer, SystemProducer, SystemAdmin, and SystemFactory in order to
fully support direct memory. The main problem with "swapping" out Kafka is
that you're going to lose some of Samza's guarantees. Samza depends a lot
on the guarantees of the underlying streaming system for things like:

* Message ordering.
* At-least-once messaging.
* Replayability (offsets).
* Fault tolerance (replication).

If your direct memory implementation doesn't provide some of these
features, then neither can Samza. That may be fine, or that may be
unsatisfactory for your use case. Samza will work without these features,
but makes no effort to provide them itself. This means if, for example,
your direct memory implementation isn't replayable, then your offset
checkpoints are useless in Samza, and will be disregarded (you'll always
start consuming from wherever the direct memory SystemConsumer
implementation decides to start).
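
To make the replayability point concrete, here is a minimal sketch of an
off-heap queue that discards messages once they are consumed. Everything in
it is hypothetical: the class name, send(), poll(), and seek() are
illustrative stand-ins and are not Samza's SystemConsumer or SystemProducer
interfaces. It only assumes java.nio.ByteBuffer from the JDK.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch only; NOT Samza's actual system interfaces. */
public class DirectMemoryStreamSketch {
    private final ByteBuffer buffer; // off-heap storage, kept in "write mode"
    private long consumed = 0;       // number of messages handed out so far

    public DirectMemoryStreamSketch(int capacityBytes) {
        this.buffer = ByteBuffer.allocateDirect(capacityBytes);
    }

    /** Producer side: append a length-prefixed UTF-8 message. */
    public void send(String message) {
        byte[] bytes = message.getBytes(StandardCharsets.UTF_8);
        buffer.putInt(bytes.length).put(bytes);
    }

    /** Consumer side: return the oldest message, or null if none is pending. */
    public String poll() {
        buffer.flip(); // switch to reading what has been written
        String message = null;
        if (buffer.remaining() >= Integer.BYTES) {
            byte[] bytes = new byte[buffer.getInt()];
            buffer.get(bytes);
            message = new String(bytes, StandardCharsets.UTF_8);
            consumed++;
        }
        buffer.compact(); // drop consumed bytes and return to write mode
        return message;
    }

    /**
     * Consumed bytes are gone, so the only offset that can be honored is
     * the current head. Any older checkpoint cannot be replayed.
     */
    public boolean seek(long checkpointedOffset) {
        return checkpointedOffset == consumed;
    }

    public static void main(String[] args) {
        DirectMemoryStreamSketch stream = new DirectMemoryStreamSketch(1024);
        stream.send("a");
        stream.send("b");
        System.out.println(stream.poll());  // oldest message first
        System.out.println(stream.seek(0)); // checkpoint at 0 cannot be honored
        System.out.println(stream.poll());
    }
}
```

In this sketch messages come out in the order they were sent, but a
checkpoint taken before the first poll() can no longer be honored afterwards,
which is the situation described above where offset checkpoints become
useless.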

Cheers,
Chris

On 9/16/14 4:21 AM, "Steven Yates" <syates@stevendyates.com> wrote:

>Hi devs, I am looking to get as much performance out of Samza as possible
>and am interested in what effect a direct memory approach has on
>performance, and whether frameworks like Kafka can be swapped out for a
>more direct off-heap approach. I am trialling this implementation now in
>my local env; however, I don't have exact metrics yet. I was wondering if
>you guys had any further thoughts on this?
>
>-Steve

