kafka-users mailing list archives

From Ewen Cheslack-Postava <e...@confluent.io>
Subject Re: Data Structure abstractions over kafka
Date Tue, 14 Jul 2015 04:35:00 GMT

Kafka can be used as a key-value store if you turn on log compaction:
http://kafka.apache.org/documentation.html#compaction
You need to be careful with that since it's purely last-writer-wins and
doesn't have anything like CAS (compare-and-set) that might help you manage
concurrent writers, but the
basic functionality is there. This is used by the brokers to store offsets
in Kafka (where keys are (consumer-group, topic, partition), values are the
offset, and they already have a mechanism to ensure only a single writer at
a time).
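
To make the last-writer-wins behaviour concrete, here is a minimal sketch in
plain Python. It does not use a Kafka client; it only models a compacted topic
as an in-memory list of (key, value) records replayed in offset order. The
function name `materialize`, the group and topic names, and the offsets are
all made up for illustration. A `None` value stands in for a tombstone, which
is how a compacted topic marks a key as deleted.

```python
from typing import Dict, Hashable, Iterable, Optional, Tuple

# One record in the modelled log: (key, value). A value of None is a tombstone.
Record = Tuple[Hashable, Optional[int]]


def materialize(log: Iterable[Record]) -> Dict[Hashable, int]:
    """Replay the log in offset order; the last record for each key wins."""
    table: Dict[Hashable, int] = {}
    for key, value in log:
        if value is None:
            # Tombstone: drop the key if it is present.
            table.pop(key, None)
        else:
            # Later records silently overwrite earlier ones (no CAS check).
            table[key] = value
    return table


# Hypothetical offset commits keyed by (consumer-group, topic, partition).
log = [
    (("group-a", "events", 0), 10),
    (("group-a", "events", 1), 7),
    (("group-a", "events", 0), 42),   # overwrites the earlier value 10
    (("group-b", "events", 0), 5),
    (("group-b", "events", 0), None), # tombstone removes group-b's entry
]

offsets = materialize(log)
```

Nothing in this model rejects a stale write, which is why two uncoordinated
writers to the same key can clobber each other.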

You could possibly use this to implement the linked list functionality
you're talking about, although there are probably a number of challenges
(e.g., performing atomic updates if you need a doubly-linked list, ensuring
garbage is collected after removals even if you only need a singly-linked
list, etc.). Also, I'm not sure it would be particularly efficient, and
you'd still need to ensure a single writer (or at least a single writer per
linked list node).
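
As a rough sketch of what that might look like, the plain-Python model below
stores each list node as a record whose key is a node id and whose value is
(payload, next node id). Everything here is an assumption made for
illustration: the reserved `HEAD` key, the node ids, and the helper names
`replay`, `traverse`, and `remove_after`. No Kafka client is involved; the
log is just an in-memory list, and a `None` value models a tombstone.

```python
from typing import Dict, List, Optional, Tuple

Node = Tuple[str, Optional[str]]          # (payload, key of the next node)
Log = List[Tuple[str, Optional[Node]]]    # value None models a tombstone

HEAD = "HEAD"  # hypothetical reserved key; only its "next" field is used


def replay(log: Log) -> Dict[str, Node]:
    """Build the latest key -> node view (last writer wins)."""
    nodes: Dict[str, Node] = {}
    for key, value in log:
        if value is None:
            nodes.pop(key, None)
        else:
            nodes[key] = value
    return nodes


def traverse(log: Log) -> List[str]:
    """Follow next pointers from HEAD and collect the payloads."""
    nodes = replay(log)
    key = nodes[HEAD][1] if HEAD in nodes else None
    payloads: List[str] = []
    while key is not None:
        payload, key = nodes[key]
        payloads.append(payload)
    return payloads


def remove_after(log: Log, prev_key: str) -> None:
    """Unlink the node that follows prev_key. This takes two appends."""
    nodes = replay(log)
    prev_payload, victim = nodes[prev_key]
    if victim is None:
        return  # nothing follows prev_key
    _, successor = nodes[victim]
    log.append((prev_key, (prev_payload, successor)))  # write 1: relink
    log.append((victim, None))                         # write 2: tombstone


log: Log = [
    ("n3", ("c", None)),
    ("n2", ("b", "n3")),
    ("n1", ("a", "n2")),
    (HEAD, ("", "n1")),
]
before = traverse(log)
remove_after(log, "n1")
after = traverse(log)
```

Even a singly-linked removal needs two separate writes here, and nothing makes
them atomic: a second writer running `remove_after` between them, or a crash
before the tombstone is appended, would leave a stale or unreachable node
behind. That is the single-writer and garbage-collection problem in miniature.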

You're almost certainly better off using a specialized store for something
like that simply because Kafka isn't designed around that use case, but
it'd be interesting to see how far you could get with Kafka's current
functionality, and what would be required to make it practical!


On Mon, Jul 13, 2015 at 11:36 AM, Tim Smith <secsubs@gmail.com> wrote:

> Hi,
> In the big data ecosystem, I have started to use kafka, essentially, as:
> - an unordered list/array, and
> - a cluster-wide pipe
> I guess you could argue that any message bus product is a simple array/pipe
> but kafka's scale and model make things so easy :)
> I am wondering if there are any abstractions on top of kafka that will let
> me use kafka to store/organize other simple data structures like a
> linked-list? I have a use case for massive linked list that can easily grow
> to tens of gigabytes and could easily use - (1) redundancy (2) multiple
> producers/consumers working on processing the list (implemented over spark,
> storm etc).
> Any ideas? Maybe maintain a linked-list of offsets in another store like
> ZooKeeper or a NoSQL DB while storing the messages on kafka?
> Thanks,
> - Tim

