kafka-users mailing list archives

From Daniel Schierbeck <daniel.schierb...@gmail.com>
Subject Re: Using Kafka as a persistent store
Date Mon, 13 Jul 2015 17:35:36 GMT
Am I correct in assuming that Kafka will only retain a file handle for the last segment of
the log? If the number of handles grows unbounded, then it would be an issue. But I plan on
writing to this topic continuously anyway, so not separating data into cold and hot storage
is the entire point. 

Daniel Schierbeck

> On 13. jul. 2015, at 15.41, Scott Thibault <scott.thibault@multiscalehn.com> wrote:
> 
> We've tried to use Kafka not as a persistent store, but as a long-term
> archival store.  An outstanding issue we've had with that is that the
> broker holds on to an open file handle on every file in the log!  The other
> issue we've had is that when you create a long-term archival log on shared
> storage, you can't simply access that data from another cluster because
> metadata is stored in ZooKeeper rather than in the log.
> 
> --Scott Thibault
> 
> 
> On Mon, Jul 13, 2015 at 4:44 AM, Daniel Schierbeck <
> daniel.schierbeck@gmail.com> wrote:
> 
>> Would it be possible to document how to configure Kafka to never delete
>> messages in a topic? It took a good while to figure this out, and I see it
>> as an important use case for Kafka.
>> 
>> On Sun, Jul 12, 2015 at 3:02 PM Daniel Schierbeck <
>> daniel.schierbeck@gmail.com> wrote:
>> 
>>> 
>>>> On 10. jul. 2015, at 23.03, Jay Kreps <jay@confluent.io> wrote:
>>>> 
>>>> If I recall correctly, setting log.retention.ms and log.retention.bytes
>>>> to -1 disables both.
>>> 
>>> Thanks!
>>> 
>>>> 
>>>> On Fri, Jul 10, 2015 at 1:55 PM, Daniel Schierbeck <
>>>> daniel.schierbeck@gmail.com> wrote:
>>>> 
>>>>> 
>>>>>> On 10. jul. 2015, at 15.16, Shayne S <shaynest113@gmail.com> wrote:
>>>>>> 
>>>>>> There are two ways you can configure your topics: with log compaction
>>>>>> or with no cleaning. The choice depends on your use case. Are the
>>>>>> records uniquely identifiable, and will they receive updates? Then log
>>>>>> compaction is the way to go. If they are truly read-only, you can go
>>>>>> without log compaction.
>>>>> 
>>>>> I'd rather be free to use the key for partitioning, and the records are
>>>>> immutable (they're event records), so disabling compaction altogether
>>>>> would be preferable. How is that accomplished?
>>>>>> 
>>>>>> We have small processes which consume a topic and perform upserts to
>>>>>> our various database engines. It's easy to change how it all works and
>>>>>> simply consume the single source of truth again.
>>>>>> 
>>>>>> I've written a bit about log compaction here:
>>>>>> http://www.shayne.me/blog/2015/2015-06-25-everything-about-kafka-part-2/
>>>>>> 
>>>>>> On Fri, Jul 10, 2015 at 3:46 AM, Daniel Schierbeck <
>>>>>> daniel.schierbeck@gmail.com> wrote:
>>>>>> 
>>>>>>> I'd like to use Kafka as a persistent store – sort of as an alternative
>>>>>>> to HDFS. The idea is that I'd load the data into various other systems
>>>>>>> in order to solve specific needs such as full-text search, analytics,
>>>>>>> indexing by various attributes, etc. I'd like to keep a single source
>>>>>>> of truth, however.
>>>>>>> 
>>>>>>> I'm struggling a bit to understand how I can configure a topic to
>>>>>>> retain messages indefinitely. I want to make sure that my data isn't
>>>>>>> deleted. Is there a guide to configuring Kafka like this?
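
For reference, the retention settings discussed in this thread could be
written out roughly as follows. This is a minimal sketch, not a verified
recipe: it assumes a broker version that accepts -1 for both
log.retention.ms and log.retention.bytes (as Jay recalls earlier in the
thread), and it assumes the "delete" cleanup policy so that compaction stays
off. The topic-level names shown in the comments are the per-topic
counterparts as I understand them from current Kafka documentation; check
them against the version you run.

```properties
# Broker-wide defaults (server.properties). Sketch only; verify for your version.

# Assumption: -1 means "no time limit", so segments are never deleted by age.
log.retention.ms=-1

# Assumption: -1 means "no size limit", so segments are never deleted by size.
log.retention.bytes=-1

# Plain deletion policy rather than compaction. With both limits disabled,
# nothing should ever qualify for deletion.
log.cleanup.policy=delete

# Per-topic overrides (assumed names, without the "log." prefix):
#   retention.ms=-1
#   retention.bytes=-1
#   cleanup.policy=delete
```

Setting the values per topic rather than broker-wide keeps other topics on
the normal retention defaults, which is probably what you want if only one
topic is meant to act as the permanent source of truth.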
