kafka-users mailing list archives

From Rad Gruchalski <ra...@gruchalski.com>
Subject Re: Using Kafka as a persistent store
Date Mon, 13 Jul 2015 17:17:42 GMT
Indeed, the files would have to be moved to some separate, dedicated storage.
Since Kafka does not allow adding log files at runtime, there are basically three options:

1. make the consumer able to read from an arbitrary file
2. add the ability to drop files in (I believe this adds a lot of complexity)
3. read the files with another program, as suggested in my first email

I’d love to get some input from someone who knows the code and options a bit better!
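
As a rough illustration of option 3, here is a minimal Python sketch of a standalone reader for the bytes of one segment file. It assumes the legacy uncompressed on-disk message layout (8-byte offset, 4-byte size, then CRC, magic byte, attributes, key and value), which is an assumption about the broker version in use. The names `LogEntry` and `read_entries` are made up for this example and are not part of Kafka. Newer brokers write a different record-batch format, so this is a sketch under those assumptions, not a drop-in tool.

```python
import struct
import zlib
from typing import Iterator, NamedTuple, Optional


class LogEntry(NamedTuple):
    """One decoded message (hypothetical helper type, not a Kafka class)."""
    offset: int
    key: Optional[bytes]
    value: Optional[bytes]


def read_entries(data: bytes) -> Iterator[LogEntry]:
    """Yield entries from the raw bytes of one segment file.

    Assumed layout per entry: offset (int64), size (int32), then the
    message itself: crc (uint32), magic (int8), attributes (int8),
    key length (int32), key, value length (int32), value.
    """
    pos = 0
    while pos + 12 <= len(data):
        offset, size = struct.unpack_from(">qi", data, pos)
        pos += 12
        if size < 14 or pos + size > len(data):
            break  # truncated or unreadable tail: stop rather than guess
        message = data[pos:pos + size]
        pos += size

        crc, magic, attributes = struct.unpack_from(">IBb", message, 0)
        # The CRC covers everything after the 4-byte CRC field itself.
        if zlib.crc32(message[4:]) & 0xFFFFFFFF != crc:
            raise ValueError(f"CRC mismatch at offset {offset}")
        if magic != 0 or attributes & 0x07:
            raise ValueError("this sketch handles only uncompressed v0 messages")

        cursor = 6
        fields = []
        for _ in range(2):  # key first, then value
            (length,) = struct.unpack_from(">i", message, cursor)
            cursor += 4
            if length < 0:
                fields.append(None)  # a negative length marks a null field
            else:
                fields.append(message[cursor:cursor + length])
                cursor += length
        yield LogEntry(offset, fields[0], fields[1])
```

Feeding it the contents of a cold segment (for example `open(path, "rb").read()`, where `path` is wherever the file was moved to) yields the offset, key and value of each message without going through a broker.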

Kind regards,

Radek Gruchalski

radek@gruchalski.com
de.linkedin.com/in/radgruchalski/

Confidentiality:
This communication is intended for the above-named person and may be confidential and/or
legally privileged.
If it has come to you in error you must take no action based on it, nor must you copy or
show it to anyone; please delete/destroy and inform the sender immediately.



On Monday, 13 July 2015 at 18:02, Scott Thibault wrote:

> Yes, consider my e-mail an up vote!
>  
> I guess the files would automatically be moved somewhere else to separate the
> active from cold segments? Ideally, one could run an unmodified consumer
> application on the cold segments.
>  
>  
> --Scott
>  
>  
> On Mon, Jul 13, 2015 at 6:57 AM, Rad Gruchalski <radek@gruchalski.com> wrote:
>  
> > Scott,
> >  
> > This is what I was trying to target in one of my previous responses to
> > Daniel, the one in which I suggested another compaction setting for Kafka.
> >  
> > Kind regards,
> > Radek Gruchalski
> > radek@gruchalski.com
> > de.linkedin.com/in/radgruchalski/
> >  
> >  
> >  
> >  
> > On Monday, 13 July 2015 at 15:41, Scott Thibault wrote:
> >  
> > > We've tried to use Kafka not as a persistent store, but as a long-term
> > > archival store. An outstanding issue we've had with that is that the
> > > broker holds on to an open file handle on every file in the log! The other
> > > issue we've had is when you create a long-term archival log on shared
> > > storage, you can't simply access that data from another cluster b/c of
> > > metadata being stored in ZooKeeper rather than in the log.
> > >  
> > > --Scott Thibault
> > >  
> > >  
> > > On Mon, Jul 13, 2015 at 4:44 AM, Daniel Schierbeck <daniel.schierbeck@gmail.com> wrote:
> > >  
> > > > Would it be possible to document how to configure Kafka to never delete
> > > > messages in a topic? It took a good while to figure this out, and I see it
> > > > as an important use case for Kafka.
> > > >  
> > > > On Sun, Jul 12, 2015 at 3:02 PM Daniel Schierbeck <daniel.schierbeck@gmail.com> wrote:
> > > >  
> > > > >  
> > > > > > On 10. jul. 2015, at 23.03, Jay Kreps <jay@confluent.io> wrote:
> > > > > >  
> > > > > > If I recall correctly, setting log.retention.ms and log.retention.bytes
> > > > > > to -1 disables both.
> > > > >  
> > > > >  
> > > > >  
> > > > > Thanks!
> > > > >  
> > > > > >  
> > > > > > On Fri, Jul 10, 2015 at 1:55 PM, Daniel Schierbeck <daniel.schierbeck@gmail.com> wrote:
> > > > > >  
> > > > > > >  
> > > > > > > > On 10. jul. 2015, at 15.16, Shayne S <shaynest113@gmail.com> wrote:
> > > > > > > >  
> > > > > > > > There are two ways you can configure your topics: with log compaction or
> > > > > > > > with no cleaning. The choice depends on your use case. Are the records
> > > > > > > > uniquely identifiable and will they receive updates? Then log compaction is
> > > > > > > > the way to go. If they are truly read only, you can go without log
> > > > > > > > compaction.
> > > > > > >  
> > > > > > > I'd rather be free to use the key for partitioning, and the records are
> > > > > > > immutable (they're event records), so disabling compaction altogether
> > > > > > > would be preferable. How is that accomplished?
> > > > > > > >  
> > > > > > > > We have small processes that consume a topic and perform upserts to
> > > > > > > > our various database engines. It's easy to change how it all works and
> > > > > > > > simply consume the single source of truth again.
> > > > > > > >  
> > > > > > > > I've written a bit about log compaction here:
> > > > > > > > http://www.shayne.me/blog/2015/2015-06-25-everything-about-kafka-part-2/
> > > > > > > >  
> > > > > > > > On Fri, Jul 10, 2015 at 3:46 AM, Daniel Schierbeck <daniel.schierbeck@gmail.com> wrote:
> > > > > > > >  
> > > > > > > > > I'd like to use Kafka as a persistent store – sort of as an alternative to
> > > > > > > > > HDFS. The idea is that I'd load the data into various other systems in
> > > > > > > > > order to solve specific needs such as full-text search, analytics, indexing
> > > > > > > > > by various attributes, etc. I'd like to keep a single source of truth,
> > > > > > > > > however.
> > > > > > > > >  
> > > > > > > > > I'm struggling a bit to understand how I can configure a topic to retain
> > > > > > > > > messages indefinitely. I want to make sure that my data isn't deleted. Is
> > > > > > > > > there a guide to configuring Kafka like this?
> > > > > > > >  
> > > > > > >  
> > > > > > >  
> > > > > >  
> > > > >  
> > > >  
> > > >  
> > >  
> > >  
> > >  
> > >  
> > >  
> > > --
> > > *This e-mail is not encrypted. Due to the unsecured nature of unencrypted
> > > e-mail, there may be some level of risk that the information in this e-mail
> > > could be read by a third party. Accordingly, the recipient(s) named above
> > > are hereby advised to not communicate protected health information using
> > > this e-mail address. If you desire to send protected health information
> > > electronically, please contact MultiScale Health Networks at (206)538-6090*
> > >  
> >  
> >  
>  
>  
>  
>  
>  


