kafka-users mailing list archives

From Aseem Bansal <asmbans...@gmail.com>
Subject Re: Storing Kafka Message JSON to deep storage like S3
Date Tue, 06 Dec 2016 12:06:40 GMT
@Asaf Mesika Stored to S3?

On Tue, Dec 6, 2016 at 5:28 PM, Asaf Mesika <asaf.mesika@gmail.com> wrote:

> We rolled our own since we couldn't (1.5 years ago) find one. The code is
> quite simple and short.
>
>
> On Tue, Dec 6, 2016 at 1:55 PM Aseem Bansal <asmbansal2@gmail.com> wrote:
>
> > I just meant: is there an existing tool that does this? Basically, I
> > tell it "Listen to all X streams and write them to S3/HDFS at Y path as
> > JSON". I know Spark Streaming can be used, and there is Flume, but I am
> > not sure about their scalability/reliability. That's why I thought I
> > would start a discussion here to see whether anyone already knows of one.
> >
> > On Tue, Dec 6, 2016 at 5:14 PM, Sharninder <sharninder@gmail.com> wrote:
> >
> > > What do you mean by a streaming way? The logic to push to S3 will be in
> > > your consumer, so it totally depends on how you want to read and store.
> > > I think that's an easier way to do what you want than trying to back up
> > > Kafka and then read messages from there. I'm not even sure that's
> > > possible.
> > >
> > > On Tue, Dec 6, 2016 at 5:11 PM, Aseem Bansal <asmbansal2@gmail.com> wrote:
> > >
> > > > I get that we can read them and store them in batches, but is there
> > > > some streaming way?
> > > >
> > > > On Tue, Dec 6, 2016 at 5:09 PM, Aseem Bansal <asmbansal2@gmail.com> wrote:
> > > >
> > > > > Because we need to do exploratory data analysis and machine learning.
> > > > > We need to back up the messages somewhere so that the data scientists
> > > > > can query/load them.
> > > > >
> > > > > So we need something like a router that opens a new consumer group
> > > > > and keeps storing the messages in S3.
> > > > >
> > > > > On Tue, Dec 6, 2016 at 5:05 PM, Sharninder Khera <sharninder@gmail.com> wrote:
> > > > >
> > > > >> Why not just have a parallel consumer read all messages from
> > > > >> whichever topics you're interested in and store them wherever you
> > > > >> want? You don't need to "back up" Kafka messages.
> > > > >>
> > > > >>                 _____________________________
> > > > >> From: Aseem Bansal <asmbansal2@gmail.com>
> > > > >> Sent: Tuesday, December 6, 2016 4:55 PM
> > > > >> Subject: Storing Kafka Message JSON to deep storage like S3
> > > > >> To:  <users@kafka.apache.org>
> > > > >>
> > > > >>
> > > > >> Hi
> > > > >>
> > > > >> Has anyone stored Kafka JSON messages in deep storage like S3? We
> > > > >> are looking to back up all of our raw Kafka JSON messages for
> > > > >> exploration. S3, HDFS and MongoDB come to mind initially.
> > > > >>
> > > > >> I know that they can be stored in Kafka itself, but that does not
> > > > >> seem like a good option: we won't be able to query them, and the
> > > > >> machines running Kafka would need bigger configurations as the data
> > > > >> grows. Something like S3 we won't have to manage.
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Sharninder
> > >
> >
>
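The "Y path" in the question matters once data scientists start querying the files. Below is a minimal sketch of one possible layout, assuming a Hive-style `key=value` folder structure that Spark's partition discovery and Hive external tables can treat as partition columns. `object_key`, the `raw` prefix and the folder names are illustrative assumptions, not something described in this thread.

```python
from datetime import datetime, timezone


def object_key(prefix: str, topic: str, partition: int,
               first_offset: int, timestamp_ms: int) -> str:
    """Hypothetical helper: S3/HDFS key for one batch of raw JSON messages.

    timestamp_ms is assumed to be the Kafka record timestamp of the batch's
    first message, so re-processing the same batch yields the same folder.
    """
    # Daily partition derived from the record timestamp (UTC), not the wall clock.
    day = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d")
    # Zero-padding to 20 digits (wider than any 64-bit offset) keeps keys in
    # offset order when listed lexicographically, as S3 listings are.
    return (f"{prefix}/topic={topic}/dt={day}/"
            f"partition={partition}/{first_offset:020d}.json")
```

For example, a batch starting at offset 42 on partition 3 of `events`, whose first record is timestamped 2016-12-06 12:00 UTC, would land at `raw/topic=events/dt=2016-12-06/partition=3/00000000000000000042.json`. One trade-off of keying on the first record only: a batch that spans midnight is filed entirely under the earlier day.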

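The batch-versus-streaming question can be answered with micro-batching: a consumer in its own consumer group streams records in continuously, but writes them out as one newline-delimited JSON object per partition whenever a count or age threshold is reached. The sketch below assumes the Kafka client itself is kept out of the picture. `BatchingSink`, the `put_object(key, body)` hook and the simple `topic/partition/offset.json` key are hypothetical names, not part of Kafka or any S3 SDK, and every message value is assumed to already be a single-line JSON string.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical upload hook: put_object(key, body). With boto3 it could be
#   lambda key, body: s3.put_object(Bucket="my-bucket", Key=key, Body=body)
PutObject = Callable[[str, bytes], None]
PartitionId = Tuple[str, int]  # (topic, partition)


class BatchingSink:
    """Buffers consumed records per (topic, partition) and writes each buffer
    out as one newline-delimited JSON object once it holds max_records
    records or has been open for max_age_s seconds.

    Assumes every value is already a single-line JSON string.
    """

    def __init__(self, put_object: PutObject, max_records: int = 1000,
                 max_age_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.put_object = put_object
        self.max_records = max_records
        self.max_age_s = max_age_s
        self.clock = clock
        self.buffers: Dict[PartitionId, List[Tuple[int, str]]] = defaultdict(list)
        self.opened_at: Dict[PartitionId, float] = {}
        # Next offset that is safe to commit, per partition. It only moves
        # forward after an upload succeeds, giving at-least-once delivery.
        self.committable: Dict[PartitionId, int] = {}

    def add(self, topic: str, partition: int, offset: int, value: str) -> None:
        tp = (topic, partition)
        if not self.buffers[tp]:
            self.opened_at[tp] = self.clock()  # start the age timer for a new batch
        self.buffers[tp].append((offset, value))
        if len(self.buffers[tp]) >= self.max_records:
            self._flush(tp)

    def flush_expired(self) -> None:
        """Call after every poll so that quiet partitions still get flushed."""
        now = self.clock()
        for tp, records in list(self.buffers.items()):
            if records and now - self.opened_at[tp] >= self.max_age_s:
                self._flush(tp)

    def _flush(self, tp: PartitionId) -> None:
        records = self.buffers[tp]
        topic, partition = tp
        first, last = records[0][0], records[-1][0]
        # Key depends only on the first offset, so a retried batch overwrites itself.
        key = f"{topic}/{partition}/{first:020d}.json"
        body = "".join(value + "\n" for _, value in records).encode("utf-8")
        self.put_object(key, body)  # if this raises, the buffer and offsets stay as they were
        self.committable[tp] = last + 1
        self.buffers[tp] = []
```

In a real consumer you would typically disable auto-commit (`enable.auto.commit=false`), call `add` for each polled record and `flush_expired` after each poll, and then commit the offsets recorded in `committable`. Committing only after a successful upload means a crash can cause a batch to be re-read, but because the key is derived from the first offset, the retry usually overwrites the earlier object instead of creating a duplicate.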