kafka-users mailing list archives

From Neha Narkhede <neha.narkh...@gmail.com>
Subject Re: Kafka in AWS?
Date Wed, 21 Mar 2012 15:09:48 GMT
Vaibhav,

Thanks for explaining your use case. I think I see the requirement
here. It seems like you need the data in S3 since you use Elastic
MapReduce to process your data. I guess that's the reason the Hadoop
input/output formats that Kafka provides are not directly useful.

I have some ideas on how this can be done. Will write them up on a wiki soon.

Thanks,
Neha

On Tue, Mar 20, 2012 at 10:21 PM, Vaibhav Puranik <vpuranik@gmail.com> wrote:
> Neha,
>
> My requirement is not related to Russell's, but I thought it would be
> helpful to describe what we need at GumGum <http://gumgum.com/>.
> I wasn't sure whether it's in Kafka's domain, since Kafka gives you a topic
> to pull data from and then it's up to you to do whatever with it.
>
> But since we are talking about it, here is what we do every day (currently
> without Kafka):
>
> We are an ad network. We write all of our impression and click data to
> various log files and upload them to S3. At night we run many MapReduce jobs
> to aggregate this data in various ways.
> We have an 'Autoscaled' cluster in AWS. Our webservers keep going up and
> down based on the load on the system.
>
> Whenever a server shuts down we tend to lose data. Often the file upload
> does not complete before the server shuts down. That is why we are
> looking at implementing Kafka to send events in real time to S3 without
> losing them.
>
> If there were a 'sink' that transfers data to S3, our job would be a lot
> easier. But again, I am not sure whether Kafka is supposed to provide that
> or not.
>
> Regards,
> Vaibhav
>
>
> On Tue, Mar 20, 2012 at 10:03 PM, Neha Narkhede <neha.narkhede@gmail.com> wrote:
>
>> Russell,
>>
>> By "sink events into S3", do you mean you want some plugin that
>> will pull data out of your Kafka brokers and upload it to S3? Would you mind
>> describing use cases that require sending data to Kafka, then uploading
>> it to S3, and then querying it in S3?
>>
>> Thanks,
>> Neha
>> On Mar 20, 2012 4:51 PM, "Russell Jurney" <russell.jurney@gmail.com>
>> wrote:
>>
>> > I think as soon as someone commits code that reliably sinks events to S3,
>> > Kafka adoption will skyrocket.  There is no good solution to this yet.
>> >  MANY people want one.
>> >
>> > Russ
>> >
>> > On Tue, Mar 20, 2012 at 3:32 PM, Felix GV <felix@mate1inc.com> wrote:
>> >
>> > > The primary use case for Kafka is to use it on AWS...???
>> > >
>> > > Sorry if I put words you didn't intend in your mouth :P ... I just thought
>> > > that sounded funny ;)
>> > >
>> > > Sorry for being off-topic. Carry on :/ !
>> > >
>> > > --
>> > > Felix
>> > >
>> > >
>> > >
>> > > On Tue, Mar 20, 2012 at 6:23 PM, Russell Jurney <russell.jurney@gmail.com> wrote:
>> > >
>> > > > Yeah, that is the part I am hoping someone will contribute :)  I know I can
>> > > > write that myself.  I also know it will be buggy and that I will have lots
>> > > > of trouble.
>> > > >
>> > > > If you contribute this code, it would be a huge boon to Kafka.  It is imo
>> > > > the primary use case for Kafka atm... if only the code gets into git.
>> > > >
>> > > > On Tue, Mar 20, 2012 at 3:04 PM, Niek Sanders <niek.sanders@gmail.com> wrote:
>> > > >
>> > > > > Russell,
>> > > > >
>> > > > > I'm actually in the process of writing Java code to go from Kafka
>> > > > > messages to S3.  I might be able to rip out my application-specific
>> > > > > parts and share something later tonight.
>> > > > >
>> > > > > The biggest hassle is that you can't append to existing S3 files.  So
>> > > > > unless you're planning on uploading each message as a separate S3
>> > > > > object, this means you need message aggregation smarts on the Kafka
>> > > > > consumer / S3 uploader side of things.
>> > > > >
>> > > > > Best,
>> > > > > Niek
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Tue, Mar 20, 2012 at 12:56 PM, Russell Jurney
>> > > > > <russell.jurney@gmail.com> wrote:
>> > > > > > I wish someone would publish some source that writes events to S3.
>> > > > > >
>> > > > > > Russell Jurney
>> > > > > > twitter.com/rjurney
>> > > > > > russell.jurney@gmail.com
>> > > > > > datasyndrome.com
>> > > > > >
>> > > > > > On Mar 20, 2012, at 11:20 AM, Dave Fayram <dfayram@gmail.com> wrote:
>> > > > > >
>> > > > > >> We've been successfully using Kafka on AWS as well, and JMX wise we
>> > > > > >> just use an SSH tunnel.
>> > > > > >>
>> > > > > >> In general, we've been very happy with the performance on AWS, which
>> > > > > >> some people have reservations about due to the I/O situation on most
>> > > > > >> Amazon boxes.
>> > > > > >>
>> > > > > >> On Tue, Mar 20, 2012 at 9:07 AM, Gautam Singaraju
>> > > > > >> <gautam.singaraju@gmail.com> wrote:
>> > > > > >>> We have been considering Kafka for a new Data Platform. Has someone
>> > > > > >>> used Kafka in AWS? If so, could you please share your experiences with
>> > > > > >>> us?
>> > > > > >>>
>> > > > > >>> Thank you!
>> > > > > >>> ---
>> > > > > >>> Gautam
>> > > > > >>
>> > > > > >>
>> > > > > >>
>> > > > > >> --
>> > > > > >> --
>> > > > > >> Dave Fayram
>> > > > > >> dfayram@gmail.com
>> > > > >
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
>> > > > datasyndrome.com
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
>> > datasyndrome.com
>> >
>>
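
Niek's point in the thread, that S3 objects cannot be appended to, means a
consumer has to aggregate messages and upload each batch as its own object.
A minimal sketch of that aggregation step follows. It is not Niek's code or
any Kafka API: the class name S3BatchBuffer, the upload callable, the key
layout, and the size and age thresholds are all hypothetical, and the upload
callable stands in for whatever S3 client is actually used.

```python
import time
from typing import Callable, List, Optional


class S3BatchBuffer:
    """Hypothetical helper: collects messages and writes one object per batch."""

    def __init__(self,
                 upload: Callable[[str, bytes], None],  # assumed S3 "put" wrapper
                 topic: str,
                 max_bytes: int = 5 * 1024 * 1024,      # assumed size threshold
                 max_age_s: float = 60.0,               # assumed age threshold
                 clock: Callable[[], float] = time.monotonic):
        self.upload = upload
        self.topic = topic
        self.max_bytes = max_bytes
        self.max_age_s = max_age_s
        self.clock = clock
        self._reset()

    def _reset(self) -> None:
        self._lines: List[bytes] = []
        self._size = 0
        self._first_offset: Optional[int] = None
        self._last_offset: Optional[int] = None
        self._started = 0.0

    def add(self, offset: int, payload: bytes) -> Optional[str]:
        """Buffer one message; return the object key if this call flushed."""
        if self._first_offset is None:
            self._first_offset = offset
            self._started = self.clock()
        self._lines.append(payload)
        self._size += len(payload) + 1  # +1 for the newline separator
        self._last_offset = offset
        too_big = self._size >= self.max_bytes
        too_old = self.clock() - self._started >= self.max_age_s
        return self.flush() if (too_big or too_old) else None

    def flush(self) -> Optional[str]:
        """Upload the current batch as a single new object, if non-empty."""
        if not self._lines:
            return None
        # Offsets in the key make each object name unique, so nothing is appended.
        key = "%s/%020d-%020d.log" % (
            self.topic, self._first_offset, self._last_offset)
        self.upload(key, b"\n".join(self._lines) + b"\n")
        self._reset()
        return key
```

In a setup like the one Vaibhav describes, a consumer would presumably call
add() for each message and flush() before shutting down, and would record its
consumed position only after upload() returns, so that a failed upload leads
to a retry rather than lost events. That ordering is a design assumption, not
something the thread specifies.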
