kafka-users mailing list archives

From Denny Lee <denny.g....@gmail.com>
Subject Re: Experiences with larger message sizes
Date Tue, 24 Jun 2014 16:37:55 GMT
Hey Joe,

Yes, I have. My original plan was to do something similar to what you suggested: push the data
into HDFS / S3 and keep only the event information in Kafka, so that multiple consumers can
read the event information and then fetch the actual message itself from HDFS/S3.
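A minimal sketch of the event-pointer approach described above (sometimes called the claim-check pattern). It assumes a JSON event with hypothetical fields `event_id`, `uri`, `size_bytes`, and `content_type`, and uses an in-memory dict as a stand-in for HDFS/S3; no Kafka client is involved. In practice the bytes returned by `produce` would be the Kafka message value, and `consume` would fetch the payload with whatever HDFS or S3 client you already use.

```python
from __future__ import annotations

import json
import uuid
from dataclasses import asdict, dataclass


@dataclass
class FileEvent:
    # Hypothetical pointer record: this small object is what goes through Kafka.
    event_id: str
    uri: str           # where the full payload lives (an HDFS or S3 location)
    size_bytes: int
    content_type: str


# In-memory stand-in for HDFS/S3 so the sketch runs without any cluster.
BLOB_STORE: dict[str, bytes] = {}


def produce(key: str, payload: bytes, content_type: str = "application/octet-stream") -> bytes:
    """Store the payload, then return the small JSON event to send as the Kafka message value."""
    uri = f"s3://example-bucket/{key}"  # hypothetical bucket name
    BLOB_STORE[uri] = payload
    event = FileEvent(str(uuid.uuid4()), uri, len(payload), content_type)
    return json.dumps(asdict(event)).encode("utf-8")


def parse_event(value: bytes) -> FileEvent:
    """Decode a Kafka message value back into a FileEvent."""
    return FileEvent(**json.loads(value))


def consume(value: bytes) -> bytes:
    """Read the small event, then fetch the payload it points to."""
    event = parse_event(value)
    return BLOB_STORE[event.uri]
```

Because the event is only a few hundred bytes, any number of consumers can read it cheaply, and only the consumers that actually need the file pay for the HDFS/S3 read.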

Part of the reason I am considering pushing the entire message into Kafka is that we may end up
with a firehose of messages of this size, and we will need to push this data to multiple
locations.


On June 24, 2014 at 9:26:49 AM, Joe Stein (joe.stein@stealth.ly) wrote:

Hi Denny, have you considered saving those files to HDFS and sending the  
"event" information to Kafka?  

You could then pass that off to Apache Spark in a consumer and get data  
locality for the file saved (or something of the sort [no pun intended]).  

You could also stream every line (or however you want to "chunk" it) in the  
file as a separate message to the broker with a wrapping message object (so  
you know which file you are dealing with when consuming).  
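A minimal sketch of the wrapping-message idea, assuming fixed-size byte chunks rather than lines and a hypothetical JSON envelope with `file_id`, `seq`, and `total` fields. The chunk bytes are base64-encoded so the envelope stays valid JSON, which inflates each chunk by about a third; a binary envelope would avoid that. No Kafka client is used: `split_file` produces message values and `Reassembler.add` consumes them.

```python
from __future__ import annotations

import base64
import json
from dataclasses import dataclass


@dataclass
class Chunk:
    file_id: str  # which file this piece belongs to
    seq: int      # position of this piece within the file
    total: int    # number of pieces, so the consumer knows when the file is complete
    data: bytes


def encode(chunk: Chunk) -> bytes:
    """Wrap one chunk in a JSON envelope suitable for use as a message value."""
    return json.dumps({
        "file_id": chunk.file_id,
        "seq": chunk.seq,
        "total": chunk.total,
        "data": base64.b64encode(chunk.data).decode("ascii"),
    }).encode("utf-8")


def decode(value: bytes) -> Chunk:
    d = json.loads(value)
    return Chunk(d["file_id"], d["seq"], d["total"], base64.b64decode(d["data"]))


def split_file(file_id: str, payload: bytes, chunk_size: int) -> list[bytes]:
    """Cut a payload into chunk_size pieces and wrap each one with its file id and position."""
    pieces = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)] or [b""]
    return [encode(Chunk(file_id, n, len(pieces), p)) for n, p in enumerate(pieces)]


class Reassembler:
    """Buffers chunks per file and returns (file_id, payload) once every piece has arrived."""

    def __init__(self) -> None:
        self._pending: dict[str, dict[int, bytes]] = {}

    def add(self, value: bytes) -> tuple[str, bytes] | None:
        c = decode(value)
        parts = self._pending.setdefault(c.file_id, {})
        parts[c.seq] = c.data  # a redelivered chunk simply overwrites itself
        if len(parts) < c.total:
            return None
        del self._pending[c.file_id]
        return c.file_id, b"".join(parts[i] for i in range(c.total))
```

Producing each chunk with `file_id` as the message key would keep a file's chunks on one partition, but the reassembler above does not rely on arrival order. Its buffer lives in memory, so a consumer that restarts mid-file would need to re-read those chunks.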

What you plan to do with the data has a lot to do with how you are going to  
process and manage it.  

Joe Stein  
Founder, Principal Consultant  
Big Data Open Source Security LLC  
Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>  

On Tue, Jun 24, 2014 at 11:35 AM, Denny Lee <denny.g.lee@gmail.com> wrote:  

> By any chance has anyone worked with Kafka using message sizes of  
> approximately 50MB? Based on some of the previous threads, there are  
> probably concerns about memory pressure due to compression on the broker  
> and decompression on the consumer, as well as best practices for sizing  
> batches (so that the compressed message does not ultimately exceed the  
> message size limit).  
> Any other best practices or thoughts concerning this scenario?  
> Thanks!  
> Denny  
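A sketch of one way to keep each compressed batch under a size limit, assuming gzip and a limit supplied by the caller. In Kafka the relevant limit would come from settings such as the broker's `message.max.bytes`; check the values for your version. The function greedily adds records to a batch and recompresses to measure the result, closing the batch before it would exceed the budget. The 4-byte length prefix is only a rough stand-in for Kafka's real per-message overhead, hence the `headroom` factor.

```python
from __future__ import annotations

import gzip
import struct


def frame(records: list[bytes]) -> bytes:
    # Length-prefix each record; a rough stand-in for per-message overhead.
    return b"".join(struct.pack(">I", len(r)) + r for r in records)


def compressed_size(records: list[bytes]) -> int:
    # Size of the batch after gzip, i.e. what would count against the limit.
    return len(gzip.compress(frame(records)))


def plan_batches(records: list[bytes], limit: int, headroom: float = 0.9) -> list[list[bytes]]:
    """Greedily group records so each gzip-compressed batch fits in limit * headroom bytes."""
    budget = int(limit * headroom)
    batches: list[list[bytes]] = []
    current: list[bytes] = []
    for rec in records:
        if compressed_size([rec]) > budget:
            # Even alone this record is too big: chunk it or offload it (e.g. to HDFS/S3).
            raise ValueError(f"record of {len(rec)} bytes cannot fit in {budget} bytes compressed")
        if current and compressed_size(current + [rec]) > budget:
            # Adding rec would overflow: close the current batch and start a new one.
            batches.append(current)
            current = []
        current.append(rec)
    if current:
        batches.append(current)
    return batches
```

Recompressing on every append makes this quadratic in batch length, which is fine for a sketch but not for a hot path. A single ~50MB record that does not compress below the budget raises an error, which is where chunking or offloading the payload comes back in.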
