kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David Arthur <mum...@gmail.com>
Subject Reading Kafka directly from Pig?
Date Wed, 07 Aug 2013 14:41:30 GMT
I've thrown together a Pig LoadFunc to read data from Kafka, so you 
could load data like:

QUERY_LOGS = load 'kafka://localhost:9092/logs.query#8' using 
com.mycompany.pig.KafkaAvroLoader('com.mycompany.Query');

The path part of the uri is the Kafka topic, and the fragment is the 
number of partitions. In the implementation I have, it makes one input 
split per partition. Offsets are not really dealt with at this point - 
it's a rough prototype.

Anyone have thoughts on whether or not this is a good idea? I know 
usually the pattern is: kafka -> hdfs -> mapreduce. If I'm only reading 
from this data from Kafka once, is there any reason why I can't skip 
writing to HDFS?

Thanks!
-David

Mime
View raw message