kafka-users mailing list archives

From Jun Rao <jun...@gmail.com>
Subject Re: Reading Kafka directly from Pig?
Date Wed, 07 Aug 2013 14:49:22 GMT
David,

That's interesting. Kafka provides an infinite stream of data whereas Pig
works on a finite amount of data. How did you solve the mismatch?

Thanks,

Jun


On Wed, Aug 7, 2013 at 7:41 AM, David Arthur <mumrah@gmail.com> wrote:

> I've thrown together a Pig LoadFunc to read data from Kafka, so you could
> load data like:
>
> QUERY_LOGS = load 'kafka://localhost:9092/logs.query#8' using
> com.mycompany.pig.KafkaAvroLoader('com.mycompany.Query');
>
> The path part of the uri is the Kafka topic, and the fragment is the
> number of partitions. In the implementation I have, it makes one input
> split per partition. Offsets are not really dealt with at this point - it's
> a rough prototype.
>
> Anyone have thoughts on whether or not this is a good idea? I know usually
> the pattern is: kafka -> hdfs -> mapreduce. If I'm only reading this
> data from Kafka once, is there any reason why I can't skip writing to HDFS?
>
> Thanks!
> -David
>
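
The location convention David describes (the path is the topic, the fragment is the partition count, one input split per partition) could be sketched roughly as follows. This is only an illustration in Python, not his implementation: a real Pig LoadFunc would be written in Java against Hadoop's InputFormat API, and the names `KafkaSplit` and `plan_splits` are hypothetical. Like the prototype, the sketch ignores offsets.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import urlsplit


@dataclass(frozen=True)
class KafkaSplit:
    """Hypothetical descriptor for one unit of work: a single topic partition."""
    broker: str
    topic: str
    partition: int


def plan_splits(location: str) -> List[KafkaSplit]:
    """Turn kafka://host:port/topic#numPartitions into one split per partition."""
    uri = urlsplit(location)
    if uri.scheme != "kafka":
        raise ValueError(f"expected a kafka:// location, got {location!r}")

    # The path part of the URI is the topic name (minus the leading slash).
    topic = uri.path.lstrip("/")
    if not topic:
        raise ValueError(f"no topic in location {location!r}")

    # The fragment is the number of partitions; it must be a positive integer.
    try:
        num_partitions = int(uri.fragment)
    except ValueError:
        raise ValueError(f"no partition count in location {location!r}") from None
    if num_partitions < 1:
        raise ValueError("partition count must be at least 1")

    # One split per partition, numbered from 0. Offsets are not handled here.
    return [KafkaSplit(uri.netloc, topic, p) for p in range(num_partitions)]


splits = plan_splits("kafka://localhost:9092/logs.query#8")
for split in splits:
    print(split)
```

Because no offsets are tracked, each split has no defined end. That is the mismatch Jun raises: a finite job would need a start and end offset per partition.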
