kafka-users mailing list archives

From Neha Narkhede <neha.narkh...@gmail.com>
Subject Re: Kafka and Hadoop
Date Thu, 11 Aug 2011 16:44:15 GMT
Paul,

>> and to pull data down to our Hadoop cluster (and ultimately into some
>> Hive tables) for doing some offline analysis also.

One way of doing this is to set up a Kafka cluster co-located with your
Hadoop cluster and configure it to be a "mirror" of your primary Kafka
cluster. Please see KAFKA-74
<https://issues.apache.org/jira/browse/KAFKA-74>, which will automate the
mirroring of new topics that appear in your primary Kafka cluster.

>> My current thinking would be to use something to replicate the topic
>> offsets onto S3 periodically and then to run distcp to periodically copy
>> them onto HDFS?

If you set up the mirror Kafka cluster as described above, you can store your
topic offsets (used by the mappers) in HDFS itself and avoid any copying.
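
To make the offset bookkeeping concrete, here is a rough sketch of the idea,
not the hadoop-consumer's actual code. Everything in it is an assumption made
for illustration: the function names, the one-file-per-partition layout, the
JSON format, a local directory standing in for HDFS, and a Python list
standing in for a Kafka partition (so offsets here are list indexes). Each
"mapper" run reads the last committed offset, consumes from there, and writes
the new offset back for the next run.

```python
import json
import os
import tempfile
from pathlib import Path


def offset_path(offset_dir: Path, topic: str, partition: int) -> Path:
    # Hypothetical layout: one small offset file per topic/partition.
    return offset_dir / f"{topic}-{partition}.offset"


def load_offset(offset_dir: Path, topic: str, partition: int) -> int:
    """Return the last committed offset, or 0 if nothing was committed yet."""
    path = offset_path(offset_dir, topic, partition)
    if not path.exists():
        return 0
    return json.loads(path.read_text())["offset"]


def commit_offset(offset_dir: Path, topic: str, partition: int, offset: int) -> None:
    """Write the offset to a temp file, then rename it into place."""
    offset_dir.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=offset_dir)
    with os.fdopen(fd, "w") as tmp:
        json.dump({"topic": topic, "partition": partition, "offset": offset}, tmp)
    os.replace(tmp_name, offset_path(offset_dir, topic, partition))


def run_mapper(offset_dir: Path, topic: str, partition: int,
               log: list, batch_size: int) -> list:
    """Consume up to batch_size messages starting at the committed offset."""
    start = load_offset(offset_dir, topic, partition)
    batch = log[start:start + batch_size]
    # Commit only after the batch has been read, so a failed run is retried
    # from the same offset on the next job.
    commit_offset(offset_dir, topic, partition, start + len(batch))
    return batch
```

Running run_mapper repeatedly against the same offset directory walks through
the log one batch at a time, which is the property you want from periodic
Hadoop jobs that keep their offsets next to the data they load.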

Thanks,
Neha

On Thu, Aug 11, 2011 at 4:20 AM, Paul Ingles <paul@forward.co.uk> wrote:

> Hi,
>
> I'm evaluating using Kafka to aggregate some web logs and additional
> activity tracking for one of our projects. I'd like to know a little more
> about the best way to stitch things together.
>
> The application runs across EC2 and some internal hardware. We also run a
> Hadoop cluster inside our office. I'd like to use Kafka to help aggregate
> activity together, augment it with something like Esper to do some systems
> monitoring work, and to pull data down to our Hadoop cluster (and ultimately
> into some Hive tables) for doing some offline analysis also.
>
> I notice in the hadoop-consumer README (
> https://github.com/kafka-dev/kafka/tree/master/contrib/hadoop-consumer)
> it's necessary to provide the HDFS location of the input files.
>
> I was wondering whether people had recommendations on good ways to pull
> data onto HDFS? My current thinking would be to use something to replicate
> the topic offsets onto S3 periodically and then to run distcp to
> periodically copy them onto HDFS?
>
> Thanks for any tips,
> Paul
