samza-dev mailing list archives

From Nirmal Kumar <nirmal.ku...@impetus.co.in>
Subject RE: Writing a simple KafkaProducer in Samza
Date Mon, 21 Oct 2013 07:33:33 GMT
Hi Chris,

Thanks a lot for the information.

I am comparing Storm + Kafka 0.8 with Samza.

For the initial use case, I am using the Kafka API to publish messages (say 40k, 50k, ...)
to the Kafka broker.
Then I consume these messages using the Storm spout.
On the Samza side, I am using a similar Kafka consumer that consumes messages from Kafka.

Is this use case a fair basis for comparing Storm + Kafka 0.8 with Samza, or are there other
things that I need to consider?

Going forward, I also plan to find use cases where I can compare Samza's features,
such as state management, partitioning, and parallelism.
Any pointers towards such use cases?

Thanks,
-Nirmal

-----Original Message-----
From: Chris Riccomini [mailto:criccomini@linkedin.com]
Sent: Thursday, October 17, 2013 10:42 PM
To: dev@samza.incubator.apache.org
Subject: Re: Writing a simple KafkaProducer in Samza

Hey Nirmal,

Glad to hear that the hello-samza project is working for you.

If I understand you correctly, I believe that you're saying you want to send messages to a
Kafka topic, right? As you've pointed out, you can send messages to Kafka through Samza. You
can also send messages to Kafka directly using the Kafka producer API. Which one you use depends
on what your code is doing.

With Samza, typically a task is sending messages to Kafka as a reaction to some other event
(which triggers the process method). For example, in the Wiki example, we send a message to
Kafka whenever an update happens on Wikipedia (via the IRC channel). In this example, we had
to write a Wikipedia consumer for Samza, which implements the SystemConsumer API in Samza.
This implementation reads messages from the Wikimedia IRC channel.
You can see this implementation here:


https://github.com/linkedin/hello-samza/blob/master/samza-wikipedia/src/main/java/samza/examples/wikipedia/system/WikipediaConsumer.java

If you use Samza, you MUST have at least one input defined in task.inputs, which feeds messages to your
StreamTask. Out of the box, Samza comes with a KafkaSystemConsumer implementation. The hello-samza
project comes with the WikipediaConsumer implementation linked above. If you want to react to messages
from another system or feed, you'd have to implement the SystemConsumer interface (and hopefully contribute
it back :).

The alternative approach would be to just send messages directly to Kafka using the Kafka
API. This approach is more appropriate in cases that don't fit well with Samza's processing
model (e.g. you can't easily implement the SystemConsumer API, you need to guarantee deployment
on a specific host all the time, etc). For example, if you wanted to read syslog messages
on a specific host, and send them to Kafka, it probably makes more sense to just write a simple
Java main() method that creates a Kafka producer, polls syslog periodically, and calls producer.send()
whenever a new message appears in the syslog.
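
To make the standalone approach concrete, here is a minimal sketch of the polling part, using only the JDK. The class name SyslogForwarder, its methods, and the Consumer<String> sink are hypothetical names chosen for illustration; the sink stands in for whatever wraps producer.send() on a real Kafka producer, which is deliberately left out so the sketch does not assume a particular Kafka client version.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Hypothetical helper: tails a log file and hands each complete new line to a sink.
// In a real deployment the sink would wrap a call to producer.send() on a Kafka producer.
final class SyslogForwarder {
    private final String path;
    private final Consumer<String> sink;
    private long offset = 0; // byte position just after the last line already forwarded

    SyslogForwarder(String path, Consumer<String> sink) {
        this.path = path;
        this.sink = sink;
    }

    // Forwards every complete line appended since the previous poll; returns how many were sent.
    int pollOnce() throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            long length = file.length();
            if (length < offset) {
                offset = 0; // file was truncated or rotated, so start over
            }
            byte[] buf = new byte[(int) (length - offset)];
            file.seek(offset);
            file.readFully(buf);

            // Only consume up to the last newline; a partially written line waits for the next poll.
            int end = buf.length - 1;
            while (end >= 0 && buf[end] != '\n') {
                end--;
            }
            if (end < 0) {
                return 0;
            }
            String complete = new String(buf, 0, end, StandardCharsets.UTF_8);
            int sent = 0;
            for (String line : complete.split("\n", -1)) {
                sink.accept(line);
                sent++;
            }
            offset += end + 1;
            return sent;
        }
    }

    // Polls until the thread is interrupted, sleeping between polls.
    void run(long intervalMillis) throws IOException, InterruptedException {
        while (!Thread.currentThread().isInterrupted()) {
            pollOnce();
            Thread.sleep(intervalMillis);
        }
    }
}
```

A main() method would construct a SyslogForwarder with the log path and a lambda that sends each line to the target topic, then call run() with a polling interval. Keeping the sink as a plain Consumer<String> also makes the polling logic easy to check without a running broker.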

If you can be more specific about what you're doing, I can probably provide better advice.

Cheers,
Chris

On 10/17/13 6:39 AM, "Nirmal Kumar" <nirmal.kumar@impetus.co.in> wrote:

>Hi All,
>
>I was referring to the hello-samza project and was able to run it
>successfully.
>I was able to run all the jobs and also wrote a consumer task to listen
>to the kafka.wikipedia-stats topic.
>
>I now want to write a Samza job that acts as a KafkaProducer and
>continuously publishes simple string messages to a topic,
>just like the WikipediaFeedStreamTask that reads Wikipedia events and
>publishes them to a topic.
>I am not sure what value to use for task.inputs in the config properties
>file.
>What I have in mind is something like a Java program publishing string
>messages to a Kafka topic.
>How can I write such a Samza job?
>
>Any pointers would be of great help.
>
>Later on, I want to read the same messages from a consumer, like
>WikipediaParserStreamTask does.
>Referring to the hello-samza project, I was able to write a consumer task
>that reads messages from the topic (kafka.wikipedia-stats) by simply setting:
>task.class=samza.examples.wikipedia.task.TestConsumer
>task.inputs=kafka.wikipedia-stats
>
>
>Thanks,
>-Nirmal
>
>________________________________
>
>
>
>
>
>
>NOTE: This message may contain information that is confidential,
>proprietary, privileged or otherwise protected by law. The message is
>intended solely for the named addressee. If received in error, please
>destroy and notify the sender. Any use of this email is prohibited when
>received in error. Impetus does not represent, warrant and/or
>guarantee, that the integrity of this communication has been maintained
>nor that the communication is free of errors, virus, interception or interference.


