spark-user mailing list archives

From swetha kasireddy <swethakasire...@gmail.com>
Subject Re: How to spin up Kafka using docker and use for Spark Streaming Integration tests
Date Thu, 07 Jul 2016 00:02:25 GMT
The application writes its output to Cassandra, inserting data at the end of
every batch.

On Mon, Jul 4, 2016 at 5:20 AM, Lars Albertsson <lalle@mapflat.com> wrote:

> I created such a setup for a client a few months ago. It is pretty
> straightforward, but it can take some work to get all the wires
> connected.
>
> I suggest that you start with the spotify/kafka
> (https://github.com/spotify/docker-kafka) Docker image, since it
> includes a bundled ZooKeeper. The alternative would be to spin up a
> separate ZooKeeper Docker container and connect the two, but for
> testing purposes that would only make the setup more complex.
>
> You'll need to inform Kafka about the external address it exposes by
> setting ADVERTISED_HOST to the output of "docker-machine ip" (on Mac)
> or the address printed by "ip addr show docker0" (Linux). I also
> suggest setting AUTO_CREATE_TOPICS to true.
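A minimal Python sketch of the container start-up described above. It assumes the image listens on 2181 (ZooKeeper) and 9092 (Kafka) and honours the ADVERTISED_HOST, ADVERTISED_PORT and AUTO_CREATE_TOPICS variables; check the last one against the image version you use. The helper names parse_docker0_ip and kafka_run_command, and the container name "kafka-it", are hypothetical.

```python
import re
from typing import List, Optional

# Matches an IPv4 "inet a.b.c.d/nn" line; "inet6" lines do not match.
_INET_RE = re.compile(r"^\s*inet\s+(\d{1,3}(?:\.\d{1,3}){3})/\d+", re.MULTILINE)


def parse_docker0_ip(ip_addr_output: str) -> Optional[str]:
    """Return the first IPv4 address in "ip addr show docker0" output, or None."""
    match = _INET_RE.search(ip_addr_output)
    return match.group(1) if match else None


def kafka_run_command(advertised_host: str, name: str = "kafka-it") -> List[str]:
    """Build "docker run" arguments for spotify/kafka (bundled ZooKeeper + Kafka)."""
    return [
        "docker", "run", "-d", "--name", name,
        "-p", "2181:2181",  # ZooKeeper
        "-p", "9092:9092",  # Kafka broker
        # The address that clients outside the container use to reach the broker.
        "--env", f"ADVERTISED_HOST={advertised_host}",
        "--env", "ADVERTISED_PORT=9092",
        # Assumption: the image version in use reads this variable.
        "--env", "AUTO_CREATE_TOPICS=true",
        "spotify/kafka",
    ]
```

On Linux, pass the address parsed from "ip addr show docker0"; on Mac, pass the output of "docker-machine ip" directly. subprocess.run(kafka_run_command(host), check=True) then starts the container.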
>
> You can run your Spark Streaming application under test (SUT) and
> your test harness either in Docker containers or directly on your
> host.
>
> In the former case, it is easiest to set up a Docker Compose file
> linking the harness and SUT to Kafka. This variant provides better
> isolation, and might integrate better with similar test frameworks
> that you already have.
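A sketch of such a Compose setup, generated from Python to stay in one language with the other examples. Inside the Compose network each container reaches Kafka by its service name, so "kafka" is the address Kafka must advertise. The image names, the KAFKA_BROKERS and CASSANDRA_HOST variables, and the optional Cassandra service (for output written to a database) are assumptions; the SUT and harness would have to read those variables themselves.

```python
import json


def compose_config(sut_image: str, harness_image: str,
                   with_cassandra: bool = False) -> dict:
    """Describe a Compose file linking the SUT and the harness to Kafka."""
    services = {
        "kafka": {
            "image": "spotify/kafka",
            "environment": {
                # Containers on the Compose network resolve "kafka" by service name.
                "ADVERTISED_HOST": "kafka",
                "ADVERTISED_PORT": "9092",
                "AUTO_CREATE_TOPICS": "true",
            },
        },
    }
    # Hypothetical variable names that the SUT and harness would read.
    client_env = {"KAFKA_BROKERS": "kafka:9092"}
    depends = ["kafka"]
    if with_cassandra:
        services["cassandra"] = {"image": "cassandra"}
        client_env["CASSANDRA_HOST"] = "cassandra"
        depends.append("cassandra")
    services["sut"] = {"image": sut_image,
                       "environment": dict(client_env),
                       "depends_on": list(depends)}
    services["harness"] = {"image": harness_image,
                           "environment": dict(client_env),
                           "depends_on": list(depends) + ["sut"]}
    # "version" keeps older docker-compose releases from parsing this as format v1.
    return {"version": "2", "services": services}


if __name__ == "__main__":
    # JSON output like this should also parse as YAML for docker-compose.
    print(json.dumps(compose_config("my-sut:latest", "my-harness:latest", True), indent=2))
```

Note that depends_on only orders container start-up; it does not wait until Kafka accepts connections, so the harness still needs its own readiness check. Running "docker-compose up --exit-code-from harness" makes the harness's exit code the result of the whole run.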
>
> If you want to run the harness and SUT outside Docker, I suggest that
> you build your harness with a standard test framework, e.g. ScalaTest
> or JUnit, and run both harness and SUT in the same JVM. In this case,
> you put the code that brings up the Kafka Docker container in the test
> framework's setup methods. This strategy integrates better with IDEs
> and build tools (mvn/sbt/gradle), since they will run (and debug) your
> tests without any special integration. I therefore prefer it.
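A minimal sketch of that setup-method approach, written with Python's unittest for brevity; in ScalaTest the equivalent hooks are beforeAll/afterAll from BeforeAndAfterAll, and in JUnit 4 they are methods annotated @BeforeClass/@AfterClass. KafkaContainer, KafkaFixtureMixin, the container name and the default host (the usual Linux docker0 address) are hypothetical. The readiness check only confirms that the broker port accepts TCP connections, which is a rough proxy for Kafka actually being ready.

```python
import socket
import subprocess
import time
from typing import Callable, Sequence


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


class KafkaContainer:
    """Starts and removes a Kafka container; `run` is injectable for testing."""

    def __init__(self, start_cmd: Sequence[str], name: str,
                 run: Callable[..., object] = subprocess.run):
        self.start_cmd = list(start_cmd)
        self.name = name
        self._run = run

    def start(self) -> None:
        self._run(self.start_cmd, check=True)

    def stop(self) -> None:
        # "rm -f" also stops the container if it is still running.
        self._run(["docker", "rm", "-f", self.name], check=False)


class KafkaFixtureMixin:
    """Mix into a unittest.TestCase: Kafka is up before the class's tests, removed after."""

    KAFKA_HOST = "172.17.0.1"  # assumption: Linux docker0 address; on Mac use "docker-machine ip"
    KAFKA_PORT = 9092
    CONTAINER_NAME = "kafka-it"  # hypothetical name

    @classmethod
    def setUpClass(cls) -> None:
        super().setUpClass()
        cls.kafka = KafkaContainer(
            ["docker", "run", "-d", "--name", cls.CONTAINER_NAME,
             "-p", "2181:2181", "-p", f"{cls.KAFKA_PORT}:9092",
             "--env", f"ADVERTISED_HOST={cls.KAFKA_HOST}",
             "--env", f"ADVERTISED_PORT={cls.KAFKA_PORT}",
             "spotify/kafka"],
            name=cls.CONTAINER_NAME)
        cls.kafka.start()
        if not wait_for_port(cls.KAFKA_HOST, cls.KAFKA_PORT, timeout=60):
            cls.kafka.stop()
            raise RuntimeError("Kafka did not accept connections in time")

    @classmethod
    def tearDownClass(cls) -> None:
        cls.kafka.stop()
        super().tearDownClass()
```

A test class would then be declared as class MyStreamingTest(KafkaFixtureMixin, unittest.TestCase), so the container starts once before that class's tests and is removed afterwards, and the IDE or build tool runs it like any other test.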
>
>
> What is the output of your application? If it is messages on a
> different Kafka topic, the test harness can simply subscribe to that
> topic and verify the output. If you write output to a database, you'll
> need another Docker container for it, integrated with Docker Compose,
> and your test oracle will need to poll the database frequently for
> the expected records, with a timeout so that failing tests do not
> hang.
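A minimal sketch of such a polling oracle. fetch is a hypothetical callable that, in the Cassandra case, would query the table the job writes at the end of each batch; for a Kafka output topic it could return the messages consumed so far. Raising TimeoutError at the deadline turns a potential hang into an ordinary test failure.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(fetch: Callable[[], T], done: Callable[[T], bool],
               timeout: float = 30.0, interval: float = 0.5) -> T:
    """Call fetch() until done(result) holds, or raise TimeoutError at the deadline.

    The last fetched value goes into the error message, so a failing test
    shows what the oracle actually saw.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"condition not met within {timeout}s; last result: {result!r}")
        time.sleep(interval)
```

A test might call poll_until(fetch_rows, lambda rows: len(rows) == expected_count, timeout=60), where fetch_rows and expected_count are hypothetical.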
>
> I hope this is comprehensible. Let me know if you have follow-up questions.
>
> Regards,
>
>
>
> Lars Albertsson
> Data engineering consultant
> www.mapflat.com
> +46 70 7687109
> Calendar: https://goo.gl/6FBtlS
>
>
>
> On Thu, Jun 30, 2016 at 8:19 PM, SRK <swethakasireddy@gmail.com> wrote:
> > Hi,
> >
> > I need to write integration tests for a Spark Streaming job. My idea is
> > to spin up Kafka using Docker locally and use it to feed the stream to
> > my streaming job. Any suggestions on how to do this would be of great help.
> >
> > Thanks,
> > Swetha
> >