spark-user mailing list archives

From Lars Albertsson
Subject Re: How to spin up Kafka using docker and use for Spark Streaming Integration tests
Date Mon, 04 Jul 2016 12:20:51 GMT
I created such a setup for a client a few months ago. It is pretty
straightforward, but it can take some work to get all the wires
connected.

I suggest that you start with the spotify/kafka Docker image, since it
includes a bundled Zookeeper. The alternative would be to spin up a
separate Zookeeper Docker container and connect the two, but for
testing purposes that would make the setup more complex.

You'll need to inform Kafka about the external address it exposes by
setting ADVERTISED_HOST to the output of "docker-machine ip" (on Mac)
or the address printed by "ip addr show docker0" (Linux). I also
suggest setting
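
A minimal sketch of the ADVERTISED_HOST step, in Python. The helper
names, the container name "kafka-it" and the published ports 2181 and
9092 (the usual Zookeeper and Kafka defaults) are my assumptions, not
something stated above, so adjust them to your setup:

```python
import re


def parse_docker0_address(ip_addr_output):
    """Extract the IPv4 address from the output of `ip addr show docker0`."""
    match = re.search(r"inet (\d+\.\d+\.\d+\.\d+)/", ip_addr_output)
    if match is None:
        raise ValueError("no IPv4 address found in output")
    return match.group(1)


def kafka_run_command(advertised_host, name="kafka-it"):
    """Build a `docker run` argument list for the spotify/kafka image.

    The container name and port mappings are illustrative assumptions.
    """
    return [
        "docker", "run", "-d", "--name", name,
        "-p", "2181:2181",   # Zookeeper (assumed default port)
        "-p", "9092:9092",   # Kafka broker (assumed default port)
        "--env", "ADVERTISED_HOST=" + advertised_host,
        "spotify/kafka",
    ]
```

On Linux you could feed it the output of
subprocess.check_output(["ip", "addr", "show", "docker0"], text=True)
and pass the resulting list to subprocess.check_call. On Mac, use the
address printed by "docker-machine ip" instead of parsing docker0.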

You can choose to run your Spark Streaming application under test
(SUT) and your test harness also in Docker containers, or directly on
your host.

In the former case, it is easiest to set up a Docker Compose file
linking the harness and SUT to Kafka. This variant provides better
isolation, and might integrate better if you have existing similar
test frameworks.

If you want to run the harness and SUT outside Docker, I suggest that
you build your harness with a standard test framework, e.g. scalatest
or JUnit, and run both harness and SUT in the same JVM. In this case,
you put code to bring up the Kafka Docker container in test framework
setup methods. This test strategy integrates better with IDEs and
build tools (mvn/sbt/gradle), since they will run (and debug) your
tests without any special integration. I therefore prefer this
strategy.
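
To illustrate the shape of "bring up Kafka in the setup methods", here
is a rough sketch using Python's unittest. The same structure maps
onto scalatest's beforeAll/afterAll or JUnit's class-level setup. The
DockerContainer class, the KAFKA_ADVERTISED_HOST environment variable,
the fallback address and the container name are all hypothetical:

```python
import os
import shutil
import subprocess
import unittest


class DockerContainer:
    """Starts and removes one container through the docker CLI."""

    def __init__(self, image, name, env=None, ports=(), run=subprocess.check_call):
        self.image = image
        self.name = name
        self.env = dict(env or {})
        self.ports = tuple(ports)
        self._run = run  # injectable, so the logic can be checked without Docker

    def start(self):
        cmd = ["docker", "run", "-d", "--name", self.name]
        for port in self.ports:
            cmd += ["-p", "%d:%d" % (port, port)]
        for key, value in sorted(self.env.items()):
            cmd += ["--env", "%s=%s" % (key, value)]
        cmd.append(self.image)
        self._run(cmd)

    def stop(self):
        # Force-remove so a failed test run does not leave the container behind.
        self._run(["docker", "rm", "-f", self.name])


class StreamingJobIT(unittest.TestCase):
    kafka = None

    @classmethod
    def setUpClass(cls):
        if shutil.which("docker") is None:
            raise unittest.SkipTest("docker CLI not available")
        # Hypothetical variable; fall back to an assumed docker0 address.
        host = os.environ.get("KAFKA_ADVERTISED_HOST", "172.17.0.1")
        cls.kafka = DockerContainer(
            "spotify/kafka", "kafka-it",
            env={"ADVERTISED_HOST": host}, ports=(2181, 9092))
        cls.kafka.start()

    @classmethod
    def tearDownClass(cls):
        if cls.kafka is not None:
            cls.kafka.stop()

    def test_job_processes_input(self):
        self.skipTest("placeholder: start the SUT, produce input, verify output")
```

Since the container lifecycle lives in the test class itself, the IDE
or build tool only needs to run the tests as usual.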

What is the output of your application? If it is messages on a
different Kafka topic, the test harness can simply subscribe and
verify the output. If you emit output to a database, you'll need
another Docker container for the database, integrated with Docker
Compose. In that case, your test oracle will need to poll the database
frequently for the expected records, with a timeout so that failing
tests do not hang.
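
The polling oracle can be as small as the sketch below. The function
name, the default timeout and the polling interval are arbitrary
choices of mine. The fetch argument stands for whatever query reads
the records from your database:

```python
import time


def poll_until(fetch, expected, timeout_s=30.0, interval_s=0.5):
    """Call fetch() repeatedly until it returns `expected` or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        actual = fetch()
        if actual == expected:
            return actual
        if time.monotonic() >= deadline:
            # Fail the test instead of hanging when the records never show up.
            raise AssertionError(
                "timed out after %.1fs; last saw %r, expected %r"
                % (timeout_s, actual, expected))
        time.sleep(interval_s)
```

A test would call something like poll_until(read_rows, expected_rows),
where read_rows is a hypothetical function that queries the output
table.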

I hope this is comprehensible. Let me know if you have follow-up questions.


Lars Albertsson
Data engineering consultant
+46 70 7687109

On Thu, Jun 30, 2016 at 8:19 PM, SRK wrote:
> Hi,
> I need to do integration tests using Spark Streaming. My idea is to spin up
> kafka using docker locally and use it to feed the stream to my Streaming
> Job. Any suggestions on how to do this would be of great help.
> Thanks,
> Swetha