spark-user mailing list archives

From Jeoffrey Lim <jeoffr...@gmail.com>
Subject Re: Maelstrom: Kafka integration with Spark
Date Wed, 24 Aug 2016 03:18:17 GMT
Apologies, I was not aware that Spark 2.0 now has Kafka Consumer
caching/pooling.
What I have checked is the latest Kafka Consumer, and I believe it is still
of beta quality.

https://kafka.apache.org/documentation.html#newconsumerconfigs

> Since 0.9.0.0 we have been working on a replacement for our existing
> simple and high-level consumers.
> The code is considered beta quality.

I am not sure about this: does the Spark 2.0 / Kafka 0.10 integration already
use this new consumer? Is it stable now?
With this caching feature in Spark 2.0, could it now achieve sub-millisecond
stream processing?


Maelstrom still uses the old Kafka Simple Consumer. The library was made
open source so that I
could continue working on it with future updates and improvements, for
example once the latest Kafka Consumer
gets a stable release.

We have been using Maelstrom's "caching concept" for a long time now, as
the Receiver-based Spark Kafka integration
does not work for us. We considered using the Direct Kafka APIs;
however, Maelstrom has
very simple APIs and "simply works" even under unstable scenarios
(e.g. advertised hostname failures on EMR).
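
To illustrate the general idea behind the "caching concept", here is a minimal
sketch. It is not Maelstrom's or Spark's actual code: FakeConsumer,
ConsumerCache and their methods are hypothetical names invented for this
example, and the "connection" is simulated rather than a real Kafka client.
The point is only that each (topic, partition) keeps one open consumer that is
reused across batches instead of being reconnected every time.

```python
from typing import Callable, Dict, List, Tuple

TopicPartition = Tuple[str, int]


class FakeConsumer:
    """Hypothetical stand-in for a Kafka consumer connection."""

    opened = 0  # how many connections have been created so far

    def __init__(self, topic: str, partition: int) -> None:
        FakeConsumer.opened += 1
        self.topic = topic
        self.partition = partition
        self.closed = False

    def fetch(self, offset: int, count: int) -> List[str]:
        # Simulated fetch: returns labels instead of real Kafka messages.
        return [
            f"{self.topic}-{self.partition}@{o}"
            for o in range(offset, offset + count)
        ]

    def close(self) -> None:
        self.closed = True


class ConsumerCache:
    """Keeps one open consumer per (topic, partition) and reuses it."""

    def __init__(self, factory: Callable[[str, int], FakeConsumer]) -> None:
        self._factory = factory
        self._consumers: Dict[TopicPartition, FakeConsumer] = {}

    def get(self, topic: str, partition: int) -> FakeConsumer:
        key = (topic, partition)
        consumer = self._consumers.get(key)
        # Create a connection only if none exists or the old one was closed.
        if consumer is None or consumer.closed:
            consumer = self._factory(topic, partition)
            self._consumers[key] = consumer
        return consumer

    def close_all(self) -> None:
        for consumer in self._consumers.values():
            consumer.close()
        self._consumers.clear()
```

In this sketch, every batch would call cache.get(topic, partition).fetch(...)
and pay the connection cost only on the first call or after a failure.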

I believe Maelstrom will work even with Spark 1.3 and Kafka 0.8.2.1 (and of
course with the latest Kafka 0.10 as well).


On Wed, Aug 24, 2016 at 9:49 AM, Cody Koeninger <cody@koeninger.org> wrote:

> Were you aware that the spark 2.0 / kafka 0.10 integration also reuses
> kafka consumer instances on the executors?
>
> On Tue, Aug 23, 2016 at 3:19 PM, Jeoffrey Lim <jeoffreyl@gmail.com> wrote:
> > Hi,
> >
> > I have released the first version of a new Kafka integration with Spark
> > that we use in the company I work for: open sourced and named Maelstrom.
> >
> > It is unique compared to other solutions out there as it reuses the
> > Kafka Consumer connection to achieve sub-milliseconds latency.
> >
> > This library has been running stable in production environment and has
> > been proven to be resilient to numerous production issues.
> >
> >
> > Please check out the project's page in github:
> >
> > https://github.com/jeoffreylim/maelstrom
> >
> >
> > Contributors welcome!
> >
> >
> > Cheers!
> >
> > Jeoffrey Lim
> >
> >
> > P.S. I am also looking for a job opportunity, please look me up at
> > Linked In
>
