spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From vikram agrawal <>
Subject Structured Streaming Support for Amazon Kinesis in Spark 2.2.x
Date Tue, 13 Mar 2018 10:08:44 GMT
Hi All,

I have implemented Kinesis Connector for Structured Streaming. The code is
available is at

Open Source Jira for the same is SPARK-18165

Design details are mentioned here -

I wrote and ran unit tests to validate the implementation. Apart from that,
I started 3 node spark clusters to read from Kinesis and write to S3 in
parquet format and ran the query for more than a day.

List of Features implemented

   - Re-sharding logic to add/remove new/closed shards
   - Executors fetch records from Kinesis as part of the incremental job
   execution (instead of having a receiver model where few threads are
   responsible for reading from Kinesis)
   - Various configuration to have fine-grain control depending upon your

I have developed and tested the connector against Spark 2.2.x and
will migrate it to DataSource V2 APIs in some time.

Can anyone help me in reviewing the design/implementation? I would love it
to be part of Spark Distribution.

- Vikram

View raw message