spark-user mailing list archives

From roshan joe <impdocs2...@gmail.com>
Subject share datasets across multiple spark-streaming applications for lookup
Date Tue, 31 Oct 2017 02:53:44 GMT
Hi,

What is the recommended way to share datasets across multiple
spark-streaming applications, so that the incoming data can be looked up
against this shared dataset?

The shared dataset is also incrementally refreshed and stored on S3. Below
is the scenario.

Streaming App-1 consumes data from Source-1 and writes to DS-1 in S3.
Streaming App-2 consumes data from Source-2 and writes to DS-2 in S3.


Streaming App-3 consumes data from Source-3, *needs to look up against DS-1
and DS-2*, and writes to DS-3 in S3.
Streaming App-4 consumes data from Source-4, *needs to look up against DS-1
and DS-2*, and writes to DS-3 in S3.
Streaming App-n consumes data from Source-n, *needs to look up against DS-1
and DS-2*, and writes to DS-n in S3.
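
To make the lookup step concrete, here is a rough sketch in plain Python of
what each downstream app would need to do per micro-batch. This is not Spark
API: RefreshingLookup, enrich_batch, the loader callback and the "key" field
are all made-up names for illustration, and the loader stands in for whatever
would actually read the latest DS-1 or DS-2 snapshot from S3.

```python
import time
from typing import Any, Callable, Dict, Iterable, List, Optional


class RefreshingLookup:
    """Key -> value table that is reloaded once it is older than max_age_s.

    `loader` is a hypothetical callback that returns the current snapshot
    of a shared dataset (for example DS-1) as a dict.
    """

    def __init__(
        self,
        loader: Callable[[], Dict[str, Any]],
        max_age_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._max_age_s = max_age_s
        self._clock = clock
        self._table: Dict[str, Any] = {}
        self._loaded_at: Optional[float] = None

    def get(self, key: str) -> Any:
        now = self._clock()
        # Reload on first use, or when the cached snapshot is too old.
        if self._loaded_at is None or now - self._loaded_at >= self._max_age_s:
            self._table = self._loader()
            self._loaded_at = now
        return self._table.get(key)


def enrich_batch(
    batch: Iterable[Dict[str, Any]],
    ds1: RefreshingLookup,
    ds2: RefreshingLookup,
) -> List[Dict[str, Any]]:
    """Attach the DS-1 and DS-2 matches (or None) to every incoming record."""
    return [
        {**rec, "ds1": ds1.get(rec["key"]), "ds2": ds2.get(rec["key"])}
        for rec in batch
    ]
```

In this sketch every app (App-3, App-4, ... App-n) would hold its own
RefreshingLookup for DS-1 and DS-2, so each one re-reads the same data
separately, which is the duplication I would like to avoid.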

So DS-1 and DS-2 ideally should be shared for lookup across multiple
streaming apps. Any input is appreciated. Thank you!
