spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tathagata Das <tathagata.das1...@gmail.com>
Subject Re: Spark structured streaming: Is it possible to periodically refresh static data frame?
Date Thu, 20 Apr 2017 21:03:23 GMT
Here are couple of ideas.
1. You can set up a Structured Streaming query to update in-memory table.
Look at the memory sink in the programming guide - http://spark.apache.org/
docs/latest/structured-streaming-programming-guide.html#output-sinks
So you can query the latest table using a specified table name, and also
join that table with another stream. However, note that this in-memory
table is maintained in the driver, and so you have be careful about the
size of the table.

2. If you cannot define a streaming query in the slow moving due to
unavailability of connector for your streaming data source, then you can
always define a batch Dataframe and register it as view, and then run a
background then periodically creates a new Dataframe with updated data and
re-registers it as a view with the same name. Any streaming query that
joins a streaming dataframe with the view will automatically start using
the most updated data as soon as the view is updated.

Hope this helps.


On Thu, Apr 20, 2017 at 1:30 PM, Hemanth Gudela <hemanth.gudela@qvantel.com>
wrote:

> Thanks Georg for your reply.
>
> But I’m not sure if I fully understood your answer.
>
>
>
> If you meant to join two streams (one reading Kafka, and another reading
> database table), then I think it’s not possible, because
>
> 1.       According to documentation
> <http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#data-sources>,
> Structured streaming does not support database as a streaming source
>
> 2.       Joining between two streams is not possible yet.
>
>
>
> Regards,
>
> Hemanth
>
>
>
> *From: *Georg Heiler <georg.kf.heiler@gmail.com>
> *Date: *Thursday, 20 April 2017 at 23.11
> *To: *Hemanth Gudela <hemanth.gudela@qvantel.com>, "user@spark.apache.org"
> <user@spark.apache.org>
> *Subject: *Re: Spark structured streaming: Is it possible to periodically
> refresh static data frame?
>
>
>
> What about treating the static data as a (slow) stream as well?
>
>
>
> Hemanth Gudela <hemanth.gudela@qvantel.com> schrieb am Do., 20. Apr. 2017
> um 22:09 Uhr:
>
> Hello,
>
>
>
> I am working on a use case where there is a need to join streaming data
> frame with a static data frame.
>
> The streaming data frame continuously gets data from Kafka topics, whereas
> static data frame fetches data from a database table.
>
>
>
> However, as the underlying database table is getting updated often, I must
> somehow manage to refresh my static data frame periodically to get the
> latest information from underlying database table.
>
>
>
> My questions:
>
> 1.       Is it possible to periodically refresh static data frame?
>
> 2.       If refreshing static data frame is not possible, is there a
> mechanism to automatically stop & restarting spark structured streaming
> job, so that every time the job restarts, the static data frame gets
> updated with latest information from underlying database table.
>
> 3.       If 1) and 2) are not possible, please suggest alternatives to
> achieve my requirement described above.
>
>
>
> Thanks,
>
> Hemanth
>
>

Mime
View raw message