spark-user mailing list archives

From Amit Assudani
Subject Update Batch DF with Streaming
Date Thu, 16 Jun 2016 22:11:45 GMT
Hi All,

Can I update a batch DataFrame that is already loaded in memory with streaming data?

For example:

I have an employee DF registered as a temporary table. It has EmployeeID, Name, Address, etc.
fields; assume it is very big and takes a long time to load into memory.

I have two types of employee events (both carrying the empID in the payload) coming in on streams:

1) Events that look up a particular empID in the batch data, do some calculation, and persist
the results.

2) Events that carry updated values for some of the fields of an empID.

Now I want to keep the employee DF up to date with the updates arriving in type 2 events, so
that future type 1 events use the updated values.
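
To make the intended behaviour concrete, here is a minimal plain-Python sketch of the semantics described above. It is not Spark code and says nothing about how Spark would do this: `EmployeeStore`, `process`, and the `(kind, emp_id, payload)` event tuples are hypothetical names used only for illustration, and treating an update for an unknown empID as an insert is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Row = Dict[str, str]
Event = Tuple[str, int, Row]  # (kind, emp_id, payload); hypothetical event shape


@dataclass
class EmployeeStore:
    """Stand-in for the in-memory employee table, keyed by empID."""
    rows: Dict[int, Row] = field(default_factory=dict)

    def lookup(self, emp_id: int) -> Optional[Row]:
        # Type 1 event: read the current row for this empID, if any.
        return self.rows.get(emp_id)

    def apply_update(self, emp_id: int, changes: Row) -> None:
        # Type 2 event: overwrite only the fields present in the payload.
        # Assumption: an unknown empID is inserted rather than rejected.
        self.rows.setdefault(emp_id, {}).update(changes)


def process(store: EmployeeStore, events: List[Event]) -> List[Tuple[int, Optional[Row]]]:
    """Apply events in arrival order; collect what each lookup saw."""
    results = []
    for kind, emp_id, payload in events:
        if kind == "update":
            store.apply_update(emp_id, payload)
        elif kind == "lookup":
            row = store.lookup(emp_id)
            # Snapshot the row so later updates do not change earlier results.
            results.append((emp_id, dict(row) if row is not None else None))
    return results
```

A lookup that arrives after an update for the same empID should see the new field values, while a lookup that arrived earlier keeps the old ones.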

So the question is: can I update the employee DF in memory with type 2 events, or do I need
to refresh the whole DF?

P.S. I can join the stream with the batch DF and get the joined table, but I am not sure how to
keep a handle on the joined data and use it for subsequent events.



