spark-user mailing list archives

From Jose Fernandez
Subject Handling worker batch processing during driver shutdown
Date Thu, 12 Mar 2015 19:27:54 GMT
Hi folks,

I have a shutdown hook in my driver that stops the streaming context cleanly. This is great,
as workers can finish their current processing unit before shutting down. Unfortunately, each
worker contains a batch processor that only flushes every X entries. We're indexing to
different indices in Elasticsearch and using bulk index requests for performance. As far
as Spark is concerned, once data is added to the batcher it is considered processed, so our
workers are being shut down with data still in the batcher.
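To make the failure mode concrete, here is a minimal sketch of a count-based batcher. It assumes the batcher can be scoped to a single partition. One commonly used approach is to create and flush the batcher inside each `foreachPartition` call, so that no entries remain buffered once Spark considers the batch processed. The sketch models only the batching logic in plain Python. `BulkBatcher`, `process_partition`, and the `sink` callable (standing in for one Elasticsearch bulk request) are hypothetical names, and no Spark or Elasticsearch client APIs are used.

```python
from typing import Callable, Iterable, List


class BulkBatcher:
    """Hypothetical count-based batcher standing in for the Elasticsearch bulk batcher.

    Entries are buffered and handed to `sink` only once `batch_size` of them have
    accumulated, so anything added after the last full batch stays in memory
    until `flush()` is called explicitly.
    """

    def __init__(self, batch_size: int, sink: Callable[[List[str]], None]) -> None:
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self.batch_size = batch_size
        self.sink = sink  # hypothetical: e.g. a function issuing one bulk index request
        self.buffer: List[str] = []

    def add(self, entry: str) -> None:
        self.buffer.append(entry)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Send whatever is buffered, even a partial batch.
        if self.buffer:
            self.sink(self.buffer)
            self.buffer = []


def process_partition(entries: Iterable[str], batch_size: int,
                      sink: Callable[[List[str]], None]) -> None:
    """Hypothetical per-partition function (the kind one might pass to foreachPartition).

    The batcher is created and flushed within the same unit of work, so when
    this function returns, nothing is left buffered for the partition.
    """
    batcher = BulkBatcher(batch_size, sink)
    for entry in entries:
        batcher.add(entry)
    batcher.flush()  # drain the partial batch before the task is reported done


if __name__ == "__main__":
    sent: List[List[str]] = []

    # Without a final flush, 2 of the 7 entries are still sitting in the buffer.
    leaky = BulkBatcher(batch_size=5, sink=sent.append)
    for i in range(7):
        leaky.add(f"doc-{i}")
    print(len(sent), len(leaky.buffer))  # 1 2

    sent.clear()
    process_partition((f"doc-{i}" for i in range(7)), batch_size=5, sink=sent.append)
    print([len(batch) for batch in sent])  # [5, 2]
```

The design choice here is to tie the flush to the end of each unit of work rather than to the shutdown of the worker. This way, the point at which Spark marks data as processed coincides with the data actually having been sent. The trade-off is that some bulk requests will be smaller than `batch_size`.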

Is there any way to coordinate the shutdown with the workers so that they flush their batchers
before exiting? I haven't had any luck searching for a solution online. I would appreciate any
suggestions you may have.

Thanks :)
