spark-user mailing list archives

From Appu K <kut...@gmail.com>
Subject Re: Closing resources in the executor
Date Thu, 02 Feb 2017 10:07:28 GMT
https://mid.mail-archive.com/search?l=user@spark.apache.org&q=subject:%22Executor+shutdown+hook+and+initialization%22&o=newest&f=1

I see this thread, where it is mentioned that per-partition resource
management is recommended over global state (within an executor).
What would be the way to achieve this with DataFrames?

Is a shutdown hook the only solution right now?
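For DataFrames, the usual way to get per-partition resource management is to drop down to mapPartitions, opening the resource once per partition and closing it when the partition is done. A minimal sketch of that pattern, with a plain Iterator standing in for one Spark partition (the Lookup class and its methods are hypothetical stand-ins, not a real API):

```scala
// Minimal sketch of the per-partition pattern. A plain Iterator stands in
// for one Spark partition; Lookup is a hypothetical resource.
object PerPartitionSketch {
  final class Lookup {
    def enrich(ip: String): String = s"geo($ip)" // stand-in for a real lookup
    def close(): Unit = ()                        // release the resource
  }

  // mapPartitions-style body: open once per partition, close when done.
  def processPartition(rows: Iterator[String]): Iterator[String] = {
    val lookup = new Lookup
    try rows.map(lookup.enrich).toList.iterator // materialize before closing
    finally lookup.close()
  }
}
```

Note the results are materialized before close() runs; returning a lazy iterator over a closed resource is a common bug with this pattern.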

thanks
sajith


On 2 February 2017 at 11:58:27 AM, Appu K (kutt4n@gmail.com) wrote:



What would be the recommended way to close resources opened or shared by
executors?

A few use cases

#1) Let's say the enrichment process needs to convert an IP address or
lat/long pair to a city/country. To achieve this, executors could open a
file in HDFS and build a map, or use a memory-mapped file - the
implementation could be a transient lazy val singleton or something
similar. The UDF defined would then perform lookups on these data
structures and return geo data.
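The lazy-singleton lookup described above can be sketched roughly like this. The HDFS read is replaced by a tiny in-memory map, and all names here are hypothetical; the point is only that the map is built once per executor JVM, on first use, rather than shipped from the driver:

```scala
// Hypothetical sketch of the lazy-singleton lookup table.
object GeoLookup {
  // In a real job this would be built from a file in HDFS; a tiny
  // in-memory map stands in here.
  lazy val cityByIp: Map[String, String] = Map("1.2.3.4" -> "Dublin")

  def city(ip: String): String = cityByIp.getOrElse(ip, "unknown")
}
// A UDF would then simply call GeoLookup.city(ip) per row.
```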

#2) Let's say there is a need to do lookups on a KV store like Redis from
the executors. Each executor would create a connection pool and provide
connections for the tasks running in it to perform lookups.
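That per-executor pool is usually also a lazy singleton. A sketch, where Pool is a hypothetical stand-in for something like a JedisPool rather than a real client:

```scala
// Hypothetical sketch: one connection pool per executor JVM.
object PoolHolder {
  final class Pool {
    @volatile var closed = false
    def borrow(): String = "conn" // stand-in for handing out a connection
    def close(): Unit = closed = true
  }
  // Initialized once per JVM, by the first task that touches it;
  // lazy val initialization is thread-safe, so concurrent tasks share it.
  lazy val pool: Pool = new Pool
}
```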

In scenarios like these, when the executor is shut down, what would be the
best way to close the open resources (streams, connections, etc.)?
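One pattern I have seen suggested (an assumption on my part, not something from the Spark docs) is to register a JVM shutdown hook at the point where the lazy singleton is initialized, so cleanup runs when the executor JVM exits:

```scala
// Hypothetical sketch: close a lazily created resource from a shutdown hook.
object ResourceHolder {
  @volatile private var closed = false
  def isClosed: Boolean = closed

  lazy val resource: AutoCloseable = {
    val r = new AutoCloseable { def close(): Unit = closed = true }
    sys.addShutdownHook(r.close()) // runs when the executor JVM shuts down
    r
  }
}
```

The caveat is that shutdown hooks only run on a reasonably clean JVM exit; a killed executor will not execute them.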


Any pointers to places where I could read up a bit more about best
practices around this would be highly appreciated!

thanks
appu
