I asked the following question on Stack Overflow but have not
received an answer yet. I am now using this channel to give it more
visibility and, hopefully, to find someone who can help.

"Context. I have tens of SQL queries
stored in separate files. For benchmarking purposes, I created an
application that iterates through each of those query files and
passes it to a standalone Spark application. The latter first
parses the query, extracts the tables it uses, registers them (using
registerTempTable() in Spark < 2 and createOrReplaceTempView()
in Spark 2), and then actually executes the query (spark.sql()).

Challenge. Since registering the tables can sometimes be
time-consuming, I would like to register each table only once, when
it is first used, and keep that in the form of metadata that can
readily be used by subsequent queries without the need to
register the tables again. It is a sort of intra-job caching, but
not any of the caching Spark offers (table caching), as far as I know.

Is that possible? If not, can anyone suggest another approach to
accomplish the same goal (i.e., iterating through separate query
files and running a querying Spark application without registering the
tables that have already been registered before)?"
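
To make the goal more concrete, here is a rough sketch of the behaviour I am after, written in plain Python so that it runs without Spark. Everything in it is an assumption for illustration: the TableRegistry class, the register callback, and the table names are hypothetical, and in a real driver the callback would be whatever loads the data and calls createOrReplaceTempView().

```python
from typing import Callable, Iterable, Set


class TableRegistry:
    """Remembers which tables were already registered during one driver run."""

    def __init__(self, register: Callable[[str], None]) -> None:
        # `register` is a hypothetical callback; in Spark 2 it could load the
        # data and call DataFrame.createOrReplaceTempView(name).
        self._register = register
        self._seen: Set[str] = set()

    def ensure(self, tables: Iterable[str]) -> None:
        """Register only the tables that have not been seen before."""
        for name in tables:
            if name not in self._seen:
                self._register(name)
                self._seen.add(name)


calls = []  # records every actual registration, to show the effect
registry = TableRegistry(calls.append)
registry.ensure(["orders", "customers"])  # first query: both are registered
registry.ensure(["orders", "lineitem"])   # second query: only lineitem is new
```

Note that this sketch only helps if all query files are executed inside one long-lived SparkSession, because temporary views live only as long as the session that created them. With one Spark application launched per query file, the in-memory set above would be lost between runs.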