spark-user mailing list archives

From Mich Talebzadeh <>
Subject Re: Efficient filtering on Spark SQL dataframes with ordered keys
Date Tue, 01 Nov 2016 11:11:12 GMT
It would be great if we could establish this.

I know in Hive these temporary tables ("CREATE TEMPORARY TABLE ...") are
private to the session and are put in a hidden staging directory as below


and removed when the session ends or the table is dropped.

Not sure how Spark handles this.


Dr Mich Talebzadeh

*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.

On 1 November 2016 at 10:50, Michael David Pedersen <> wrote:

> Thanks for the link, I hadn't come across this.
>> According to
>> erence-between-registertemptable-a.html
>> and I quote
>> "registerTempTable()
>> registerTempTable() creates an in-memory table that is scoped to the
>> cluster in which it was created. The data is stored using Hive's
>> highly-optimized, in-memory columnar format."
> But then the last post in the thread corrects this, saying:
> "registerTempTable does not create a 'cached' in-memory table, but rather
> an alias or a reference to the DataFrame. It's akin to a pointer in C/C++
> or a reference in Java".
> So - probably need to dig into the sources to get more clarity on this.
> Cheers,
> Michael
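
To make the two quoted descriptions concrete, here is a toy sketch in plain
Python of the "alias or reference" reading. It is not Spark code and says
nothing about how Spark is actually implemented: LazyFrame, Catalog and
register_temp_table are hypothetical names made up for illustration. The
assumption it encodes is that registering a name stores a reference to a lazy
recipe and computes nothing, and that data is only kept in memory if caching
is requested separately.

```python
# Toy model only: LazyFrame and Catalog are hypothetical, not Spark classes.

class LazyFrame:
    """Stand-in for a DataFrame: holds a recipe, computes only on demand."""

    def __init__(self, compute):
        self._compute = compute        # function that produces the rows
        self._cached = None            # filled on the first action after cache()
        self._cache_requested = False
        self.evaluations = 0           # how many times the recipe actually ran

    def cache(self):
        # Only marks the frame for caching; nothing is computed here.
        self._cache_requested = True
        return self

    def collect(self):
        if self._cached is not None:
            return self._cached        # served from memory, no recomputation
        self.evaluations += 1
        rows = list(self._compute())
        if self._cache_requested:
            self._cached = rows
        return rows


class Catalog:
    """Maps temp table names to frames; stores references, never copies."""

    def __init__(self):
        self._tables = {}

    def register_temp_table(self, name, frame):
        self._tables[name] = frame     # alias only: no evaluation happens

    def table(self, name):
        return self._tables[name]


catalog = Catalog()
df = LazyFrame(lambda: (k for k in range(5) if k % 2 == 0))
catalog.register_temp_table("evens", df)

same_object = catalog.table("evens") is df     # the name is just a reference
evals_after_register = df.evaluations          # registering ran nothing

first = catalog.table("evens").collect()
second = catalog.table("evens").collect()
evals_without_cache = df.evaluations           # recipe ran once per action

df.cache()
catalog.table("evens").collect()
catalog.table("evens").collect()
evals_with_cache = df.evaluations              # only one further evaluation
```

Under the first quoted description (an in-memory table created at registration
time), one would instead expect the data to be materialised when the name is
registered. Which of the two models matches Spark is exactly what still needs
checking against the sources.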
