spark-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: Efficient filtering on Spark SQL dataframes with ordered keys
Date Tue, 01 Nov 2016 10:38:57 GMT
This is a bit of a gray area, I am afraid. I was trying to experiment with it.

According to
https://forums.databricks.com/questions/400/what-is-the-difference-between-registertemptable-a.html

and I quote

"registerTempTable()

registerTempTable() creates an in-memory table that is scoped to the
cluster in which it was created. The data is stored using Hive's
highly-optimized, in-memory columnar format."


So, on the face of it, a temp table is an in-memory table.
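
One way to test this rather than rely on the quote is a small check along the lines below. It is only a sketch, assuming Spark 2.x with PySpark, where createOrReplaceTempView is the newer name for registerTempTable. The helper name cache_states and the default view name are made up for illustration. It asks the catalog whether the view is cached right after registration, and again after an explicit cacheTable call.

```python
def cache_states(spark, df, view_name="events"):
    """Report whether a temp view is cached before and after cacheTable.

    `spark` is assumed to be a Spark 2.x SparkSession and `df` a DataFrame;
    `view_name` is an arbitrary, hypothetical name for the temp view.
    """
    # Register the DataFrame under a name (registerTempTable in Spark 1.x).
    df.createOrReplaceTempView(view_name)

    # Ask the catalog whether registration alone marked the view as cached.
    cached_after_register = spark.catalog.isCached(view_name)

    # Explicitly mark the view for caching. This is lazy: the data is only
    # materialised in memory when an action first scans the view.
    spark.catalog.cacheTable(view_name)
    cached_after_cache_table = spark.catalog.isCached(view_name)

    # Clean up so the check leaves no cached data behind.
    spark.catalog.uncacheTable(view_name)

    return cached_after_register, cached_after_cache_table
```

In a pyspark shell this could be called as cache_states(spark, spark.range(10)). Comparing the two returned values shows whether registration alone caches the table or whether the explicit cacheTable call is what does it.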

HTH




Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 1 November 2016 at 10:01, Michael David Pedersen <
michael.d.pedersen@googlemail.com> wrote:

> Hi again Mich,
>
> "But the thing is that I don't explicitly cache the tempTables ..".
>>
>> I believe tempTable is created in-memory and is already cached
>>
>
> That surprises me since there is a sqlContext.cacheTable method to
> explicitly cache a table in memory. Or am I missing something? This could
> explain why I'm seeing somewhat worse performance than I'd expect.
>
> Cheers,
> Michael
>
