spark-user mailing list archives

From ☼ R Nair (रविशंकर नायर) <ravishankar.n...@gmail.com>
Subject Re: Dataframe caching
Date Fri, 20 Jan 2017 23:32:25 GMT
Thanks, will look into this.

Best regards,
Ravion
---------- Forwarded message ----------
From: "Muthu Jayakumar" <babloo80@gmail.com>
Date: Jan 20, 2017 10:56 AM
Subject: Re: Dataframe caching
To: "☼ R Nair (रविशंकर नायर)" <ravishankar.nair@gmail.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>

I guess this may help in your case?

https://spark.apache.org/docs/latest/sql-programming-guide.html#global-temporary-view
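
A minimal PySpark sketch of how the global temporary view from that link could be used here. It assumes both users run inside the same Spark application, since global temporary views live in the `global_temp` database and disappear when the application ends. The helper names `publish_shared` and `read_shared`, the view name `big_query`, and the `spark.range` stand-in for the real query are all made up for illustration.

```python
def publish_shared(df, view_name):
    """Cache the DataFrame and expose it to every session of this application."""
    df.cache()                           # keep the partitions in cluster memory
    df.createGlobalTempView(view_name)   # registered under the global_temp database


def read_shared(spark, view_name):
    """Look the shared view up from any session of the same application."""
    return spark.table("global_temp." + view_name)


def demo():
    # Needs a local PySpark installation; imported lazily so the helpers
    # above can be read and reused without it.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[1]")
             .appName("shared-dataframe-sketch")
             .getOrCreate())

    # Stand-in for the first user's expensive query (hypothetical data).
    first_user_df = spark.range(1000)
    publish_shared(first_user_df, "big_query")

    # A second session in the same application sees the same view.
    second_session = spark.newSession()
    print(read_shared(second_session, "big_query").count())

    spark.stop()
```

Calling `demo()` runs the whole flow locally. A view created with `createTempView` instead would be visible only to the session that created it.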

Thanks,
Muthu

On Fri, Jan 20, 2017 at 6:27 AM, ☼ R Nair (रविशंकर नायर) <
ravishankar.nair@gmail.com> wrote:

> Dear all,
>
> Here is a requirement I am thinking of implementing in Spark core. Please
> let me know if this is possible, and kindly provide your thoughts.
>
> A user executes a query to fetch 1 million records from, let's say, a
> database. We let the user store the result as a dataframe, partitioned
> across the cluster.
>
> Another user executes the same query from another session. Is there any
> way that we can let the second user reuse the dataframe created by the
> first user?
>
> Can we have a master dataframe (or RDD) which stores the information about
> the current dataframes loaded and matches against any queries that are
> coming from other users?
>
> In this way, we would have a wonderful system that never allows the same
> query to be executed and loaded into the cluster memory again.
>
> Best, Ravion
>
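
The "master" lookup described in the quoted message could be sketched in plain Python as a registry keyed by the query text. This is only an illustration of the idea, not something Spark provides: `QueryRegistry`, `normalize_query`, and the `loader` callback are hypothetical names, and matching is purely textual, so two differently written but equivalent queries would not be recognised as the same.

```python
import hashlib
import threading


def normalize_query(sql):
    """Collapse whitespace and drop a trailing semicolon.

    Naive on purpose: it also collapses whitespace inside string literals.
    """
    return " ".join(sql.strip().rstrip(";").split())


class QueryRegistry:
    """Maps a query fingerprint to the result that was loaded for it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}

    def key_for(self, sql):
        # Fingerprint of the normalized query text.
        return hashlib.sha256(normalize_query(sql).encode("utf-8")).hexdigest()

    def get_or_load(self, sql, loader):
        """Return the stored result for sql, calling loader(sql) only once."""
        key = self.key_for(sql)
        # Holding the lock while loading keeps the sketch simple, but it
        # also serializes loads of unrelated queries.
        with self._lock:
            if key not in self._entries:
                self._entries[key] = loader(sql)
            return self._entries[key]
```

In a Spark setting, `loader` might run the query, cache the resulting dataframe, and return a handle to it. The registry itself would have to live somewhere all user sessions can reach, for example in the driver of a single shared application.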
