spark-user mailing list archives

From Bin Fan <fanbin...@gmail.com>
Subject Re: [Spark RDD] Persisting Spark RDDs across spark contexts/applications - options
Date Thu, 04 Jun 2020 17:42:38 GMT
Hi Boris,

This is actually why Alluxio (then called Tachyon) was originally created
at the AMPLab.
Check out the documentation at
https://docs.alluxio.io/os/user/stable/en/compute/Spark.html on persisting
RDDs/DataFrames to Alluxio.

Some examples:
https://www.alluxio.io/resources/case-studies/making-the-impossible-possible-with-alluxio-accelerate-spark-jobs-from-hours-to-seconds/
https://www.alluxio.io/blog/tencent-case-study-delivering-customized-news-to-over-100-million-users-per-month-with-alluxio/

Happy to provide more info.

- Bin

On Thu, Jun 4, 2020 at 12:26 AM Boris Litvak <boris.litvak@skf.com> wrote:

> I would like to cache Apache Spark RDDs and share them between Spark
> applications.
>
> Alluxio (Tachyon), Redis & Ignite all offer such capabilities.
>
> For instance, see Ignite's proposal:
>
> Are there any comparison studies on performance/maintenance
> burden/installation experience of the above frameworks?
>
> If you have had any experience using Spark with any of these
> technologies, please share.
>
> Thanks, Boris
