spark-user mailing list archives

From Edmon Begoli <ebeg...@gmail.com>
Subject Spark on HDFS vs. Lustre vs. other file systems - formal research and performance evaluation
Date Fri, 13 Mar 2015 22:06:38 GMT
All,

Does anyone have a reference to a publication, or to informal sources
(blogs, notes), comparing the performance of Spark on HDFS against shared
file systems (Lustre, etc.) or other file systems such as NFS?

I need this for formal performance research.

We are currently researching this on a very specific, boutique machine,
and we are seeing some unexpected results.

For a literature survey and general comparison, I would like to see what
others have found. I know the general wisdom is that Spark works best with
HDFS because of its data-locality awareness.

Thank you,
*Edmon Begoli, PhD*
Chief Data Officer
Joint Institute for Computational Sciences (JICS)
ebegoli@tennessee.edu
https://www.linkedin.com/in/ebegoli
