spark-user mailing list archives

From Edmon Begoli
Subject Spark on HDFS vs. Lustre vs. other file systems - formal research and performance evaluation
Date Fri, 13 Mar 2015 22:06:38 GMT

Does anyone have a reference to a publication, or to informal sources
(blogs, notes), showing the performance of Spark on HDFS vs. shared file
systems (Lustre, etc.) or other file systems (e.g., NFS)?

I need this for formal performance research.

We are currently researching this on a very specific, boutique
machine, and we are seeing some controversial results.

For the purposes of a literature survey and general comparison, I would like
to see what others have found. I know that conventional wisdom states
that Spark and HDFS should work best together because of data locality.

Thank you,
*Edmon Begoli, PhD*
Chief Data Officer
Joint Institute for Computational Sciences (JICS)
