spark-user mailing list archives

From joa...@verona.se
Subject Question about Spark and filesystems
Date Sun, 18 Dec 2016 19:50:17 GMT
Hello,

We are trying out Spark for some file processing tasks.

Since each Spark worker node needs to access the same files, we have
tried using HDFS. This worked, but there were some oddities that made me
a bit uneasy. Because of dependency hell, I compiled a modified Spark,
and this version exhibited the odd behaviour with HDFS. The problem
might have nothing to do with HDFS, but the situation made me curious
about the alternatives.

Now I'm wondering what kind of file system would be suitable for our
deployment.

- There won't be a great number of nodes. Maybe 10 or so.

- The datasets won't be big by big-data standards (maybe a couple of
  hundred GB).

So maybe I could just use an NFS server, with a caching client?
Or should I try Ceph or GlusterFS?

Does anyone have any experience to share?

-- 
Joakim Verona
joakim@verona.se


