spark-user mailing list archives

From Matei Zaharia
Subject Re: How to use cluster for large set of linux files
Date Wed, 22 Jan 2014 20:39:49 GMT
Hi Manoj,

You’d have to make the files available at the same path on each machine through something
like NFS. You don’t need to copy them, though that would also work.
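A minimal sketch of what this could look like from PySpark, assuming the CSV directory is exported over NFS and mounted at the same path on the driver and on every worker. The mount point /mnt/shared/csv, the master URL, the application name, and the helper names are all hypothetical placeholders, not details from this thread.

```python
from pathlib import PurePosixPath


def shared_csv_glob(mount_point):
    """Build a file:// glob URI matching every CSV file under a shared mount.

    mount_point is assumed to resolve to the same directory on all machines.
    """
    return "file://" + str(PurePosixPath(mount_point) / "*.csv")


def read_csvs(sc, mount_point):
    """Return a single RDD of lines covering all CSV files under mount_point.

    textFile accepts a glob pattern; each worker opens its assigned splits
    through its own local view of the path, hence the need for a shared mount.
    """
    return sc.textFile(shared_csv_glob(mount_point))


def run_on_cluster(master_url, mount_point="/mnt/shared/csv"):
    """Hypothetical driver entry point; requires a running standalone cluster."""
    from pyspark import SparkContext  # imported here so the helpers load without Spark

    sc = SparkContext(master_url, "csv-over-nfs")
    try:
        return read_csvs(sc, mount_point).count()
    finally:
        sc.stop()
```

Calling run_on_cluster("spark://master-host:7077") would then count the lines across all of the files. If a worker does not see the files at that path, its tasks fail with a file-not-found error, which is why the path has to match on every machine.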


On Jan 22, 2014, at 12:37 PM, Manoj Samel wrote:

> I have a set of CSV files that I want to read as a single RDD using a standalone cluster.
> These files reside on one machine right now. If I start a cluster with multiple worker
> nodes, how do I use these worker nodes to read the files and do the RDD computation? Do I
> have to copy the files to every worker node?
> Assume that copying these into HDFS is not an option for now.
> Thanks,
