spark-user mailing list archives

From Manoj Samel <>
Subject How to use cluster for large set of linux files
Date Wed, 22 Jan 2014 20:37:22 GMT
I have a set of CSV files that I want to read as a single RDD using a
standalone cluster.

These files reside on one machine right now. If I start a cluster with
multiple worker nodes, how do I use those worker nodes to read the files
and do the RDD computation? Do I have to copy the files to every worker
node?

Assume that copying these files into HDFS is not an option for now.

