spark-user mailing list archives

From Manoj Samel <manojsamelt...@gmail.com>
Subject Re: How to use cluster for large set of linux files
Date Wed, 22 Jan 2014 20:56:38 GMT
Thanks Matei.

One thing I noticed after doing this and starting spark-shell with MASTER=spark://xxxx
is that everything works, BUT xxx.foreach(println) prints a blank
line. All the other logic seems to work. If I do xx.count etc., I can see the
value; only the println does not seem to work.
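For reference, a minimal sketch of this behaviour, assuming a standalone cluster
and an illustrative RDD (the name and path below are made up):

    // foreach runs on the executors, so any println output goes to the
    // workers' stdout logs rather than to the shell on the driver:
    val xx = sc.textFile("/shared/path/data.csv")   // path is illustrative
    xx.foreach(println)

    // To see the rows in the shell, bring them back to the driver first
    // (only sensible for small results):
    xx.collect().foreach(println)

    // or just inspect a few rows:
    xx.take(10).foreach(println)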


On Wed, Jan 22, 2014 at 12:39 PM, Matei Zaharia <matei.zaharia@gmail.com> wrote:

> Hi Manoj,
>
> You’d have to make the files available at the same path on each machine
> through something like NFS. You don’t need to copy them, though that would
> also work.
>
> Matei
>
> On Jan 22, 2014, at 12:37 PM, Manoj Samel <manojsameltech@gmail.com>
> wrote:
>
> > I have a set of csv files that I want to read as a single RDD using a
> > standalone cluster.
> >
> > These files reside on one machine right now. If I start a cluster with
> > multiple worker nodes, how do I use these worker nodes to read the files
> > and do the RDD computation? Do I have to copy the files to every worker
> > node?
> >
> > Assume that copying these into HDFS is not an option for now.
> >
> > Thanks,
>
>
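Putting Matei's suggestion above into code, a minimal sketch, assuming the csv
directory is mounted at the same path (e.g. via NFS) on every worker; the path
is illustrative:

    // Read every csv file under the shared mount as one RDD of lines:
    val csv = sc.textFile("/mnt/shared/csvdata/*.csv")

    // Split each line on commas to get the fields:
    val rows = csv.map(_.split(","))

    rows.count()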
