spark-user mailing list archives

From "guxiaobo1982" <guxiaobo1...@qq.com>
Subject Re: Where to put "local" data files?
Date Wed, 01 Jan 2014 07:40:32 GMT
Hi Andrew,

Thanks for your reply. I have another question about using HDFS: when running HDFS and Spark's standalone mode on the same cluster, will the Spark workers read only the data stored on the same server, to avoid transferring data over the network?
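
For context, a minimal sketch of the kind of read in question, assuming a colocated cluster; the namenode host, port, and file path are placeholders, not from this thread:

import org.apache.spark.{SparkConf, SparkContext}

object HdfsLocalityRead {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HdfsLocalityRead"))

    // Spark asks the namenode where each HDFS block of the file lives and
    // prefers to schedule the task for a block on a worker that stores
    // that block locally (NODE_LOCAL). Only when no local slot frees up
    // within the locality wait does it fall back to reading the block
    // over the network, so a colocated deployment mostly avoids transfer.
    val lines = sc.textFile("hdfs://namenode:9000/data/input.txt")
    println("line count: " + lines.count())

    sc.stop()
  }
}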

Xiaobo Gu

On 2014-01-01 05:37:36, "andrew" <andrew@andrewash.com> wrote:

Hi Xiaobo,

I would recommend putting the files into an HDFS cluster on the same machines instead, if possible. If you're concerned about duplicating the data, you can set the replication factor to 1 so
you don't use more space than before.
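
The quickest way to do that is the HDFS shell, e.g. hdfs dfs -setrep 1 /path/to/file. If you'd rather do it from code, here is a minimal sketch using the Hadoop FileSystem API; the namenode URI and path are placeholders:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object SetReplicationToOne {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Placeholder namenode address; point this at your cluster.
    conf.set("fs.defaultFS", "hdfs://namenode:9000")
    val fs = FileSystem.get(conf)

    // Lower the replication factor of an already-uploaded file to 1 so it
    // takes no more disk space than the original single local copy.
    fs.setReplication(new Path("/data/input.txt"), 1.toShort)

    fs.close()
  }
}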
 

In my experience with Spark around version 0.7.0, when reading a local file with sc.textFile("file:///...")
you had to have that file at that exact path on every Spark worker machine.
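
Concretely, something like the sketch below only succeeds if the file is present at the same absolute path on every machine; /data/input.txt here is a placeholder:

import org.apache.spark.{SparkConf, SparkContext}

object LocalFileRead {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LocalFileRead"))

    // With a file:// URI each task opens the path on whichever worker it
    // runs on, so the file must exist at this exact path on every worker
    // (and on the driver, which lists the file to compute the splits).
    val lines = sc.textFile("file:///data/input.txt")
    println("line count: " + lines.count())

    sc.stop()
  }
}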

Cheers,
Andrew



On Tue, Dec 31, 2013 at 5:34 AM, guxiaobo1982 <guxiaobo1982@qq.com> wrote:
 Hi,


We are going to deploy a standalone mode cluster. We know Spark can read local data files
into RDDs, but the question is where we should put the data files: on the server from which we submit
our application, or on the server where the master service runs?
 

Regards,


Xiaobo Gu