spark-user mailing list archives

From Andrew Ash <and...@andrewash.com>
Subject Re: Where to put "local" data files?
Date Wed, 01 Jan 2014 16:28:02 GMT
Yes, it will. This is called data locality, and it works by matching the
hostname of the Spark worker with the hostname of the HDFS DataNode.
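
For example, a minimal sketch (master URL and HDFS path are hypothetical):

    // Spark asks the HDFS NameNode where each block lives and tries to
    // schedule each task on a worker that holds that block locally.
    val sc = new org.apache.spark.SparkContext("spark://master:7077", "locality-demo")
    val lines = sc.textFile("hdfs://namenode:9000/data/input.txt")
    println(lines.count())  // tasks run node-local where the hostnames match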


On Wed, Jan 1, 2014 at 2:40 AM, guxiaobo1982 <guxiaobo1982@qq.com> wrote:

> Hi Andrew,
>
>
> Thanks for your reply. I have another question about using HDFS: when
> running HDFS and the standalone mode on the same cluster, will the Spark
> workers only read data on the same server, so as to avoid transferring
> data over the network?
>
> Xiaobo gu
>
> On 2014-01-01 05:37:36, "andrew"<andrew@andrewash.com> wrote:
>
> Hi Xiaobo,
>
> I would recommend putting the files into an HDFS cluster on the same
> machines instead if possible. If you're concerned about duplicating the
> data, you can set the replication factor to 1 so you don't use more space
> than before.
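>
> A minimal sketch of lowering replication with the Hadoop FileSystem API
> (the path is hypothetical; setting dfs.replication in hdfs-site.xml before
> loading the data works too):
>
>     import org.apache.hadoop.conf.Configuration
>     import org.apache.hadoop.fs.{FileSystem, Path}
>
>     // ask HDFS to keep a single copy of an existing file
>     val fs = FileSystem.get(new Configuration())
>     fs.setReplication(new Path("/data/input.txt"), 1.toShort)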
>
> In my experience with Spark around 0.7.0 or so, when reading from a local
> file with sc.textFile("file:///...") you had to have that file at that
> exact path on every Spark worker machine.
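>
> For example (master URL and path are hypothetical), reading such a file is
> as simple as the sketch below, but only if it exists at that exact path on
> every worker:
>
>     // file:// URIs are resolved locally on each worker; the driver does
>     // not ship the file to the cluster
>     val sc = new org.apache.spark.SparkContext("spark://master:7077", "local-file-demo")
>     val lines = sc.textFile("file:///data/input.txt")
>     println(lines.count())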
>
> Cheers,
> Andrew
>
>
> On Tue, Dec 31, 2013 at 5:34 AM, guxiaobo1982 <guxiaobo1982@qq.com> wrote:
>
>> Hi,
>>
>> We are going to deploy a standalone-mode cluster. We know Spark can read
>> local data files into RDDs, but the question is where we should put the
>> data file: on the server where we submit our application, or on the server
>> where the master service runs?
>>
>> Regards,
>>
>> Xiaobo Gu
>>
>
>
