spark-user mailing list archives

From Jinfeng Li <liji...@gmail.com>
Subject Re: Loading Files from HDFS Incurs Network Communication
Date Mon, 26 Oct 2015 09:12:34 GMT
I run cat /proc/net/dev and take the difference in received bytes before
and after the job. I also see a sustained peak (nearly 600Mb/s) in the nload
interface. We have 18 machines, and each machine receives 4.7G bytes.
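
For anyone who wants to repeat the measurement, here is a minimal sketch of the before/after difference described above. It is not the exact procedure used on the cluster. The object name NetDevDiff and its helpers are made up for illustration, and it assumes the usual Linux /proc/net/dev layout (two header lines, then one "iface: rx_bytes ..." line per interface).

```scala
import scala.io.Source

// Hypothetical helper, not part of Spark: diff received bytes per interface.
object NetDevDiff {
  // Parse /proc/net/dev-style text into a map of interface -> received bytes.
  def rxBytes(text: String): Map[String, Long] =
    text.split("\n").toSeq
      .drop(2)                      // skip the two header lines
      .filter(_.contains(":"))      // keep only "iface: counters..." lines
      .map { line =>
        val Array(iface, stats) = line.split(":", 2)
        // The first counter after the colon is the received byte count.
        iface.trim -> stats.trim.split("\\s+")(0).toLong
      }
      .toMap

  // Read the live counters on a Linux machine.
  def snapshot(): Map[String, Long] = {
    val src = Source.fromFile("/proc/net/dev")
    try rxBytes(src.mkString) finally src.close()
  }

  // Bytes received per interface between two snapshots.
  def delta(before: Map[String, Long], after: Map[String, Long]): Map[String, Long] =
    after.map { case (iface, bytes) => iface -> (bytes - before.getOrElse(iface, 0L)) }
}
```

Take NetDevDiff.snapshot() on each machine before and after the job and pass both to delta. The lo (loopback) entry should be ignored, since it is not traffic between machines. With the figures above, 18 machines x 4.7G is roughly 84.6G in total, in line with the roughly 80G reported in the original message.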

On Mon, Oct 26, 2015 at 5:00 PM Sean Owen <sowen@cloudera.com> wrote:

> -dev +user
> How are you measuring network traffic?
> It's not in general true that there will be zero network traffic, since
> not all executors are local to all data. That can be the situation in many
> cases but not always.
>
> On Mon, Oct 26, 2015 at 8:57 AM, Jinfeng Li <lijinf8@gmail.com> wrote:
>
>> Hi, I find that loading files from HDFS can incur a huge amount of network
>> traffic. The input size is 90G and the network traffic is about 80G. As I
>> understand it, the files should be read locally, so no network communication
>> should be needed.
>>
>> I use Spark 1.5.1, and the following is my code:
>>
>> val textRDD = sc.textFile("hdfs://master:9000/inputDir")
>> textRDD.count
>>
>> Jeffrey
>>
>
>
