spark-user mailing list archives

From Shay Seng <s...@1618labs.com>
Subject Re: Some questions about task distribution and execution in Spark
Date Thu, 03 Oct 2013 18:05:24 GMT
Inlined.

On Wed, Oct 2, 2013 at 1:00 PM, Matei Zaharia <matei.zaharia@gmail.com> wrote:

> Hi Shangyu,
>
> (1) When we read in a local file with SparkContext.textFile and run a
> map/reduce job on it, how does Spark decide which worker node to send the
> data to? Is the data partitioned equally across the worker nodes, so that
> each worker node gets one piece?
>
> You actually can't run distributed jobs on local files. The local file URL
> only works on the same machine, or if the file is in a filesystem that's
> mounted on the same path on all worker nodes.
>
Is this really true?

scala> val f = sc.textFile("/root/ue/ue_env.sh")
13/10/03 17:55:45 INFO storage.MemoryStore: ensureFreeSpace(34870) called with curMem=34870, maxMem=4081511301
13/10/03 17:55:45 INFO storage.MemoryStore: Block broadcast_1 stored as values to memory (estimated size 34.1 KB, free 3.8 GB)
f: spark.RDD[String] = MappedRDD[5] at textFile at <console>:12

scala> f.map(l=>l.split(" ")).collect
13/10/03 17:55:51 INFO mapred.FileInputFormat: Total input paths to process : 1
13/10/03 17:55:51 INFO spark.SparkContext: Starting job: collect at <console>:15
13/10/03 17:55:51 INFO scheduler.DAGScheduler: Got job 2 (collect at <console>:15) with 2 output partitions (*allowLocal=false*)
 ...
13/10/03 17:55:51 INFO cluster.TaskSetManager: Starting task 2.0:0 as TID 4 on executor 0: ip-10-129-25-28 (preferred)
13/10/03 17:55:51 INFO cluster.TaskSetManager: Serialized task 2.0:0 as 1517 bytes in 3 ms
13/10/03 17:55:51 INFO cluster.TaskSetManager: Starting task 2.0:1 as TID 5 on executor 0: ip-10-129-25-28 (preferred)
13/10/03 17:55:51 INFO cluster.TaskSetManager: Serialized task 2.0:1 as 1517 bytes in 0 ms

Doesn't allowLocal=false mean the job is getting distributed to workers
rather than computed locally?
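
One way to check this directly, rather than inferring it from the scheduler log, would be to have each task report the host it ran on. The following is only a sketch: it assumes it is pasted into the same spark-shell session as above (so `sc` is the shell-provided SparkContext) and reuses the same path; the name `hostsPerPartition` is made up for illustration.

```scala
// Assumes the spark-shell session above, where `sc` is already defined.
// Same local path as in the transcript earlier in this message.
val f = sc.textFile("/root/ue/ue_env.sh")

// For each partition, record the hostname of the machine that evaluated it
// together with the number of lines that machine actually read.
val hostsPerPartition = f.mapPartitions { lines =>
  val host = java.net.InetAddress.getLocalHost.getHostName
  Iterator((host, lines.size))
}.collect()

// Print one line per partition on the driver.
hostsPerPartition.foreach { case (host, n) => println(host + ": " + n + " lines") }
```

If the hostnames printed are worker hosts, the read happened on the workers, which would only succeed if the file exists at that path on those machines.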


Thanks


>
> Matei
>
>
> Any help will be appreciated.
> Thanks!
>
>
>
>
> --
>
> Shangyu, Luo
> Department of Computer Science
> Rice University
>
>
>
