spark-dev mailing list archives

From Patrick Wendell <pwend...@gmail.com>
Subject Re: WebUI shows poor locality when task scheduling
Date Sun, 26 Apr 2015 21:27:46 GMT
Hi Eric - please direct this to the user@ list. This list is for
development of Spark itself.

On Sun, Apr 26, 2015 at 1:12 AM, eric wong <win19999@gmail.com> wrote:
>
>
>
> Hi developers,
>
> I have sent this to the user mailing list but got no response...
>
> When running an experimental KMeans job, the cached RDD is the original
> points data.
>
> I see poor locality in the task details on the WebUI. Almost half of the
> tasks read their input over the network instead of from memory.
>
> A task with network input takes almost the same time as a task with
> Hadoop (disk) input, and twice as long as a task with memory input, e.g.:
> Task(Memory): 16s
> Task(Network): 9s
> Task(Hadoop): 9s
>
>
> In the executor logs below I see that fetching a 30MB RDD block from a
> remote node takes about 5 seconds:
>
> 15/03/31 04:08:52 INFO CoarseGrainedExecutorBackend: Got assigned task 58
> 15/03/31 04:08:52 INFO Executor: Running task 15.0 in stage 1.0 (TID 58)
> 15/03/31 04:08:52 INFO HadoopRDD: Input split:
> hdfs://master:8000/kmeans/data-Kmeans-5.3g:2013265920+134217728
> 15/03/31 04:08:52 INFO BlockManager: Found block rdd_3_15 locally
> 15/03/31 04:08:58 INFO Executor: Finished task 15.0 in stage 1.0 (TID 58).
> 1920 bytes result sent to driver
> 15/03/31 04:08:58 INFO CoarseGrainedExecutorBackend: Got assigned task 60
> -----------------Task60
> 15/03/31 04:08:58 INFO Executor: Running task 17.0 in stage 1.0 (TID 60)
> 15/03/31 04:08:58 INFO HadoopRDD: Input split:
> hdfs://master:8000/kmeans/data-Kmeans-5.3g:2281701376+134217728
> 15/03/31 04:09:02 INFO BlockManager: Found block rdd_3_17 remotely
> 15/03/31 04:09:12 INFO Executor: Finished task 17.0 in stage 1.0 (TID 60).
> 1920 bytes result sent to driver
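As a back-of-envelope check on those log timings, using only the figures quoted in this mail (30 MB block, 30 MB/s network, ~5 s observed fetch): at the stated bandwidth the block should transfer in about one second, so the observed fetch time suggests the cost is dominated by something other than raw bandwidth (e.g. serialization, request latency, or contention).

```python
# Figures taken from this mail: 30 MB block, 30 MB/s network, ~5 s observed fetch.
block_mb = 30.0
bandwidth_mb_per_s = 30.0
observed_s = 5.0

ideal_s = block_mb / bandwidth_mb_per_s   # transfer time at full line rate
slowdown = observed_s / ideal_s           # observed fetch vs. ideal

print(ideal_s)    # 1.0
print(slowdown)   # 5.0
```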
>
>
> So:
> 1) Does that mean I should cache the RDD with MEMORY_AND_DISK instead of
> memory only?
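On (1), the storage level is chosen at `persist()` time. A minimal driver-side sketch in PySpark of what that change looks like (the app name and parsing are illustrative, and the input path is the one from the submit command in this mail; this is a fragment that needs a running Spark cluster, so it is not runnable standalone):

```python
from pyspark import SparkContext, StorageLevel

sc = SparkContext(appName="KMeansCaching")

# Keep partitions in memory, spilling to local disk when an executor's
# storage memory fills up, instead of dropping them entirely.
points = sc.textFile("hdfs://master:8000/kmeans/data-Kmeans-7g") \
           .map(lambda line: [float(x) for x in line.split()])
points.persist(StorageLevel.MEMORY_AND_DISK)
```

Note that MEMORY_AND_DISK addresses eviction on the node that cached a block; it does not by itself change whether the scheduler places a task on the node that holds the cached copy.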
>
> 2) And should I expand network capacity, or tune the scheduling-locality
> parameters? I set spark.locality.wait up to 15000 (ms), but it did not
> seem to increase the memory-input percentage.
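For (2), locality waits can also be passed at submit time. In Spark 1.1, `spark.locality.wait` is in milliseconds, and there are per-level variants (`spark.locality.wait.process`, `.node`, `.rack`). A hypothetical variant of the submit command from the env info below, with the waits the author mentions (config fragment, not runnable standalone):

```shell
bin/spark-submit \
  --class org.apache.spark.examples.mllib.JavaKMeans \
  --master spark://master:7077 \
  --executor-memory 1g \
  --conf spark.locality.wait=15000 \
  --conf spark.locality.wait.node=15000 \
  lib/spark-examples-1.1.0-hadoop2.3.0.jar \
  hdfs://master:8000/kmeans/data-Kmeans-7g 8 1
```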
>
> Any suggestions would be appreciated.
>
>
> ------Env info-----------
>
> Cluster: 4 workers, each with 1 core and 2GB executor memory
>
> Spark version: 1.1.0
>
> Network: 30MB/s
>
> -----Submit shell-------
> bin/spark-submit --class org.apache.spark.examples.mllib.JavaKMeans --master
> spark://master:7077 --executor-memory 1g
> lib/spark-examples-1.1.0-hadoop2.3.0.jar
> hdfs://master:8000/kmeans/data-Kmeans-7g 8 1
>
>
> Thanks very much, and please forgive my poor English.
>
> --
> Wang Haihua
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org


