spark-issues mailing list archives

From "Zhang, Liye (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-4740) Netty's network throughput is about 1/2 of NIO's in spark-perf sortByKey
Date Mon, 08 Dec 2014 16:14:16 GMT

    [ https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238009#comment-14238009 ]

Zhang, Liye edited comment on SPARK-4740 at 12/8/14 4:13 PM:
-------------------------------------------------------------

Hi [~rxin], [~adav], I uploaded the debug-level log (*rxin_patch-on_4_node_cluster_48CoresPerNode(Unbalance).7z*)
for the four-node cluster. The .7z archive contains a stage page HTML file and a .gz file;
the logs are in the .gz file: one driver log and four executor logs. The reduce time for
this test is 32 minutes.

One more thing: not every test with rxin's patch shows a serious imbalance problem. Sometimes
the numbers of finished tasks are nearly the same, but one node still finishes much earlier
than the others. In that case the reduce is faster, about 30 minutes, which is much better
than without the patch (40 minutes). However, NIO still outperforms it (27 minutes).

On the 4-node cluster (_48 cores per node, spark.executor.memory=36g, Hadoop 1.2.1, 10G NIC_)
we get some performance gain by applying the patch. We also tested on [~jerryshao]'s 6-node
cluster (_24 cores per node, spark.executor.memory=36g, Hadoop 2.4.1, 1G NIC_); there the
imbalance problem is very serious: some nodes finish only about 1/3 as many tasks as the best
one, which leads to worse performance, with a reduce time even longer than without the patch
(*57 mins with patch* vs. *50 mins without patch*).
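(For anyone reproducing these runs: a minimal spark-defaults.conf sketch for switching between the two transports under test, using the memory setting of the 4-node cluster above. The property name is the Spark 1.2 setting; values shown are from this test, not recommendations.)

```
# spark-defaults.conf (sketch; "netty" is the default in Spark 1.2)
spark.shuffle.blockTransferService   nio
spark.executor.memory                36g
```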



> Netty's network throughput is about 1/2 of NIO's in spark-perf sortByKey
> ------------------------------------------------------------------------
>
>                 Key: SPARK-4740
>                 URL: https://issues.apache.org/jira/browse/SPARK-4740
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>    Affects Versions: 1.2.0
>            Reporter: Zhang, Liye
>            Assignee: Reynold Xin
>            Priority: Blocker
>         Attachments: (rxin patch better executor)TestRunner  sort-by-key - Thread dump
> for executor 3_files.zip, (rxin patch normal executor)TestRunner  sort-by-key - Thread dump
> for executor 0 _files.zip, Spark-perf Test Report 16 Cores per Executor.pdf, Spark-perf Test
> Report.pdf, TestRunner  sort-by-key - Thread dump for executor 1_files (Netty-48 Cores per
> node).zip, TestRunner  sort-by-key - Thread dump for executor 1_files (Nio-48 cores per node).zip,
> rxin_patch-on_4_node_cluster_48CoresPerNode(Unbalance).7z
>
>
> When testing the current Spark master (1.3.0-SNAPSHOT) with spark-perf (sort-by-key, aggregate-by-key,
> etc.), the Netty-based shuffle transfer service takes much longer than the NIO-based one.
> Netty's network throughput is only about half of NIO's.
> We tested in standalone mode. The data set used for the test is 20 billion records, about
> 400GB in total. The spark-perf test runs on a 4-node cluster with a 10G NIC, 48 CPU cores
> per node, and 64GB of memory per executor. The number of reduce tasks is set to 1000.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
