Date: Sun, 7 Dec 2014 02:27:12 +0000 (UTC)
From: "Aaron Davidson (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Commented] (SPARK-4740) Netty's network throughput is about 1/2 of NIO's in spark-perf sortByKey

    [ https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237026#comment-14237026 ]

Aaron Davidson commented on SPARK-4740:
---------------------------------------

Could we get logs from the good and bad executors? I'm curious which nodes each one has connected to. (Ideally the logs would be at the DEBUG level; otherwise they probably would not have enough information.) We can also try to repro this on our own setup, though it looks like we're pretty close.

> Netty's network throughput is about 1/2 of NIO's in spark-perf sortByKey
> ------------------------------------------------------------------------
>
>                 Key: SPARK-4740
>                 URL: https://issues.apache.org/jira/browse/SPARK-4740
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>    Affects Versions: 1.2.0
>            Reporter: Zhang, Liye
>            Assignee: Reynold Xin
>            Priority: Blocker
>         Attachments: (rxin patch better executor)TestRunner sort-by-key - Thread dump for executor 3_files.zip, (rxin patch normal executor)TestRunner sort-by-key - Thread dump for executor 0 _files.zip, Spark-perf Test Report 16 Cores per Executor.pdf, Spark-perf Test Report.pdf, TestRunner sort-by-key - Thread dump for executor 1_files (Netty-48 Cores per node).zip, TestRunner sort-by-key - Thread dump for executor 1_files (Nio-48 cores per node).zip
>
>
> When testing current Spark master (1.3.0-snapshot) with spark-perf (sort-by-key, aggregate-by-key, etc.), the Netty-based shuffle transferService takes much longer than the NIO-based shuffle transferService. The network throughput of Netty is only about half that of NIO.
> We tested in standalone mode. The test data set is 20 billion records, with a total size of about 400GB. The spark-perf test runs on a 4-node cluster with 10G NICs and 48 CPU cores per node, and each executor has 64GB of memory. The number of reduce tasks is set to 1000.