From issues-return-232711-apmail-spark-issues-archive=spark.apache.org@spark.apache.org Thu Jul 25 18:05:50 2019 Return-Path: X-Original-To: apmail-spark-issues-archive@minotaur.apache.org Delivered-To: apmail-spark-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [207.244.88.153]) by minotaur.apache.org (Postfix) with SMTP id 75C1E18BCA for ; Thu, 25 Jul 2019 18:05:50 +0000 (UTC) Received: (qmail 66088 invoked by uid 500); 25 Jul 2019 18:00:06 -0000 Delivered-To: apmail-spark-issues-archive@spark.apache.org Received: (qmail 66066 invoked by uid 500); 25 Jul 2019 18:00:06 -0000 Mailing-List: contact issues-help@spark.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list issues@spark.apache.org Received: (qmail 65868 invoked by uid 99); 25 Jul 2019 18:00:01 -0000 Received: from mailrelay1-us-west.apache.org (HELO mailrelay1-us-west.apache.org) (209.188.14.139) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 25 Jul 2019 18:00:01 +0000 Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 8467DE0E1C for ; Thu, 25 Jul 2019 18:00:00 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 3CF7F265F0 for ; Thu, 25 Jul 2019 18:00:00 +0000 (UTC) Date: Thu, 25 Jul 2019 18:00:00 +0000 (UTC) From: =?utf-8?Q?Pawe=C5=82_Wiejacha_=28JIRA=29?= To: issues@spark.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (SPARK-27734) Add memory based thresholds for shuffle spill MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/SPARK-27734?page=3Dcom.atlassia= n.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=3D168= 93018#comment-16893018 ]=20 Pawe=C5=82 Wiejacha commented on SPARK-27734: ---------------------------------------- We also encountered this problem. We've reduced the problem to shuffling 60= GiB of data divided into 5 partitions using *repartitionAndSortWithinParti= tions*() and processing (*foreachPartition*()) all of them using a single e= xecutor that has 2 GiB of memory assigned. Processing each partition takes = ~70 minutes (52 min GC time) and CPU usage is very high (due to GC). Setting *spark.shuffle.spill.numElementsForceSpillThreshold* is very inconv= enient, so it would be nice to accept Adrian's pull request. > Add memory based thresholds for shuffle spill > --------------------------------------------- > > Key: SPARK-27734 > URL: https://issues.apache.org/jira/browse/SPARK-27734 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL > Affects Versions: 3.0.0 > Reporter: Adrian Muraru > Priority: Minor > > When running large shuffles (700TB input data, 200k map tasks, 50k reduce= rs on a 300 nodes cluster) the job is regularly OOMing in map and reduce ph= ase. > IIUC ShuffleExternalSorter (map side) and ExternalAppendOnlyMap and Exter= nalSorter (reduce side) are trying to max out the available execution memor= y. This in turn doesn't play nice with the=C2=A0Garbage Collector and execu= tors are failing with OutOfMemoryError when the memory allocation from thes= e in-memory structure is maxing out the available heap size (in our case we= are running with 9 cores/executor, 32G per executor) > To mitigate this, I set=C2=A0{{spark.shuffle.spill.numElementsForceSpillT= hreshold}} to force the spill on disk. While this=C2=A0config works, it is = not flexible enough as it's expressed in number of elements, and in our cas= e we run multiple shuffles in a single job and element size is different fr= om one stage to another. > We have an internal patch to extend this behaviour and add two new parame= ters to control the spill based on memory usage: > -=C2=A0spark.shuffle.spill.map.maxRecordsSizeForSpillThreshold > -=C2=A0spark.shuffle.spill.reduce.maxRecordsSizeForSpillThreshold > =C2=A0 -- This message was sent by Atlassian JIRA (v7.6.14#76016) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org For additional commands, e-mail: issues-help@spark.apache.org