spark-issues mailing list archives

From "Mitesh (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-20112) SIGSEGV in GeneratedIterator.sort_addToSorter
Date Tue, 28 Mar 2017 15:41:41 GMT

    [ https://issues.apache.org/jira/browse/SPARK-20112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15945399#comment-15945399 ]

Mitesh edited comment on SPARK-20112 at 3/28/17 3:40 PM:
---------------------------------------------------------

[~kiszk] I can try out Spark 2.0.3+ or 2.1. Actually, I disabled whole-stage codegen and I
still see a failure on 2.0.2, but now in a different place: {{HashJoin.advanceNext}}. I also
uploaded the new hs_err_pid22870.log. The hashed relations are around 1-10M, but a few are 200M.
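
For anyone reproducing this setup, here is a minimal sketch of how the two settings discussed in this issue could be passed to {{spark-submit}}. The helper {{to_submit_args}} is hypothetical and only for illustration. {{spark.sql.codegen.wholeStage}} is the Spark SQL key for whole-stage codegen. The broadcast threshold of -1 mentioned in the issue description is assumed here to be {{spark.sql.autoBroadcastJoinThreshold}}.

```python
# Hypothetical helper (not part of Spark): render a dict of Spark settings
# as a list of "--conf key=value" arguments for spark-submit.
def to_submit_args(conf):
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


settings = {
    # Disable whole-stage code generation, as described in the comment.
    "spark.sql.codegen.wholeStage": "false",
    # Assumption: this is the key behind the "-1" broadcast setting in the
    # issue description; -1 turns off automatic broadcast joins.
    "spark.sql.autoBroadcastJoinThreshold": "-1",
}

print(" ".join(to_submit_args(settings)))
```

The same keys can also be set on a running session. The command-line form is shown only because it needs no Spark installation to try out.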


{noformat}
17/03/27 22:15:59 DEBUG [Executor task launch worker-17] TaskMemoryManager: Task 152119 acquired 64.0 KB for org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@2b60e395
SIGSEGV17/03/27 22:15:59 DEBUG [Executor task launch worker-17] TaskMemoryManager: Task 152119 acquired 64.0 MB for org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@2b60e395
[thread 140369911781120 also had an error]
 (0xb) at pc=0x00007fad1f7afc11, pid=22870, tid=140369909675776
#
# JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# J 25558 C2 org.apache.spark.sql.execution.joins.HashJoin$$anonfun$outerJoin$1$$anon$1.advanceNext()Z (110 bytes) @ 0x00007fad1f7afc11 [0x00007fad1f7afb20+0xf1]
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /mnt/xvdb/spark/worker_dir/app-20170327213416-0005/14/hs_err_pid22870.log
17/03/27 22:15:59 DEBUG [Executor task launch worker-19] TaskMemoryManager: Task 152090 acquired 64.0 MB for org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@51de5289
 2502.591: [G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: occupancy higher than threshold, occupancy: 7667187712 bytes, allocation request: 1677770640 bytes, threshold: 8214124950 bytes (45.00 %), source: concurrent humongous allocation]
[thread 140376087648000 also had an error]
[thread 140369903376128 also had an error]
#
{noformat}
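
The G1Ergonomics entry in the log above can be read as simple arithmetic. The sketch below uses only the three byte counts and the 45% figure from that entry. It assumes (this is an interpretation, not something the log states) that the threshold is a percentage of total heap capacity and that G1 compares occupancy plus the pending allocation against it. The function name {{would_request_cycle}} is made up for illustration.

```python
# Numbers copied from the G1Ergonomics log entry.
occupancy = 7_667_187_712           # bytes in use at the time of the request
allocation_request = 1_677_770_640  # size of the humongous allocation
threshold = 8_214_124_950           # reported as 45.00 % in the log
threshold_percent = 45

GIB = 1024 ** 3


def would_request_cycle(occupancy, request, threshold):
    """Assumed check: occupancy plus the pending request exceeds the threshold."""
    return occupancy + request > threshold


# Assumption: the threshold is threshold_percent of the total heap capacity.
implied_heap = threshold * 100 // threshold_percent

print(f"allocation request: {allocation_request / GIB:.2f} GiB")
print(f"implied heap size:  {implied_heap / GIB:.2f} GiB")
print("occupancy alone over threshold:   ",
      would_request_cycle(occupancy, 0, threshold))
print("occupancy + request over threshold:",
      would_request_cycle(occupancy, allocation_request, threshold))
```

Under these assumptions, occupancy alone is below the threshold, and it is the single allocation of roughly 1.5 GiB that pushes the total over it. That shows a very large buffer was being allocated around the time of the crash. It does not by itself show that the allocation caused the SIGSEGV.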




> SIGSEGV in GeneratedIterator.sort_addToSorter
> ---------------------------------------------
>
>                 Key: SPARK-20112
>                 URL: https://issues.apache.org/jira/browse/SPARK-20112
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.2
>         Environment: AWS m4.10xlarge with EBS (io1 drive, 400g, 4000iops)
>            Reporter: Mitesh
>         Attachments: codegen_sorter_crash.log, hs_err_pid19271.log, hs_err_pid22870.log
>
>
> I'm seeing a very weird crash in {{GeneratedIterator.sort_addToSorter}}. The hs_err_pid and codegen files are attached (with query plans). It's not a deterministic repro, but when running a big query load, I eventually see it come up within a few minutes.
> Here is some interesting repro information:
> - Using AWS r3.8xlarge machines, which have ephemeral attached drives, I can't repro this. But it does repro on m4.10xlarge with an io1 EBS drive. So I think that means it's not an issue with the code-gen, but I can't figure out what the difference in behavior is.
> - The broadcast joins in the plan are all small tables. I have {{spark.sql.autoBroadcastJoinThreshold=-1}} because I always hint which tables should be broadcast.
> - As you can see from the plan, all the sources are cached in-memory tables. We partition/sort them all beforehand, so it's always sort-merge joins or broadcast joins (with small tables).
> {noformat}
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  [thread 139872345896704 also had an error]
> SIGSEGV (0xb) at pc=0x00007f38a378caa3, pid=19271, tid=139872342738688
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode linux-amd64 compressed oops)
> [thread 139872348002048 also had an error]# Problematic frame:
> # 
> J 28454 C1 org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$GeneratedIterator;)V (369 bytes) @ 0x00007f38a378caa3 [0x00007f38a378b5e0+0x14c3]
> {noformat}
> This kind of looks like https://issues.apache.org/jira/browse/SPARK-15822, but that is marked as fixed in 2.0.0.
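
The two "Problematic frame" entries in this thread can be compared mechanically. The sketch below is a hypothetical parser, not a HotSpot tool. Its regular expression is written against the two frame lines quoted here, with wrapped lines rejoined, and assumes the layout {{J <compile id> <compiler> <method> (<size> bytes) @ <pc> [<code start>+<offset>]}}. It extracts which JIT compiler produced the code and checks that the reported offset matches the program counter minus the code start.

```python
import re

# Pattern for a compiled-Java ("J") frame line as it appears in this thread.
FRAME_RE = re.compile(
    r"J (?P<compile_id>\d+) (?P<compiler>C1|C2) (?P<method>\S+) "
    r"\((?P<size>\d+) bytes\) @ 0x(?P<pc>[0-9a-f]+) "
    r"\[0x(?P<start>[0-9a-f]+)\+0x(?P<offset>[0-9a-f]+)\]"
)


def parse_frame(line):
    """Return the frame fields as a dict, or None if the line does not match."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    pc, start, offset = (int(m[name], 16) for name in ("pc", "start", "offset"))
    return {
        "compiler": m["compiler"],
        "method": m["method"],
        "offset": offset,
        # The bracketed offset should equal pc minus the start of the code.
        "offset_consistent": pc - start == offset,
    }


# Frame lines copied from the two crash reports (wrapped lines rejoined).
hash_join_frame = (
    "# J 25558 C2 org.apache.spark.sql.execution.joins."
    "HashJoin$$anonfun$outerJoin$1$$anon$1.advanceNext()Z "
    "(110 bytes) @ 0x00007fad1f7afc11 [0x00007fad1f7afb20+0xf1]"
)
sorter_frame = (
    "J 28454 C1 org.apache.spark.sql.catalyst.expressions."
    "GeneratedClass$GeneratedIterator.sort_addToSorter$"
    "(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$GeneratedIterator;)V "
    "(369 bytes) @ 0x00007f38a378caa3 [0x00007f38a378b5e0+0x14c3]"
)

for frame in (hash_join_frame, sorter_frame):
    print(parse_frame(frame))
```

Parsed this way, one crash is in C2-compiled code and the other in C1-compiled code, in two unrelated methods.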


