spark-user mailing list archives

From James Starks <suse...@protonmail.com.INVALID>
Subject How to debug Spark job
Date Fri, 07 Sep 2018 09:47:58 GMT
I have a Spark job that reads from a PostgreSQL (v9.5) table and writes the result to Parquet.
The code flow is not complicated; basically:

    case class MyCaseClass(field1: String, field2: String)
    val df = spark.read.format("jdbc")...load()
    df.createOrReplaceTempView(...)
    val newdf = spark.sql("select field1, field2 from mytable").as[MyCaseClass].map { row =>
      val fieldX = ... // extract something from field2
      (row.field1, fieldX)
    }.filter { ... /* filter out field 3 that's not valid */ }
    newdf.write.mode(...).parquet(destPath)

This job worked correctly without a problem. But it does not seem to work (the job appears
to hang) after adding more fields. The refactored job looks like this:
    ...
    val newdf = spark.sql("select field1, field2, ... fieldN from mytable").as[MyCaseClassWithMoreFields].map { row =>
        ...
        NewCaseClassWithMoreFields(...) // all fields plus fieldX
    }.filter { ... }
    newdf.write.mode(...).parquet(destPath)

Basically, what the job does is extract some info from one of the fields in the db table, append
that newly extracted field to the original row, and then dump the whole new table to Parquet.

    new field + (original field1 + ... + original fieldN)
    ...
    ...
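
To make the shape of that transformation concrete, here is a minimal sketch on plain Scala
collections rather than a Dataset. The names Original, Enriched, extract and enrich, and the
rule of taking the text after the first '=' in field2, are assumptions for illustration only;
the real job's extraction and filter logic are not shown in this message.

```scala
// Sketch only: plain Scala collections stand in for the Dataset, and the
// extraction rule is invented for illustration.
object AppendFieldSketch {
  // Hypothetical stand-in for a row read from the table.
  final case class Original(field1: String, field2: String)

  // The same row with the newly extracted field placed first.
  final case class Enriched(fieldX: String, field1: String, field2: String)

  // Hypothetical extraction: the text after the first '=' in field2, if any.
  def extract(field2: String): Option[String] = {
    val idx = field2.indexOf('=')
    if (idx >= 0 && idx < field2.length - 1) Some(field2.substring(idx + 1)) else None
  }

  // Map each row to its enriched form and drop rows with nothing to extract.
  def enrich(rows: Seq[Original]): Seq[Enriched] =
    rows.flatMap { row =>
      extract(row.field2).map(x => Enriched(x, row.field1, row.field2))
    }

  def main(args: Array[String]): Unit = {
    val rows = Seq(Original("a", "k=v"), Original("b", "invalid"))
    enrich(rows).foreach(println)
  }
}
```

Here the map and filter steps are folded into a single flatMap, so a row with nothing to
extract is dropped instead of being carried forward with an empty field.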

The number of records loaded by Spark SQL into the job (before the refactoring) is around 8MM,
and this remains the same, but when the refactored job runs, it appears to hang without
progress. The only output on the console is the following (there is no crash and no exception
is thrown):

    WARN  HeartbeatReceiver:66 - Removing executor driver with no recent heartbeats: 137128 ms exceeds timeout 120000 ms

Memory usage as shown by the top command:

    VIRT     RES     SHR        %CPU %MEM
    15.866g 8.001g  41.4m     740.3   25.6

The command used to submit the Spark job is:

    spark-submit --class ... --master local[*] --driver-memory 10g --executor-memory 10g ... --files ... --driver-class-path ... <jar file> ...

How can I debug or check which part of my code might cause the problem (so I can improve it)?

Thanks