spark-user mailing list archives

From Srikanth
Subject "Job duration" and "Processing time" don't match
Date Thu, 08 Sep 2016 19:31:44 GMT

I was looking at the Spark Streaming UI and noticed a big difference between
"Processing time" and "Job duration".

[image: Inline image 1]

Processing time/Output Op duration is shown as 50s, but the sum of all job
durations is ~25s.
What is causing this difference? Based on the logs, I know that the batch
actually took 50s.

[image: Inline image 2]

The job that is taking most of the time is
           .options(Map("mode" -> "DROPMALFORMED", "delimiter" -> "\t", "header" -> "false"))
           .partitionBy("entityId", "regionId", "eventDate")

Removing SaveMode.Append really speeds things up, and the mismatch
between job duration and processing time also disappears.
I'm not able to explain what is causing this, though.

