spark-user mailing list archives

From "igor.berman" <igor.ber...@gmail.com>
Subject upload to s3, UI Total Duration and Sum of Job Durations
Date Wed, 01 Jul 2015 07:14:52 GMT
Hi,
Our job reads files from S3, transforms/aggregates them, and writes them
back to S3.

While investigating performance problems, I noticed a big difference
between the sum of job durations and the Total Duration shown in the UI.
After investigating a bit, the difference is caused by Spark not counting
the time it takes to upload file parts to S3 in the job duration metric.
IMHO the job is not finished yet (since it hasn't finished uploading parts),
while in Spark I can see Succeeded/Total 256/256 (i.e. everything is done).
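
To make the gap concrete, here is a minimal sketch in plain Python (no Spark needed) of how the unaccounted time could be computed from job start/end timestamps such as those shown in the UI. The `Job` class, the `unaccounted_ms` function and all numbers are hypothetical and only for illustration; note that the result would also include any other driver-side time between jobs, not only S3 uploads.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    """Hypothetical record of one job's reported start and end (epoch ms)."""
    job_id: int
    start_ms: int
    end_ms: int


def unaccounted_ms(app_start_ms: int, app_end_ms: int, jobs: List[Job]) -> int:
    """Wall-clock time of the application not covered by any job interval."""
    # Merge overlapping intervals so concurrent jobs are not counted twice.
    intervals = sorted((j.start_ms, j.end_ms) for j in jobs)
    covered = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return (app_end_ms - app_start_ms) - covered


# Made-up numbers: a 10-minute application with two 2-minute jobs.
jobs = [Job(0, 1_000, 121_000), Job(1, 300_000, 420_000)]
gap = unaccounted_ms(0, 600_000, jobs)
print(f"time outside any job: {gap / 1000:.0f} s")
```

With these made-up numbers the jobs cover 240 s of a 600 s run, so 360 s is spent outside any job; in our case that is the kind of time I'd like to see attributed somewhere.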

Is there any way to see how long it takes to upload the files? Are there
any plans to show "network" time? Why is the job marked as finished while
the upload is still in progress?

We are using s3a, Hadoop 2.7, Spark 1.3.1.

Thanks in advance,
Igor


