spark-user mailing list archives

From "Thomas Decaux" <ebui...@gmail.com>
Subject Understand task timing
Date Mon, 19 Feb 2018 10:13:40 GMT
Using Spark 1.6.2, I want to understand what "Duration" really means (and why the task is slow).

Running a simple SELECT COUNT against a parquet file, stored within HDFS:

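Locality Level: NODE_LOCAL
Executor ID / Host: 1 / DATA02
Launch Time: 2018/02/19 09:54:27
Duration: 5 s
GC Time: 30 ms
Input Size / Records: 8.8 MB (hadoop) / 3010830
Write Time: 8 ms
Shuffle Write Size / Records: 77.2 KB / 1666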

Does this mean "it took 5 seconds to read 8.8 MB from HDFS"?
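
For scale, here is a rough sketch of the throughput those figures would imply. It assumes the whole 5 s went into reading the input, which may not hold: Duration may also cover work other than the HDFS read, and the UI rounds the value. The variable names are only illustrative.

```python
# Figures copied from the task row above.
duration_s = 5.0      # "Duration" column (rounded by the UI)
input_mb = 8.8        # "Input Size" column
records = 3010830     # "Records" column

# Assumption: the entire duration was spent reading the input.
mb_per_s = input_mb / duration_s
records_per_s = records / duration_s

print(f"{mb_per_s:.2f} MB/s, {records_per_s:.0f} records/s")
```

Under that assumption the task reads under 2 MB/s, which is why it looks slow for a node-local read.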

Thomas Decaux