spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alexey Romanchuk <alexey.romanc...@gmail.com>
Subject Log hdfs blocks sending
Date Thu, 25 Sep 2014 07:09:47 GMT
Hello again spark users and developers!

I have standalone spark cluster (1.1.0) and spark sql running on it. My
cluster consists of 4 datanodes and replication factor of files is 3.

I use thrift server to access spark sql and have 1 table with 30+
partitions. When I run query on whole table (something simple like select
count(*) from t) spark produces a lot of network activity filling all
available 1gb link. Looks like spark sent data by network instead of local
reading.

Is it any way to log which blocks were accessed locally and which are not?

Thanks!

Mime
View raw message