spark-user mailing list archives

From Steve Loughran
Subject Re: TCP/IP speedup
Date Mon, 03 Aug 2015 00:07:12 GMT

On 1 Aug 2015, at 18:26, Ruslan Dautkhanov wrote:

If your network is bandwidth-bound, you'll see that setting jumbo frames (MTU 9000)
may increase bandwidth by up to ~20%.
"Enabling Jumbo Frames across the cluster improves bandwidth"
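
For a feel of where that number can and cannot come from, here is a rough sketch of the framing arithmetic. It assumes IPv4 + TCP with no options and the standard Ethernet per-frame overhead; the function names are made up for illustration. Framing alone accounts for only about 4% by this calculation, so any larger gain would have to come from handling fewer packets per byte (less per-packet CPU and interrupt work), which this does not model.

```python
# Rough wire-efficiency arithmetic for TCP over Ethernet at two MTU sizes.
# Assumptions: IPv4 + TCP without options; standard Ethernet overhead per frame
# (header 14, FCS 4, preamble + SFD 8, inter-frame gap 12 bytes).

ETH_OVERHEAD = 14 + 4 + 8 + 12   # bytes on the wire around each frame's MTU-sized body
IP_TCP_HEADERS = 20 + 20         # bytes inside the MTU that are not payload


def tcp_wire_efficiency(mtu: int) -> float:
    """Fraction of on-wire bytes that are TCP payload for a full-size frame."""
    payload = mtu - IP_TCP_HEADERS
    return payload / (mtu + ETH_OVERHEAD)


def jumbo_gain(standard_mtu: int = 1500, jumbo_mtu: int = 9000) -> float:
    """Relative payload-throughput gain from framing overhead alone."""
    return tcp_wire_efficiency(jumbo_mtu) / tcp_wire_efficiency(standard_mtu) - 1.0


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: payload efficiency {tcp_wire_efficiency(mtu):.4f}")
    print(f"gain from framing alone: {jumbo_gain():.2%}")
```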


you can also get better checksums of packets, so that the (very small but non-zero) risk of
corrupted network packets drops a bit more.

If the Spark workload is not network bandwidth-bound, I'd expect anywhere from a few percent to no improvement.

Put differently: it shouldn't hurt. The shuffle phase is the most network-heavy, especially
as it can span the entire cluster, so the backbone bandwidth ("bisection bandwidth") can become
the bottleneck, which means that jobs can interfere with each other.
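
To put a number on "span the entire cluster": a back-of-envelope sketch, assuming equal-sized racks and a perfectly uniform all-to-all shuffle (real shuffles are rarely that even). The function name is hypothetical.

```python
def cross_rack_fraction(nodes_per_rack: int, racks: int) -> float:
    """Fraction of a node's shuffle peers that sit in a different rack.

    Assumes equal-sized racks and uniform all-to-all traffic, so this is
    also the share of shuffle bytes that has to cross the backbone.
    """
    total = nodes_per_rack * racks
    if total < 2:
        raise ValueError("need at least two nodes")
    remote_peers = total - nodes_per_rack   # every node outside my own rack
    return remote_peers / (total - 1)       # out of all peers except myself


if __name__ == "__main__":
    # Hypothetical cluster: 10 racks of 20 nodes each.
    print(f"{cross_rack_fraction(20, 10):.1%} of shuffle traffic leaves the rack")
```

Under those assumptions nearly all shuffle traffic goes over the backbone once there are more than a few racks, which is why the shuffle is where backbone contention shows up first.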

Scheduling work close to the HDFS data means that HDFS reads should often be local (the
TCP stack gets bypassed entirely), or at least rack-local (sharing the switch, not any backbone).

But there are other things going on there, as the slide talks about:

-stragglers: often a sign of pending HDD failure, as reads are retried. The classic Hadoop
MR engine detects these, can spin up alternate mappers (if you enable speculation), and will
blacklist the node for further work. Sometimes, though, that straggling is just unbalanced data:
some bits of work may be computationally a lot harder, slowing things down.

-contention for work on the nodes. In YARN you request how many "virtual cores" you want (ops
get to define the map of virtual to physical), with each node having a finite set of cores

but ...
  -Unless CPU throttling is turned on, competing processes can take up more CPU than they
asked for.
  -that virtual:physical core setting may be of

There's also disk IOP contention; two jobs trying to get at the same spindle, even though
there are lots of disks on the server. There's not much you can do about that (today).
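
On the stragglers point above, spotting them in your own task timings can be as simple as comparing each task against the median. This is only a minimal sketch of that idea, not the actual Hadoop or Spark speculation logic; the function name and the 1.5x multiplier are assumptions for illustration.

```python
from statistics import median


def find_stragglers(durations, multiplier=1.5):
    """Return ids of tasks that ran longer than multiplier x the median duration.

    durations: dict mapping task id -> elapsed seconds (hypothetical input shape).
    """
    if not durations:
        return []
    threshold = multiplier * median(durations.values())
    return sorted(task for task, secs in durations.items() if secs > threshold)


if __name__ == "__main__":
    timings = {"t1": 10.0, "t2": 11.0, "t3": 12.0, "t4": 40.0}
    print(find_stragglers(timings))
```

If the same host keeps turning up in that list, look at its disks; if the stragglers move around between hosts, unbalanced data is the more likely cause.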

A key takeaway from that talk, which applies to all work-tuning talks, is: get data from your
real workloads. There's some good HTrace instrumentation in HDFS these days; I haven't looked
at Spark's instrumentation to see how they hook up. You can also expect to have some network
monitoring (sFlow, ...) which you could use to see if the backbone is overloaded. Don't forget
the Linux tooling either: iotop etc. There's lots of room to play here: once you've got
the data you can see where to focus, then decide how much time to spend trying to tune it.
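
As one cheap way to get that data on a single node, the kernel's interface byte counters in /proc/net/dev can be sampled twice and turned into a rate. A minimal sketch, assuming Linux; the helper names are made up, and it is no substitute for proper switch-level monitoring.

```python
import time


def parse_proc_net_dev(text):
    """Map interface name -> (rx_bytes, tx_bytes) from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines()[2:]:          # the first two lines are headers
        if ":" not in line:
            continue
        name, fields = line.split(":", 1)
        values = fields.split()
        if len(values) < 9:
            continue
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(values[0]), int(values[8]))
    return counters


def throughput_mbit(before, after, seconds):
    """Per-interface (rx, tx) rates in Mbit/s between two counter snapshots."""
    rates = {}
    for name, (rx1, tx1) in after.items():
        if name in before:
            rx0, tx0 = before[name]
            rates[name] = ((rx1 - rx0) * 8 / seconds / 1e6,
                           (tx1 - tx0) * 8 / seconds / 1e6)
    return rates


def sample(interval=5.0, path="/proc/net/dev"):
    """Read the counters twice, `interval` seconds apart, and return the rates."""
    with open(path) as f:
        before = parse_proc_net_dev(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_net_dev(f.read())
    return throughput_mbit(before, after, interval)
```

Calling sample() on a worker while a shuffle is running shows whether the NIC is anywhere near line rate; if it isn't, the network probably isn't where to spend tuning time.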


Ruslan Dautkhanov

On Sat, Aug 1, 2015 at 6:08 PM, Simon Edelhaus wrote:

2% huh.

-- ttfn
Simon Edelhaus
California 2015

On Sat, Aug 1, 2015 at 3:45 PM, Mark Hamstra wrote:

On Sat, Aug 1, 2015 at 3:24 PM, Simon Edelhaus wrote:
Hi All!

How important would a significant performance improvement to TCP/IP itself be, in terms of
overall job performance? Which part would be most significantly accelerated?
Would it be HDFS?

-- ttfn
Simon Edelhaus
California 2015
