spark-dev mailing list archives

From Steve Loughran <>
Subject Re: HDFS as Shuffle Service
Date Wed, 27 Apr 2016 17:24:42 GMT

> On 27 Apr 2016, at 04:59, Takeshi Yamamuro <> wrote:
> Hi, all
> See SPARK-1529 for related discussion.
> // maropu

I'd not seen that discussion.

I'm actually curious about why there's a 15% difference in performance between the Java NIO and Hadoop FS
APIs. If that is the case (Hadoop still uses the pre-NIO libraries), *has anyone thought
of just fixing the Hadoop Local FS codepath?*
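To make the "pre-NIO vs NIO" distinction concrete, here is a minimal Java sketch of the two styles of local write path: a java.io FileOutputStream, and an NIO FileChannel fed from a ByteBuffer. The class LocalWritePaths and its methods are hypothetical illustrations, not Hadoop's RawLocalFileSystem code. It is also not a benchmark: it says nothing about the 15% figure, it only shows that both paths put the same bytes on disk.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper contrasting a java.io write path with an NIO one. */
final class LocalWritePaths {
    private LocalWritePaths() {}

    /** Pre-NIO style: a plain java.io stream over the target file. */
    static void writeWithStream(Path target, byte[] data) throws IOException {
        try (OutputStream out = new FileOutputStream(target.toFile())) {
            out.write(data);
        }
    }

    /** NIO style: a FileChannel written from a ByteBuffer. */
    static void writeWithChannel(Path target, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(target,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            // write() may consume fewer bytes than remain, so loop until drained.
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
        }
    }
}
```

Both methods create or truncate the target, so they can be swapped behind the same local-filesystem call; any performance difference would have to be measured, not assumed.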

It's not that nobody has filed JIRAs on that ... it's just that nothing has ever reached
a state where it was considered ready to adopt, where "ready" means: passes all unit and load
tests against Linux, Unix and Windows filesystems. There have been some attempts, but they never
got much engagement or support, especially as NIO wasn't there properly until Java 7,
and Hadoop was stuck on Java 6 support until 2015. That's no longer a constraint: someone
could do the work, using the existing JIRAs as starting points.

If someone did do this in RawLocalFS, it'd be nice if the patch also allowed you to turn off
CRC creation and checking. 

The CRCs aren't only part of the overhead; they also mean that flush() doesn't actually flush
until you reach the end of a CRC32 block, which breaks what few durability guarantees POSIX offers.
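The flush() problem can be illustrated with a deliberately simplified model, assuming a checksummer that only passes data downstream in whole chunks of bytesPerChecksum bytes, each chunk followed by its CRC32 written to a separate checksum stream. The names ChunkedCrcOutputStream, open and bytesPerChecksum are hypothetical, and this is not Hadoop's ChecksumFileSystem implementation. The checksums flag mimics the "turn off CRC creation" option suggested above: when it is false, writes go straight to the underlying stream.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Hypothetical model: data leaves this stream only in whole checksummed chunks. */
final class ChunkedCrcOutputStream extends OutputStream {
    private final OutputStream data;   // receives file content
    private final OutputStream sums;   // receives one 4-byte CRC32 per chunk
    private final byte[] chunk;
    private final CRC32 crc = new CRC32();
    private int filled = 0;

    private ChunkedCrcOutputStream(OutputStream data, OutputStream sums, int bytesPerChecksum) {
        if (bytesPerChecksum <= 0) {
            throw new IllegalArgumentException("bytesPerChecksum must be positive");
        }
        this.data = data;
        this.sums = sums;
        this.chunk = new byte[bytesPerChecksum];
    }

    /** With checksums disabled, callers get the raw stream and flush() behaves normally. */
    static OutputStream open(OutputStream data, OutputStream sums,
                             int bytesPerChecksum, boolean checksums) {
        return checksums ? new ChunkedCrcOutputStream(data, sums, bytesPerChecksum) : data;
    }

    @Override
    public void write(int b) throws IOException {
        chunk[filled++] = (byte) b;
        if (filled == chunk.length) {
            emitChunk();
        }
    }

    private void emitChunk() throws IOException {
        crc.reset();
        crc.update(chunk, 0, filled);
        data.write(chunk, 0, filled);
        sums.write(ByteBuffer.allocate(4).putInt((int) crc.getValue()).array());
        filled = 0;
    }

    /** Flushes only completed chunks: a partial chunk stays buffered in this object. */
    @Override
    public void flush() throws IOException {
        data.flush();
        sums.flush();
    }

    /** Only on close is the trailing partial chunk checksummed and written out. */
    @Override
    public void close() throws IOException {
        if (filled > 0) {
            emitChunk();
        }
        data.close();
        sums.close();
    }
}
```

In this model, with checksums on and 512-byte chunks, writing 100 bytes and calling flush() leaves the data stream empty until the chunk fills or the stream is closed; with checksums off, the same 100 bytes are visible downstream after flush().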
