spark-user mailing list archives

From Steve Loughran <ste...@hortonworks.com>
Subject Re: Writing files to s3 with out temporary directory
Date Fri, 01 Dec 2017 12:33:12 GMT


Hadoop trunk (i.e. 3.1 when it comes out) has the code to do 0-rename commits:


http://steveloughran.blogspot.co.uk/2017/11/subatomic.html

If you want to play today, you can build Hadoop trunk & Spark master, plus a little glue
JAR of mine to get Parquet to play properly:

http://steveloughran.blogspot.co.uk/2017/11/how-to-play-with-new-s3a-committers.html



On 21 Nov 2017, at 15:03, Jim Carroll <jimfcarroll@gmail.com> wrote:

It's not actually that tough. We already use a custom Hadoop FileSystem for
S3 because when we started using Spark with S3 the native FileSystem was
very unreliable. Ours is based on the code from Presto (see
https://github.com/prestodb/presto/blob/master/presto-hive/src/main/java/com/facebook/presto/hive/s3/PrestoS3FileSystem.java
).

I already have a version that introduces a hash to the filename for the file
that's actually written to S3 to see if it makes a difference per
https://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html#get-workload-considerations
. FWIW, it doesn't.

AFAIK, the higher up the directory tree the hash appears, the better it is. The classic
partitioned layout here is exactly what you don't want.
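
To make the difference concrete, here is a minimal Python sketch of the two placements: a hash at the very top of the key versus a hash only in the final filename. The function names, the 4-character prefix length and the sample keys are all made up for illustration; this is not what PrestoS3FileSystem or S3A does.

```python
import hashlib


def hashed_key(key: str, prefix_len: int = 4) -> str:
    """Prepend a short hash so the varying characters sit at the top of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{key}"


def hashed_filename(key: str, prefix_len: int = 4) -> str:
    """Hash only the final filename; the shared directory prefix is unchanged."""
    parent, _, name = key.rpartition("/")
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    hashed = f"{digest[:prefix_len]}-{name}"
    return f"{parent}/{hashed}" if parent else hashed


# Hypothetical keys in a classic partitioned layout: every key shares a long prefix.
keys = [
    f"table/year=2017/month=11/day={day:02d}/part-0000.parquet"
    for day in range(1, 4)
]

for key in keys:
    print(hashed_key(key))       # differs from the first character onwards
    print(hashed_filename(key))  # still starts with table/year=2017/month=11/...
```

With `hashed_filename` every key still begins with the same `table/year=2017/...` prefix, which is the partitioned layout being warned about; only `hashed_key` spreads the keys from the first character.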


I'm going to modify that experiment to override the key
name like before except actually move the file, keep track of the state, and
override the rename method.


You might find this interesting too: https://arxiv.org/abs/1709.01812

IBM's Stocator FS remaps dest/_temporary/$jobAttempt/$taskAttempt/part-0000 to a file
dest/part-$jobAttempt-$taskAttempt-000

This makes it possible to clean up failed tasks & jobs; without that, on any task failure
the entire job needs to be failed.
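
A rough Python sketch of that kind of remapping, and of why it helps with cleanup. This is not Stocator's code: the regular expression, the function names and the attempt IDs are assumptions for illustration, and the part number is carried over unchanged.

```python
import re
from typing import Iterable, List

# Assumed shape of a task-attempt path: dest/_temporary/<job>/<task>/part-NNNN[suffix]
_TEMP_PATH = re.compile(
    r"^(?P<dest>.*)/_temporary/(?P<job>[^/]+)/(?P<task>[^/]+)"
    r"/part-(?P<num>\d+)(?P<suffix>.*)$"
)


def remap(path: str) -> str:
    """Map a temporary task-attempt path to a final name that embeds the attempt IDs."""
    m = _TEMP_PATH.match(path)
    if m is None:
        return path  # not a task-attempt file; leave it alone
    return f"{m['dest']}/part-{m['job']}-{m['task']}-{m['num']}{m['suffix']}"


def files_of_failed_tasks(
    names: Iterable[str], job_attempt: str, failed_tasks: Iterable[str]
) -> List[str]:
    """Pick out the files written by failed task attempts, using only their names."""
    prefixes = tuple(f"part-{job_attempt}-{task}-" for task in failed_tasks)
    return [name for name in names if prefixes and name.startswith(prefixes)]


written = [
    remap("dest/_temporary/0/attempt_0001_m_000000_0/part-0000"),
    remap("dest/_temporary/0/attempt_0001_m_000001_0/part-0001"),
]
names = [path.rsplit("/", 1)[1] for path in written]
print(files_of_failed_tasks(names, "0", ["attempt_0001_m_000001_0"]))
```

Because the job and task attempt survive in the final filename, a failed attempt's output can be found and deleted without touching the files of attempts that succeeded.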



The problems with this approach are: 1) it's brittle because it depends on
the internal directory and file naming conventions in Hadoop and Parquet.


They do, but the actual workers have the right to generate files with names other than
part-0000.$suffix, stick in summary files, etc. They may even create no files at all, which
is what ORC does when there are no results for that part.


2)
It will assume (as seems to be currently the case) that the 'rename' call is
done for all files from the driver.


The first step to the new committers was to look at all the code where the old ones were called,
including stepping through with a debugger to work out exactly what the two intermingled commit
algorithms were up to:

https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committer_architecture.md
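
For a feel of where the renames happen, here is a simplified Python sketch of the two algorithms of the classic FileOutputCommitter, run against a local directory. In v1, task commit renames the task attempt directory and job commit then moves every file into the destination; in v2, task commit moves the files straight into the destination. The directory layout is simplified and all function names are made up; the linked architecture document is the authoritative description.

```python
import os
import shutil
import tempfile


def write_task_output(dest, job, task_attempt, filename, data):
    """A task attempt writes under dest/_temporary/<job>/_temporary/<task_attempt>/."""
    task_dir = os.path.join(dest, "_temporary", job, "_temporary", task_attempt)
    os.makedirs(task_dir, exist_ok=True)
    with open(os.path.join(task_dir, filename), "w") as f:
        f.write(data)


def _move_files(src_dir, dest_dir):
    """Move every file in src_dir into dest_dir, one rename per file."""
    for name in sorted(os.listdir(src_dir)):
        os.replace(os.path.join(src_dir, name), os.path.join(dest_dir, name))


def commit_task_v1(dest, job, task_attempt, task_id):
    """v1 task commit: one directory rename into the job attempt directory."""
    job_dir = os.path.join(dest, "_temporary", job)
    os.rename(os.path.join(job_dir, "_temporary", task_attempt),
              os.path.join(job_dir, task_id))


def commit_job_v1(dest, job):
    """v1 job commit: move all committed task output into dest, then clean up."""
    job_dir = os.path.join(dest, "_temporary", job)
    for task_id in sorted(os.listdir(job_dir)):
        if task_id != "_temporary":  # skip the in-progress attempts directory
            _move_files(os.path.join(job_dir, task_id), dest)
    shutil.rmtree(os.path.join(dest, "_temporary"))


def commit_task_v2(dest, job, task_attempt):
    """v2 task commit: move the files directly into dest."""
    _move_files(os.path.join(dest, "_temporary", job, "_temporary", task_attempt), dest)


def commit_job_v2(dest):
    """v2 job commit: only the temporary tree is left to delete."""
    shutil.rmtree(os.path.join(dest, "_temporary"), ignore_errors=True)


dest = tempfile.mkdtemp()
write_task_output(dest, "0", "attempt_0", "part-0000", "rows")
commit_task_v1(dest, "0", "attempt_0", "task_0")
commit_job_v1(dest, "0")
print(sorted(os.listdir(dest)))
```

On a real filesystem each of these renames is cheap. On S3 a rename is a copy followed by a delete, so every one of them costs time proportional to the data, which is the reason for wanting 0-rename commits in the first place.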

But it should do until there's a better
solution in the Hadoop committer.



If you are at the stage where you have your own FS implementation, you are probably ready
to pick up & play with the new s3a committers.

