spark-user mailing list archives

From Mingyu Kim <>
Subject Re: Which OutputCommitter to use for S3?
Date Fri, 20 Feb 2015 23:52:07 GMT
I haven’t received any responses yet. I’d really appreciate it if anyone using a special OutputCommitter
for S3 could comment on this!


From: Mingyu Kim <<>>
Date: Monday, February 16, 2015 at 1:15 AM
To: "<>" <<>>
Subject: Which OutputCommitter to use for S3?

Hi all,

The default OutputCommitter used by RDD, FileOutputCommitter, seems to require moving
files at the commit step, which is not a constant-time operation in S3, as discussed in <>.
People seem to either develop their own NullOutputCommitter implementation or use DirectFileOutputCommitter
(as mentioned in SPARK-3595),
but I wanted to check whether there is a de facto standard, publicly available OutputCommitter
to use for S3 in conjunction with Spark.
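
To make the cost concern concrete, here is a toy Python sketch (not Hadoop or Spark code) of why a move-at-commit step is expensive on an object store. It assumes a flat key/value store with no native rename, so "moving" a directory means one copy plus one delete per object. `ToyObjectStore`, `rename_based_commit`, `direct_commit`, and `commit_costs` are hypothetical names made up for illustration, and the request counts are only a rough model of what S3 actually does.

```python
"""Toy model: why a rename-based commit is costly on an object store."""


class ToyObjectStore:
    """Flat key -> bytes store with no native rename (a rough stand-in for S3)."""

    def __init__(self):
        self.objects = {}
        self.requests = 0  # number of simulated store requests

    def put(self, key, data):
        self.objects[key] = data
        self.requests += 1

    def copy(self, src, dst):
        self.objects[dst] = self.objects[src]
        self.requests += 1

    def delete(self, key):
        del self.objects[key]
        self.requests += 1

    def rename_prefix(self, src_prefix, dst_prefix):
        """'Rename' a directory: one copy plus one delete per object under it."""
        for key in [k for k in self.objects if k.startswith(src_prefix)]:
            self.copy(key, dst_prefix + key[len(src_prefix):])
            self.delete(key)


def rename_based_commit(store, output, parts):
    """Write to a temporary prefix, then move everything at commit time."""
    tmp = output + "_temporary/"
    for name, data in parts.items():
        store.put(tmp + name, data)
    store.rename_prefix(tmp, output)


def direct_commit(store, output, parts):
    """Write straight to the final location; the commit step does nothing."""
    for name, data in parts.items():
        store.put(output + name, data)


def commit_costs(num_parts):
    """Return (rename-based requests, direct requests) for num_parts files."""
    parts = {f"part-{i:05d}": b"x" for i in range(num_parts)}
    renamed, direct = ToyObjectStore(), ToyObjectStore()
    rename_based_commit(renamed, "out/", parts)
    direct_commit(direct, "out/", parts)
    assert renamed.objects == direct.objects  # same final layout either way
    return renamed.requests, direct.requests
```

In this model the rename-based path issues three requests per part file (put, copy, delete) while the direct path issues one, so the commit cost grows with the number of output files. The trade-off is that writing directly to the final location gives up the isolation the temporary directory provides: output from failed or speculatively executed tasks can end up visible in the destination.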

