spark-user mailing list archives

From Femi Anthony <femib...@gmail.com>
Subject AWS EMR slow write to HDFS
Date Tue, 11 Jun 2019 12:50:26 GMT

I'm writing a large dataset in Parquet format to HDFS using Spark, and the write runs rather slowly
on EMR compared with, say, Databricks. I understand that if I were able to use Hadoop 3.1, the write
would be much faster because it has a high-performance output committer. Is this the case, and
if so, when will there be a version of EMR that uses Hadoop 3.1? I'm currently using
EMR 5.21.