spark-dev mailing list archives

From Nicholas Chammas
Subject Spark 3.0 and S3A
Date Mon, 28 Oct 2019 15:34:28 GMT
Howdy folks,

I have a question about what is happening with the 3.0 release in relation
to Hadoop and hadoop-aws.

Today, among other builds, we release a build of Spark built against Hadoop
2.7 and another one built without Hadoop. In Spark 3+, will we continue to
release Hadoop 2.7 builds as one of the primary downloads on the download
page? Or will we start building
Spark against a newer version of Hadoop?

The reason I ask is because successive versions of hadoop-aws have made
significant usability improvements to S3A. To get those, users need to
download the Hadoop-free build of Spark and then link
Spark to a version of Hadoop newer than 2.7. There are various dependency
and runtime issues with trying to pair Spark built against Hadoop 2.7 with
hadoop-aws 2.8 or newer.
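
For anyone unfamiliar with the Hadoop-free workaround, here is a rough sketch of
the linking step. It assumes a separately installed Hadoop whose `hadoop`
command is on the PATH, and it relies on the SPARK_DIST_CLASSPATH variable that
the Hadoop-free build reads. The helper name `hadoop_free_env` and its
parameters are made up for illustration.

```python
import os
import subprocess


def hadoop_free_env(base_env=None, hadoop_bin="hadoop", run=subprocess.check_output):
    """Return a copy of the environment with SPARK_DIST_CLASSPATH set.

    hadoop_bin and run are illustrative parameters: run is injectable so the
    helper can be exercised without a real Hadoop install.
    """
    env = dict(os.environ if base_env is None else base_env)
    # `hadoop classpath` prints the jars of the locally installed Hadoop.
    classpath = run([hadoop_bin, "classpath"], text=True).strip()
    # A Hadoop-free Spark build picks up Hadoop classes from this variable.
    env["SPARK_DIST_CLASSPATH"] = classpath
    return env
```

The returned mapping could be passed as `env=` to whatever launches
spark-submit. This only puts the Hadoop jars on the classpath; hadoop-aws
itself still has to match that Hadoop version.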

If we start releasing builds of Spark built against Hadoop 3.2 (or another
recent version), users can get the latest S3A improvements via --packages
"org.apache.hadoop:hadoop-aws:3.2.1" without needing to download Hadoop

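To make the --packages route concrete, here is a minimal sketch that assembles
the spark-submit invocation. It assumes a Spark build against Hadoop 3.2.1, so
that the hadoop-aws version lines up with the bundled Hadoop. The helper name
`s3a_submit_command` and the script name `my_job.py` are placeholders.

```python
import shlex


def s3a_submit_command(app_path, hadoop_version="3.2.1"):
    """Build a spark-submit argument list that pulls in hadoop-aws."""
    # The hadoop-aws version should match the Hadoop version Spark was
    # built against; mixing versions is what causes the runtime issues.
    coordinate = f"org.apache.hadoop:hadoop-aws:{hadoop_version}"
    return ["spark-submit", "--packages", coordinate, app_path]


# my_job.py is a placeholder application script.
cmd = s3a_submit_command("my_job.py")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell as-is. Keeping the version as a
parameter makes it harder to forget to bump hadoop-aws when the Hadoop version
changes.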
