Yes, that's a key concern about the Java dependency: its update is a function of the OS packages and those who control them, which is often not the end user. I think that's why this has been delayed a while. My general position is that, of course, someone in that boat can use Spark 2.1.x. It's likely going to see maintenance releases through the end of the year, even. On the flip side, no (non-paid) support has been available for Java 7 for a while. It wouldn't surprise me if some people are still stuck on Java 7; it would surprise me if they expect to use the latest of any package at this stage. Taking your CDH example, yes, it's been a couple of years since people have been able to deploy it on Java 8. Spark 2 isn't supported before CDH 5.7 anyway. The default is Java 8.

Scala 2.10 is a good point that we are dealing with now. It's not really a question of whether it will run -- it's all libraries and bytecode to the JVM, and a Java 8 JVM will happily deal with a mix of 7 and 8 bytecode. It's a question of whether the build for 2.10 will succeed. I believe the answer is 'yes' but am following up on some tests there.
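
To make the bytecode point concrete, here is a minimal Python sketch (not part of the Spark build; the function names are made up for illustration). Every .class file starts with the magic number 0xCAFEBABE followed by a minor and a major version; major version 51 corresponds to Java 7 and 52 to Java 8. A JVM accepts any class file whose major version is at or below the one it supports, which is why a Java 8 runtime can load a mix of both.

```python
import struct

CLASS_MAGIC = 0xCAFEBABE  # first four bytes of every JVM class file


def class_major_version(header: bytes) -> int:
    """Return the class-file major version from the first 8 bytes of a .class file."""
    if len(header) < 8:
        raise ValueError("need at least 8 bytes of class-file header")
    # Layout: u4 magic, u2 minor_version, u2 major_version (all big-endian)
    magic, _minor, major = struct.unpack(">IHH", header[:8])
    if magic != CLASS_MAGIC:
        raise ValueError("not a JVM class file")
    return major


def loadable_on(header: bytes, jvm_max_major: int) -> bool:
    """Hypothetical helper: True if a JVM supporting up to jvm_max_major can load this class."""
    return class_major_version(header) <= jvm_max_major


if __name__ == "__main__":
    import sys

    # Usage: python classver.py Foo.class Bar.class ...
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, class_major_version(f.read(8)))
```

Run against classes pulled out of a jar, this shows which dependencies were compiled for which target; it says nothing about whether the Scala 2.10 build itself succeeds, which is the open question above.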

On Tue, Feb 14, 2017 at 1:15 AM Charles Allen <charles.allen@metamarkets.com> wrote:
I think the biggest concern is enterprise users/operators who do not have the authority or access to upgrade Hadoop/YARN clusters to Java 8. As a reference point, apparently CDH 5.3 shipped with Java 8 in December 2014. I would be surprised if such users were active consumers of the dev mailing list, though. Unfortunately there's a bit of a selection bias in this list.

The other concern is whether compatibility between Scala and Java 8 is guaranteed for all versions you want to use (which is somewhat touched upon in the PR). Are you thinking about supporting Scala 2.10 against Java 8 bytecode?

See https://groups.google.com/d/msg/druid-user/aTGQlnF1KLk/NvBPfmigAAAJ for the similar discussion that took place in the Druid community.


On Fri, Feb 10, 2017 at 8:47 AM Sean Owen <sowen@cloudera.com> wrote:
As you have seen, there's a WIP PR to implement removal of Java 7 support: https://github.com/apache/spark/pull/16871

I have heard several +1s at https://issues.apache.org/jira/browse/SPARK-19493 but am asking for concerns too, now that there's a concrete change to review.

If this goes in for 2.2, it can be followed by a more extensive update of the Java code to take advantage of Java 8; this is more or less the baseline change.

We also just removed Hadoop 2.5 support. I know there was talk about removing Python 2.6. I have no opinion on that myself, but it might be time to revive that conversation too.