spark-dev mailing list archives

From Denny Lee <denny.g....@gmail.com>
Subject Re: [VOTE] Spark 2.3.1 (RC4)
Date Sun, 03 Jun 2018 03:09:20 GMT
+1

On Sat, Jun 2, 2018 at 4:53 PM Nicholas Chammas <nicholas.chammas@gmail.com>
wrote:

> I'll give that a try, but I'll still have to figure out what to do if none
> of the release builds work with hadoop-aws, since Flintrock deploys Spark
> release builds to set up a cluster. Building Spark is slow, so we only do
> it if the user specifically requests a Spark version by git hash. (This is
> basically how spark-ec2 did things, too.)
>
>
> On Sat, Jun 2, 2018 at 6:54 PM Marcelo Vanzin <vanzin@cloudera.com> wrote:
>
>> If you're building your own Spark, definitely try the hadoop-cloud
>> profile. Then you don't even need to pull anything at runtime,
>> everything is already packaged with Spark.
>>
>> On Fri, Jun 1, 2018 at 6:51 PM, Nicholas Chammas
>> <nicholas.chammas@gmail.com> wrote:
>> > pyspark --packages org.apache.hadoop:hadoop-aws:2.7.3 didn’t work for me
>> > either (even building with -Phadoop-2.7). I guess I’ve been relying on an
>> > unsupported pattern and will need to figure something else out going forward
>> > in order to use s3a://.
>> >
>> >
>> > On Fri, Jun 1, 2018 at 9:09 PM Marcelo Vanzin <vanzin@cloudera.com> wrote:
>> >>
>> >> I have personally never tried to include hadoop-aws that way. But at
>> >> the very least, I'd try to use the same version of Hadoop as the Spark
>> >> build (2.7.3 IIRC). I don't really expect a different version to work,
>> >> and if it did in the past it definitely was not by design.
>> >>
>> >> On Fri, Jun 1, 2018 at 5:50 PM, Nicholas Chammas
>> >> <nicholas.chammas@gmail.com> wrote:
>> >> > Building with -Phadoop-2.7 didn’t help, and if I remember correctly,
>> >> > building with -Phadoop-2.8 worked with hadoop-aws in the 2.3.0 release, so
>> >> > it appears something has changed since then.
>> >> >
>> >> > I wasn’t familiar with -Phadoop-cloud, but I can try that.
>> >> >
>> >> > My goal here is simply to confirm that this release of Spark works with
>> >> > hadoop-aws like past releases did, particularly for Flintrock users who use
>> >> > Spark with S3A.
>> >> >
>> >> > We currently provide -hadoop2.6, -hadoop2.7, and -without-hadoop builds with
>> >> > every Spark release. If the -hadoop2.7 release build won’t work with
>> >> > hadoop-aws anymore, are there plans to provide a new build type that will?
>> >> >
>> >> > Apologies if the question is poorly formed. I’m batting a bit outside my
>> >> > league here. Again, my goal is simply to confirm that I/my users still have
>> >> > a way to use s3a://. In the past, that way was simply to call pyspark
>> >> > --packages org.apache.hadoop:hadoop-aws:2.8.4 or something very similar. If
>> >> > that will no longer work, I’m trying to confirm that the change of behavior
>> >> > is intentional or acceptable (as a review for the Spark project) and figure
>> >> > out what I need to change (as due diligence for Flintrock’s users).
>> >> >
>> >> > Nick
>> >> >
>> >> >
>> >> > On Fri, Jun 1, 2018 at 8:21 PM Marcelo Vanzin <vanzin@cloudera.com> wrote:
>> >> >>
>> >> >> Using the hadoop-aws package is probably going to be a little more
>> >> >> complicated than that. The best bet is to use a custom build of Spark
>> >> >> that includes it (use -Phadoop-cloud). Otherwise you're probably
>> >> >> looking at some nasty dependency issues, especially if you end up
>> >> >> mixing different versions of Hadoop.
>> >> >>
>> >> >> On Fri, Jun 1, 2018 at 4:01 PM, Nicholas Chammas
>> >> >> <nicholas.chammas@gmail.com> wrote:
>> >> >> > I was able to successfully launch a Spark cluster on EC2 at 2.3.1 RC4 using
>> >> >> > Flintrock. However, trying to load the hadoop-aws package gave me some
>> >> >> > errors.
>> >> >> >
>> >> >> > $ pyspark --packages org.apache.hadoop:hadoop-aws:2.8.4
>> >> >> >
>> >> >> > <snipped>
>> >> >> >
>> >> >> > :: problems summary ::
>> >> >> > :::: WARNINGS
>> >> >> >                 [NOT FOUND  ] com.sun.jersey#jersey-json;1.9!jersey-json.jar(bundle) (2ms)
>> >> >> >         ==== local-m2-cache: tried
>> >> >> >           file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-json/1.9/jersey-json-1.9.jar
>> >> >> >                 [NOT FOUND  ] com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle) (0ms)
>> >> >> >         ==== local-m2-cache: tried
>> >> >> >           file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar
>> >> >> >                 [NOT FOUND  ] org.codehaus.jettison#jettison;1.1!jettison.jar(bundle) (1ms)
>> >> >> >         ==== local-m2-cache: tried
>> >> >> >           file:/home/ec2-user/.m2/repository/org/codehaus/jettison/jettison/1.1/jettison-1.1.jar
>> >> >> >                 [NOT FOUND  ] com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar (0ms)
>> >> >> >         ==== local-m2-cache: tried
>> >> >> >           file:/home/ec2-user/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar
>> >> >> >
>> >> >> > I’d guess I’m probably using the wrong version of hadoop-aws, but I
>> >> >> > called make-distribution.sh with -Phadoop-2.8 so I’m not sure what else to
>> >> >> > try.
>> >> >> >
>> >> >> > Any quick pointers?
>> >> >> >
>> >> >> > Nick
>> >> >> >
>> >> >> >
>> >> >> > On Fri, Jun 1, 2018 at 6:29 PM Marcelo Vanzin <vanzin@cloudera.com>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> Starting with my own +1 (binding).
>> >> >> >>
>> >> >> >> On Fri, Jun 1, 2018 at 3:28 PM, Marcelo Vanzin <vanzin@cloudera.com>
>> >> >> >> wrote:
>> >> >> >> > Please vote on releasing the following candidate as Apache Spark
>> >> >> >> > version 2.3.1.
>> >> >> >> >
>> >> >> >> > Given that I expect at least a few people to be busy with Spark
>> >> >> >> > Summit next week, I'm taking the liberty of setting an extended
>> >> >> >> > voting period. The vote will be open until Friday, June 8th, at
>> >> >> >> > 19:00 UTC (that's 12:00 PDT).
>> >> >> >> >
>> >> >> >> > It passes with a majority of +1 votes, which must include at least
>> >> >> >> > 3 +1 votes from the PMC.
>> >> >> >> >
>> >> >> >> > [ ] +1 Release this package as Apache Spark 2.3.1
>> >> >> >> > [ ] -1 Do not release this package because ...
>> >> >> >> >
>> >> >> >> > To learn more about Apache Spark, please see
>> >> >> >> > http://spark.apache.org/
>> >> >> >> >
>> >> >> >> > The tag to be voted on is v2.3.1-rc4 (commit 30aaa5a3):
>> >> >> >> > https://github.com/apache/spark/tree/v2.3.1-rc4
>> >> >> >> >
>> >> >> >> > The release files, including signatures, digests, etc. can be found at:
>> >> >> >> > https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-bin/
>> >> >> >> >
>> >> >> >> > Signatures used for Spark RCs can be found in this file:
>> >> >> >> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >> >> >> >
>> >> >> >> > The staging repository for this release can be found at:
>> >> >> >> > https://repository.apache.org/content/repositories/orgapachespark-1272/
>> >> >> >> >
>> >> >> >> > The documentation corresponding to this release can be found at:
>> >> >> >> > https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-docs/
>> >> >> >> >
>> >> >> >> > The list of bug fixes going into 2.3.1 can be found at the following URL:
>> >> >> >> > https://issues.apache.org/jira/projects/SPARK/versions/12342432
>> >> >> >> >
>> >> >> >> > FAQ
>> >> >> >> >
>> >> >> >> > =========================
>> >> >> >> > How can I help test this release?
>> >> >> >> > =========================
>> >> >> >> >
>> >> >> >> > If you are a Spark user, you can help us test this release by taking
>> >> >> >> > an existing Spark workload and running it on this release candidate,
>> >> >> >> > then reporting any regressions.
>> >> >> >> >
>> >> >> >> > If you're working in PySpark, you can set up a virtual env, install
>> >> >> >> > the current RC, and see if anything important breaks. In Java/Scala,
>> >> >> >> > you can add the staging repository to your project's resolvers and
>> >> >> >> > test with the RC (make sure to clean up the artifact cache before/after
>> >> >> >> > so you don't end up building with an out-of-date RC going forward).
>> >> >> >> >
>> >> >> >> > ===========================================
>> >> >> >> > What should happen to JIRA tickets still targeting 2.3.1?
>> >> >> >> > ===========================================
>> >> >> >> >
>> >> >> >> > The current list of open tickets targeted at 2.3.1 can be found at:
>> >> >> >> > https://s.apache.org/Q3Uo
>> >> >> >> >
>> >> >> >> > Committers should look at those and triage. Extremely important bug
>> >> >> >> > fixes, documentation, and API tweaks that impact compatibility should
>> >> >> >> > be worked on immediately. Everything else please retarget to an
>> >> >> >> > appropriate release.
>> >> >> >> >
>> >> >> >> > ==================
>> >> >> >> > But my bug isn't fixed?
>> >> >> >> > ==================
>> >> >> >> >
>> >> >> >> > In order to make timely releases, we will typically not hold the
>> >> >> >> > release unless the bug in question is a regression from the previous
>> >> >> >> > release. That being said, if there is something which is a regression
>> >> >> >> > that has not been correctly targeted please ping me or a committer to
>> >> >> >> > help target the issue.
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > --
>> >> >> >> > Marcelo
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> --
>> >> >> >> Marcelo
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> ---------------------------------------------------------------------
>> >> >> >> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>> >> >> >>
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Marcelo
>> >>
>> >>
>> >>
>> >> --
>> >> Marcelo
>>
>>
>>
>> --
>> Marcelo
>>
>
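
For anyone following the advice quoted above to keep hadoop-aws on the same Hadoop version as the Spark build, here is a minimal sketch of how that coordinate could be derived instead of typed by hand. It assumes a Spark distribution whose jars/ directory contains a file named hadoop-common-<version>.jar; the function names are hypothetical and not part of Spark or Flintrock. As the thread notes, a matching version did not resolve the problem on 2.3.1 RC4, so treat this only as a way to avoid accidental version mixing, not as a fix.

```python
import re
from pathlib import Path
from typing import Optional

# Matches e.g. "hadoop-common-2.7.3.jar" and captures "2.7.3".
# Assumption: the bundled jar follows this naming pattern.
_HADOOP_COMMON = re.compile(r"^hadoop-common-(\d+(?:\.\d+)*)\.jar$")


def bundled_hadoop_version(jars_dir: str) -> Optional[str]:
    """Return the version of the hadoop-common jar found in jars_dir, or None."""
    for jar in sorted(Path(jars_dir).glob("hadoop-common-*.jar")):
        match = _HADOOP_COMMON.match(jar.name)
        if match:
            return match.group(1)
    return None


def hadoop_aws_coordinate(jars_dir: str) -> Optional[str]:
    """Build a --packages coordinate for hadoop-aws matching the bundled Hadoop."""
    version = bundled_hadoop_version(jars_dir)
    if version is None:
        # e.g. a -without-hadoop build: nothing to match against.
        return None
    return f"org.apache.hadoop:hadoop-aws:{version}"
```

The result could then be passed as the argument to pyspark --packages. Returning None rather than guessing a version keeps the caller from silently mixing Hadoop versions when no bundled jar is found.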
