spark-dev mailing list archives

From Dongjoon Hyun <dongjoon.h...@gmail.com>
Subject Re: [VOTE] Spark 2.3.1 (RC4)
Date Sun, 03 Jun 2018 07:23:30 GMT
+1

Bests,
Dongjoon.

On Sat, Jun 2, 2018 at 8:09 PM, Denny Lee <denny.g.lee@gmail.com> wrote:

> +1
>
> On Sat, Jun 2, 2018 at 4:53 PM Nicholas Chammas <
> nicholas.chammas@gmail.com> wrote:
>
>> I'll give that a try, but I'll still have to figure out what to do if
>> none of the release builds work with hadoop-aws, since Flintrock deploys
>> Spark release builds to set up a cluster. Building Spark is slow, so we
>> only do it if the user specifically requests a Spark version by git hash.
>> (This is basically how spark-ec2 did things, too.)
>>
>>
>> On Sat, Jun 2, 2018 at 6:54 PM Marcelo Vanzin <vanzin@cloudera.com>
>> wrote:
>>
>>> If you're building your own Spark, definitely try the hadoop-cloud
>>> profile. Then you don't even need to pull anything at runtime,
>>> everything is already packaged with Spark.
>>>
>>> On Fri, Jun 1, 2018 at 6:51 PM, Nicholas Chammas
>>> <nicholas.chammas@gmail.com> wrote:
>>> > pyspark --packages org.apache.hadoop:hadoop-aws:2.7.3 didn’t work for me
>>> > either (even building with -Phadoop-2.7). I guess I’ve been relying on an
>>> > unsupported pattern and will need to figure something else out going forward
>>> > in order to use s3a://.
>>> >
>>> >
>>> > On Fri, Jun 1, 2018 at 9:09 PM Marcelo Vanzin <vanzin@cloudera.com> wrote:
>>> >>
>>> >> I have personally never tried to include hadoop-aws that way. But at
>>> >> the very least, I'd try to use the same version of Hadoop as the Spark
>>> >> build (2.7.3 IIRC). I don't really expect a different version to work,
>>> >> and if it did in the past it definitely was not by design.
>>> >>
>>> >> On Fri, Jun 1, 2018 at 5:50 PM, Nicholas Chammas
>>> >> <nicholas.chammas@gmail.com> wrote:
>>> >> > Building with -Phadoop-2.7 didn’t help, and if I remember correctly,
>>> >> > building with -Phadoop-2.8 worked with hadoop-aws in the 2.3.0 release,
>>> >> > so it appears something has changed since then.
>>> >> >
>>> >> > I wasn’t familiar with -Phadoop-cloud, but I can try that.
>>> >> >
>>> >> > My goal here is simply to confirm that this release of Spark works with
>>> >> > hadoop-aws like past releases did, particularly for Flintrock users who
>>> >> > use Spark with S3A.
>>> >> >
>>> >> > We currently provide -hadoop2.6, -hadoop2.7, and -without-hadoop builds
>>> >> > with every Spark release. If the -hadoop2.7 release build won’t work with
>>> >> > hadoop-aws anymore, are there plans to provide a new build type that
>>> >> > will?
>>> >> >
>>> >> > Apologies if the question is poorly formed. I’m batting a bit outside my
>>> >> > league here. Again, my goal is simply to confirm that I/my users still
>>> >> > have a way to use s3a://. In the past, that way was simply to call
>>> >> > pyspark --packages org.apache.hadoop:hadoop-aws:2.8.4 or something very
>>> >> > similar. If that will no longer work, I’m trying to confirm that the change
>>> >> > of behavior is intentional or acceptable (as a review for the Spark project)
>>> >> > and figure out what I need to change (as due diligence for Flintrock’s users).
>>> >> >
>>> >> > Nick
>>> >> >
>>> >> >
>>> >> > On Fri, Jun 1, 2018 at 8:21 PM Marcelo Vanzin <vanzin@cloudera.com>
>>> >> > wrote:
>>> >> >>
>>> >> >> Using the hadoop-aws package is probably going to be a little more
>>> >> >> complicated than that. The best bet is to use a custom build of Spark
>>> >> >> that includes it (use -Phadoop-cloud). Otherwise you're probably
>>> >> >> looking at some nasty dependency issues, especially if you end up
>>> >> >> mixing different versions of Hadoop.
>>> >> >>
>>> >> >> On Fri, Jun 1, 2018 at 4:01 PM, Nicholas Chammas
>>> >> >> <nicholas.chammas@gmail.com> wrote:
>>> >> >> > I was able to successfully launch a Spark cluster on EC2 at 2.3.1 RC4
>>> >> >> > using Flintrock. However, trying to load the hadoop-aws package gave me
>>> >> >> > some errors.
>>> >> >> >
>>> >> >> > $ pyspark --packages org.apache.hadoop:hadoop-aws:2.8.4
>>> >> >> >
>>> >> >> > <snipped>
>>> >> >> >
>>> >> >> > :: problems summary ::
>>> >> >> > :::: WARNINGS
>>> >> >> >                 [NOT FOUND  ] com.sun.jersey#jersey-json;1.9!jersey-json.jar(bundle) (2ms)
>>> >> >> >         ==== local-m2-cache: tried
>>> >> >> >           file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-json/1.9/jersey-json-1.9.jar
>>> >> >> >                 [NOT FOUND  ] com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle) (0ms)
>>> >> >> >         ==== local-m2-cache: tried
>>> >> >> >           file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar
>>> >> >> >                 [NOT FOUND  ] org.codehaus.jettison#jettison;1.1!jettison.jar(bundle) (1ms)
>>> >> >> >         ==== local-m2-cache: tried
>>> >> >> >           file:/home/ec2-user/.m2/repository/org/codehaus/jettison/jettison/1.1/jettison-1.1.jar
>>> >> >> >                 [NOT FOUND  ] com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar (0ms)
>>> >> >> >         ==== local-m2-cache: tried
>>> >> >> >           file:/home/ec2-user/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar
>>> >> >> >
>>> >> >> > I’d guess I’m probably using the wrong version of hadoop-aws, but I
>>> >> >> > called make-distribution.sh with -Phadoop-2.8 so I’m not sure what else
>>> >> >> > to try.
>>> >> >> >
>>> >> >> > Any quick pointers?
>>> >> >> >
>>> >> >> > Nick
>>> >> >> >
>>> >> >> >
>>> >> >> > On Fri, Jun 1, 2018 at 6:29 PM Marcelo Vanzin <vanzin@cloudera.com>
>>> >> >> > wrote:
>>> >> >> >>
>>> >> >> >> Starting with my own +1 (binding).
>>> >> >> >>
>>> >> >> >> On Fri, Jun 1, 2018 at 3:28 PM, Marcelo Vanzin <vanzin@cloudera.com>
>>> >> >> >> wrote:
>>> >> >> >> > Please vote on releasing the following candidate as Apache Spark
>>> >> >> >> > version 2.3.1.
>>> >> >> >> >
>>> >> >> >> > Given that I expect at least a few people to be busy with Spark
>>> >> >> >> > Summit next week, I'm taking the liberty of setting an extended voting
>>> >> >> >> > period. The vote will be open until Friday, June 8th, at 19:00 UTC
>>> >> >> >> > (that's 12:00 PDT).
>>> >> >> >> >
>>> >> >> >> > It passes with a majority of +1 votes, which must include at least
>>> >> >> >> > 3 +1 votes from the PMC.
>>> >> >> >> >
>>> >> >> >> > [ ] +1 Release this package as Apache Spark 2.3.1
>>> >> >> >> > [ ] -1 Do not release this package because ...
>>> >> >> >> >
>>> >> >> >> > To learn more about Apache Spark, please see
>>> >> >> >> > http://spark.apache.org/
>>> >> >> >> >
>>> >> >> >> > The tag to be voted on is v2.3.1-rc4 (commit 30aaa5a3):
>>> >> >> >> > https://github.com/apache/spark/tree/v2.3.1-rc4
>>> >> >> >> >
>>> >> >> >> > The release files, including signatures, digests, etc. can be
>>> >> >> >> > found at:
>>> >> >> >> > https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-bin/
>>> >> >> >> >
>>> >> >> >> > Signatures used for Spark RCs can be found in this file:
>>> >> >> >> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>>> >> >> >> >
>>> >> >> >> > The staging repository for this release can be found at:
>>> >> >> >> > https://repository.apache.org/content/repositories/orgapachespark-1272/
>>> >> >> >> >
>>> >> >> >> > The documentation corresponding to this release can be found at:
>>> >> >> >> > https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-docs/
>>> >> >> >> >
>>> >> >> >> > The list of bug fixes going into 2.3.1 can be found at the
>>> >> >> >> > following URL:
>>> >> >> >> > https://issues.apache.org/jira/projects/SPARK/versions/12342432
>>> >> >> >> >
>>> >> >> >> > FAQ
>>> >> >> >> >
>>> >> >> >> > =========================
>>> >> >> >> > How can I help test this release?
>>> >> >> >> > =========================
>>> >> >> >> >
>>> >> >> >> > If you are a Spark user, you can help us test this release by
>>> >> >> >> > taking an existing Spark workload and running it on this release
>>> >> >> >> > candidate, then reporting any regressions.
>>> >> >> >> >
>>> >> >> >> > If you're working in PySpark, you can set up a virtual env,
>>> >> >> >> > install the current RC, and see if anything important breaks. In
>>> >> >> >> > Java/Scala, you can add the staging repository to your project's
>>> >> >> >> > resolvers and test with the RC (make sure to clean up the artifact
>>> >> >> >> > cache before/after so you don't end up building with an out-of-date
>>> >> >> >> > RC going forward).
>>> >> >> >> >
>>> >> >> >> > ===========================================
>>> >> >> >> > What should happen to JIRA tickets still targeting 2.3.1?
>>> >> >> >> > ===========================================
>>> >> >> >> >
>>> >> >> >> > The current list of open tickets targeted at 2.3.1 can be found at:
>>> >> >> >> > https://s.apache.org/Q3Uo
>>> >> >> >> >
>>> >> >> >> > Committers should look at those and triage. Extremely important
>>> >> >> >> > bug fixes, documentation, and API tweaks that impact compatibility
>>> >> >> >> > should be worked on immediately. Everything else please retarget to
>>> >> >> >> > an appropriate release.
>>> >> >> >> >
>>> >> >> >> > ==================
>>> >> >> >> > But my bug isn't fixed?
>>> >> >> >> > ==================
>>> >> >> >> >
>>> >> >> >> > In order to make timely releases, we will typically not hold the
>>> >> >> >> > release unless the bug in question is a regression from the
>>> >> >> >> > previous release. That being said, if there is something which is a
>>> >> >> >> > regression that has not been correctly targeted please ping me or a
>>> >> >> >> > committer to help target the issue.
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> > --
>>> >> >> >> > Marcelo
>>> >> >> >>
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> --
>>> >> >> >> Marcelo
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> ---------------------------------------------------------------------
>>> >> >> >> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>> >> >> >>
>>> >> >> >
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> --
>>> >> >> Marcelo
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Marcelo
>>>
>>>
>>>
>>> --
>>> Marcelo
>>>
>>
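
Marcelo's advice above to use the same Hadoop version as the Spark build can be sketched roughly as follows. This is a minimal illustration, not something posted in the thread: the helper names are hypothetical, and it assumes the Hadoop version can be read from the name of a hadoop-common jar (for example one found under the Spark distribution's jars directory). Note that, per Nick's report, the matching 2.7.3 coordinate still did not work for him on this RC, so the sketch only shows how the coordinate would be derived, not a confirmed fix.

```python
import re


def hadoop_version_from_jar(jar_name):
    """Extract a version such as '2.7.3' from 'hadoop-common-2.7.3.jar'.

    Assumption: the jar follows the usual 'hadoop-common-<version>.jar' naming.
    """
    match = re.fullmatch(r"hadoop-common-(\d+(?:\.\d+)+)\.jar", jar_name)
    if match is None:
        raise ValueError(f"not a hadoop-common jar: {jar_name!r}")
    return match.group(1)


def hadoop_aws_coordinate(hadoop_version):
    """Build the Maven coordinate for hadoop-aws at the given Hadoop version."""
    return f"org.apache.hadoop:hadoop-aws:{hadoop_version}"


def pyspark_command(hadoop_version):
    """Return the pyspark invocation from the thread as an argument list."""
    return ["pyspark", "--packages", hadoop_aws_coordinate(hadoop_version)]


if __name__ == "__main__":
    version = hadoop_version_from_jar("hadoop-common-2.7.3.jar")
    print(" ".join(pyspark_command(version)))
```

The argument list could be passed to subprocess.run on a machine where pyspark is on the PATH. The other route suggested in the thread, building Spark with -Phadoop-cloud, avoids resolving hadoop-aws at runtime altogether.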
