hadoop-mapreduce-dev mailing list archives

From Wei-Chiu Chuang <weic...@apache.org>
Subject Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts
Date Wed, 09 Oct 2019 22:00:15 GMT
Hi, I am late to this, but I am keen to understand more.

To be exact, how can we better use the thirdparty repo? Looking at HBase as
an example, it looks like everything that is known to break a lot after an
update gets shaded into the hbase-thirdparty artifact: guava, netty, etc.
Is the purpose to isolate these naughty dependencies?

On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <vinayakumarb@apache.org>
wrote:

> Hi All,
>
> I have updated the PR as per @Owen O'Malley <owen.omalley@gmail.com>'s
> suggestions.
>
>     i. Renamed the module to 'hadoop-shaded-protobuf37'
>     ii. Kept the shaded package as 'o.a.h.thirdparty.protobuf37'
>
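
A minimal sketch of what this renaming could look like in the shaded module's pom, assuming `o.a.h` abbreviates `org.apache.hadoop` and that the relocation is done with maven-shade-plugin as described in the original proposal quoted further down. This is not copied from the PR; the plugin version and the rest of the module's pom are omitted.

```xml
<!-- Sketch for the 'hadoop-shaded-protobuf37' module; not taken from the PR. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <!-- The shade goal is bound to the package phase. -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <artifactSet>
          <includes>
            <!-- Only protobuf-java is pulled into the shaded jar. -->
            <include>com.google.protobuf:protobuf-java</include>
          </includes>
        </artifactSet>
        <relocations>
          <relocation>
            <pattern>com.google.protobuf</pattern>
            <!-- Assumed expansion of 'o.a.h.thirdparty.protobuf37'. -->
            <shadedPattern>org.apache.hadoop.thirdparty.protobuf37</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```
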
> Please review!!
>
> Thanks,
> -Vinay
>
>
> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <palomino219@gmail.com>
> wrote:
>
> > For HBase we have a separate repo for hbase-thirdparty
> >
> > https://github.com/apache/hbase-thirdparty
> >
> > We will publish the artifacts to nexus so we do not need to include
> > binaries in our git repo, just add a dependency in the pom.
> >
> > https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
> >
> > It has its own release cycle; we cut a release only when there are special
> > requirements or we want to upgrade some of the dependencies. This is the
> > vote thread for the newest release, where we want to provide a shaded gson
> > for jdk7.
> >
> > https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
> >
> > Thanks.
> >
> > On Sat, Sep 28, 2019 at 1:28 AM Vinayakumar B <vinayakumarb@apache.org> wrote:
> >
> > > Please find replies inline.
> > >
> > > -Vinay
> > >
> > > On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <owen.omalley@gmail.com>
> > > wrote:
> > >
> > > > I'm very unhappy with this direction. In particular, I don't think git
> > > > is a good place for distribution of binary artifacts. Furthermore, the
> > > > PMC shouldn't be releasing anything without a release vote.
> > > >
> > > >
> > > The proposed solution doesn't release any binaries in git. It's actually a
> > > complete sub-project which follows the entire release process, including a
> > > public VOTE. I have already mentioned that the release process is similar
> > > to Hadoop's.
> > > To be specific, it uses (almost) the same script used in Hadoop to generate
> > > artifacts, sign them, and deploy them to the staging repository. Please let
> > > me know if I am conveying anything wrong.
> > >
> > >
> > > > I'd propose that we make a third party module that contains the
> > > > *source* of the pom files to build the relocated jars. This should
> > > > absolutely be treated as a last resort for the mostly Google projects
> > > > that regularly break binary compatibility (e.g. Protobuf & Guava).
> > > >
> > > >
> > > The same has been implemented in the PR
> > > https://github.com/apache/hadoop-thirdparty/pull/1. Please check and let
> > > me know if I misunderstood. Yes, this is the last option we have, AFAIK.
> > >
> > >
> > > > In terms of naming, I'd propose something like:
> > > >
> > > > org.apache.hadoop.thirdparty.protobuf2_5
> > > > org.apache.hadoop.thirdparty.guava28
> > > >
> > > > In particular, I think we absolutely need to include the version of the
> > > > underlying project. On the other hand, since we should not be shading
> > > > *everything* we can drop the leading com.google.
> > > >
> > > >
> > > IMO, this naming convention makes it easy to identify the underlying
> > > project, but it will be difficult to maintain going forward if the
> > > underlying project's version changes. Since the thirdparty module has its
> > > own releases, each of those releases can be mapped to a specific version
> > > of the underlying project. The binary artifact can even include a MANIFEST
> > > with the underlying project's details, as per Steve's suggestion on
> > > HADOOP-13363.
> > > That said, if you still prefer to have the project version number in the
> > > artifact id, it can be done.
> > >
> > > > The Hadoop project can make releases of the thirdparty module:
> > > >
> > > > <dependency>
> > > >   <groupId>org.apache.hadoop</groupId>
> > > >   <artifactId>hadoop-thirdparty-protobuf25</artifactId>
> > > >   <version>1.0</version>
> > > > </dependency>
> > > >
> > > >
> > > > Note that the version has to be the hadoop thirdparty release number,
> > > > which is part of why you need to have the underlying version in the
> > > > artifact name. These we can push to maven central as new releases from
> > > > Hadoop.
> > > >
> > > >
> > > Exactly, the same has been implemented in the PR. The hadoop-thirdparty
> > > module has its own releases. In the HADOOP Jira, thirdparty versions can
> > > be differentiated using the prefix "thirdparty-".
> > >
> > > The same solution is being followed in HBase. Maybe people involved in
> > > HBase can add some points here.
> > >
> > > > Thoughts?
> > > >
> > > > .. Owen
> > > >
> > > > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <vinayakumarb@apache.org>
> > > > wrote:
> > > >
> > > >> Hi All,
> > > >>
> > > > >>    I wanted to discuss the separate repo for thirdparty dependencies
> > > > >> which we need to shade and include in Hadoop components' jars.
> > > > >>
> > > > >>    Apologies for the big text ahead, but this needs a clear explanation!!
> > > >>
> > > > >>    Right now the most needed such dependency is protobuf. The protobuf
> > > > >> dependency has not been upgraded beyond 2.5.0 for fear that downstream
> > > > >> builds, which depend on the transitive protobuf dependency coming from
> > > > >> Hadoop's jars, may fail with the upgrade. Apparently protobuf does not
> > > > >> guarantee source compatibility, though it guarantees wire compatibility
> > > > >> between versions. Because of this behavior, a version upgrade may cause
> > > > >> breakage in known and unknown (private?) downstreams.
> > > >>
> > > > >>    So to tackle this, we came up with the following proposal in
> > > > >> HADOOP-13363.
> > > > >>
> > > > >>    Luckily, as far as I know, no APIs, either public to users or between
> > > > >> Hadoop processes, directly use protobuf classes in their signatures.
> > > > >> (If any exist, please let us know.)
> > > >>
> > > >>    Proposal:
> > > >>    ------------
> > > >>
> > > > >>    1. Create artifact(s) which contain the shaded dependencies. All such
> > > > >> shading/relocation will use the known prefix
> > > > >> **org.apache.hadoop.thirdparty.**.
> > > > >>    2. To start with, in the protobuf jar (ex:
> > > > >> o.a.h.thirdparty:hadoop-shaded-protobuf), all **com.google.protobuf**
> > > > >> classes will be relocated as
> > > > >> **org.apache.hadoop.thirdparty.com.google.protobuf**.
> > > > >>    3. Hadoop modules which need protobuf as a dependency will add this
> > > > >> shaded artifact as a dependency (ex:
> > > > >> o.a.h.thirdparty:hadoop-shaded-protobuf).
> > > > >>    4. All previous usages of "com.google.protobuf" will be relocated to
> > > > >> "org.apache.hadoop.thirdparty.com.google.protobuf" in the code and will
> > > > >> be committed. Please note, this replacement is a one-time change made
> > > > >> directly in the source code, NOT during compile and package.
> > > > >>    5. Once all usages of "com.google.protobuf" are relocated, Hadoop no
> > > > >> longer cares which version of the original "protobuf-java" is in the
> > > > >> dependency tree.
> > > > >>    6. Just keep "protobuf-java:2.5.0" in the dependency tree so as not to
> > > > >> break the downstreams. Hadoop itself will actually be using the latest
> > > > >> protobuf present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
> > > >>
> > > > >>    7. Coming back to the separate repo, the following are the main
> > > > >> reasons for keeping the shaded dependency artifact in a separate repo
> > > > >> instead of a submodule.
> > > > >>
> > > > >>       7a. These artifacts need not be built all the time. They need to be
> > > > >> built only when there is a change in the dependency version or the build
> > > > >> process.
> > > > >>       7b. If added as a "submodule in Hadoop repo",
> > > > >> maven-shade-plugin:shade will execute only in the package phase. That
> > > > >> means "mvn compile" or "mvn test-compile" will fail, because at that
> > > > >> point this artifact will not have the relocated classes; it will have
> > > > >> the original classes instead, resulting in a compilation failure. The
> > > > >> workaround is to build the thirdparty submodule first and exclude the
> > > > >> "thirdparty" submodule in other executions. This will be a complex
> > > > >> process compared to keeping it in a separate repo.
> > > > >>
> > > > >>       7c. The separate repo will be a subproject of Hadoop, using the same
> > > > >> HADOOP jira project, with different versioning prefixed with
> > > > >> "thirdparty-" (ex: thirdparty-1.0.0).
> > > > >>       7d. The separate repo will have the same release process as Hadoop.
> > > >>
> > > > >>     HADOOP-13363 (https://issues.apache.org/jira/browse/HADOOP-13363) is
> > > > >> an umbrella jira tracking the changes for the protobuf upgrade.
> > > > >>
> > > > >>     A PR (https://github.com/apache/hadoop-thirdparty/pull/1) has been
> > > > >> raised for the separate repo creation in HADOOP-16595
> > > > >> (https://issues.apache.org/jira/browse/HADOOP-16595).
> > > > >>
> > > > >>     Please provide your inputs on the proposal and review the PR to
> > > > >> proceed with the proposal.
> > > >>
> > > >>
> > > > >>    -Thanks,
> > > >>     Vinay
> > > >>
> > > > >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <vinodkv@apache.org>
> > > > >> wrote:
> > > >>
> > > >> > Moving the thread to the dev lists.
> > > >> >
> > > >> > Thanks
> > > >> > +Vinod
> > > >> >
> > > > >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <vinayakumarb@apache.org>
> > > > >> > > wrote:
> > > >> > >
> > > >> > > Thanks Marton,
> > > >> > >
> > > > >> > > The newly created 'hadoop-thirdparty' repo is empty right now.
> > > > >> > > Whether to use that repo for the shaded artifact or not will be
> > > > >> > > tracked in the HADOOP-13363 umbrella jira. Please feel free to join
> > > > >> > > the discussion.
> > > > >> > >
> > > > >> > > No existing codebase is being moved out of the hadoop repo, so I
> > > > >> > > think right now we are good to go.
> > > >> > >
> > > >> > > -Vinay
> > > >> > >
> > > > >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <elek@apache.org>
> > > > >> > > wrote:
> > > >> > >
> > > >> > >>
> > > > >> > >> I am not sure whether it's defined when a vote is required.
> > > > >> > >>
> > > > >> > >> https://www.apache.org/foundation/voting.html
> > > > >> > >>
> > > > >> > >> Personally I think it's a big enough change to send a notification
> > > > >> > >> to the dev lists with a 'lazy consensus' closure.
> > > >> > >>
> > > >> > >> Marton
> > > >> > >>
> > > > >> > >> On 2019/09/23 17:46:37, Vinayakumar B <vinayakumarb@apache.org>
> > > > >> > >> wrote:
> > > >> > >>> Hi,
> > > >> > >>>
> > > > >> > >>> As discussed in HADOOP-13363, the protobuf 3.x jar (and maybe more
> > > > >> > >>> in the future) will be kept as a shaded artifact in a separate repo,
> > > > >> > >>> which will be referred to as a dependency in hadoop modules. This
> > > > >> > >>> approach avoids shading of every submodule during the build.
> > > > >> > >>>
> > > > >> > >>> So the question is: is any VOTE required before asking to create a
> > > > >> > >>> git repo?
> > > > >> > >>>
> > > > >> > >>> On the selfserve platform https://gitbox.apache.org/setup/newrepo.html
> > > > >> > >>> I can see that the requester should be on the PMC.
> > > >> > >>>
> > > >> > >>> Wanted to confirm here first.
> > > >> > >>>
> > > >> > >>> -Vinay
> > > >> > >>>
> > > >> > >>
> > > >> > >>
> > > ---------------------------------------------------------------------
> > > >> > >> To unsubscribe, e-mail: private-unsubscribe@hadoop.apache.org
> > > >> > >> For additional commands, e-mail:
> private-help@hadoop.apache.org
> > > >> > >>
> > > >> > >>
> > > >> >
> > > >> >
> > > >>
> > > >
> > >
> >
>
