hadoop-mapreduce-dev mailing list archives

From Vinayakumar B <vinayakum...@apache.org>
Subject Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts
Date Wed, 09 Oct 2019 19:38:35 GMT
Hi All,

I have updated the PR as per @Owen O'Malley <owen.omalley@gmail.com>'s
suggestions.

    i. Renamed the module to 'hadoop-shaded-protobuf37'
    ii. Set the shaded package to 'o.a.h.thirdparty.protobuf37'
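
A relocation like the one in (ii) could be expressed with maven-shade-plugin
roughly as follows. This is only a minimal sketch and is not copied from the
PR: it assumes 'o.a.h' abbreviates 'org.apache.hadoop', that the classes being
relocated are com.google.protobuf, and that the plugin version is managed
elsewhere in the pom. The PR itself is authoritative.

```xml
<!-- Sketch only: belongs under <build><plugins> of the shaded module's pom. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <!-- shade runs in the package phase -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- original protobuf package (assumed) -->
            <pattern>com.google.protobuf</pattern>
            <!-- 'o.a.h.thirdparty.protobuf37' written out in full (assumed expansion) -->
            <shadedPattern>org.apache.hadoop.thirdparty.protobuf37</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With a configuration along these lines, a class such as
com.google.protobuf.Message would end up in the shaded jar as
org.apache.hadoop.thirdparty.protobuf37.Message.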

Please review!!

Thanks,
-Vinay


On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <palomino219@gmail.com>
wrote:

> For HBase we have a separate repo for hbase-thirdparty
>
> https://github.com/apache/hbase-thirdparty
>
> We will publish the artifacts to nexus so we do not need to include
> binaries in our git repo, just add a dependency in the pom.
>
>
> https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
>
>
> It has its own release cycle: we release only when there are special
> requirements or we want to upgrade some of the dependencies. This is the vote
> thread for the newest release, where we want to provide a shaded gson for jdk7.
>
>
> https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
>
>
> Thanks.
>
> On Sat, Sep 28, 2019 at 1:28 AM Vinayakumar B <vinayakumarb@apache.org> wrote:
>
> > Please find replies inline.
> >
> > -Vinay
> >
> > On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <owen.omalley@gmail.com>
> > wrote:
> >
> > > > I'm very unhappy with this direction. In particular, I don't think git is
> > > > a good place for distribution of binary artifacts. Furthermore, the PMC
> > > > shouldn't be releasing anything without a release vote.
> > >
> > >
> > The proposed solution doesn't release any binaries in git. It's actually a
> > complete sub-project which follows the entire release process, including a
> > public VOTE. I have already mentioned that the release process is similar to
> > Hadoop's.
> > To be specific, it uses (almost) the same script used in Hadoop to generate
> > artifacts, sign them, and deploy them to the staging repository. Please let
> > me know if I am conveying anything wrong.
> >
> >
> > > > I'd propose that we make a third party module that contains the *source*
> > > > of the pom files to build the relocated jars. This should absolutely be
> > > > treated as a last resort for the mostly Google projects that regularly
> > > > break binary compatibility (eg. Protobuf & Guava).
> > >
> > >
> > The same has been implemented in the PR
> > https://github.com/apache/hadoop-thirdparty/pull/1. Please check and let me
> > know if I misunderstood. Yes, this is the last option we have, AFAIK.
> >
> >
> > > In terms of naming, I'd propose something like:
> > >
> > > org.apache.hadoop.thirdparty.protobuf2_5
> > > org.apache.hadoop.thirdparty.guava28
> > >
> > > In particular, I think we absolutely need to include the version of the
> > > underlying project. On the other hand, since we should not be shading
> > > *everything* we can drop the leading com.google.
> > >
> > >
> > IMO, this naming convention makes it easy to identify the underlying
> > project, but it will be difficult to maintain going forward if the
> > underlying project's version changes. Since the thirdparty module has its
> > own releases, each of those releases can be mapped to a specific version of
> > the underlying project. Even the binary artifact can include a MANIFEST with
> > the underlying project's details, as per Steve's suggestion on HADOOP-13363.
> > That said, if you still prefer to have the project version number in the
> > artifact id, it can be done.
> >
> > > > The Hadoop project can make releases of the thirdparty module:
> > >
> > > <dependency>
> > >   <groupId>org.apache.hadoop</groupId>
> > >   <artifactId>hadoop-thirdparty-protobuf25</artifactId>
> > >   <version>1.0</version>
> > > </dependency>
> > >
> > >
> > > > Note that the version has to be the hadoop thirdparty release number, which
> > > > is part of why you need to have the underlying version in the artifact
> > > > name. These we can push to maven central as new releases from Hadoop.
> > >
> > >
> > Exactly, the same has been implemented in the PR. The hadoop-thirdparty
> > module has its own releases. In the HADOOP Jira, thirdparty versions can be
> > differentiated using the prefix "thirdparty-".
> >
> > The same solution is being followed in HBase. Maybe people involved in HBase
> > can add some points here.
> >
> > > > Thoughts?
> > >
> > > .. Owen
> > >
> > > > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <vinayakumarb@apache.org>
> > > > wrote:
> > >
> > >> Hi All,
> > >>
> > > >>    I wanted to discuss the separate repo for thirdparty dependencies
> > > >> which we need to shade and include in Hadoop components' jars.
> > >>
> > > >>    Apologies for the big text ahead, but this needs clear explanation!!
> > >>
> > > >>    Right now the most needed such dependency is protobuf. The protobuf
> > > >> dependency has not been upgraded from 2.5.0 for fear that downstream
> > > >> builds, which depend on the transitive protobuf dependency coming from
> > > >> hadoop's jars, may fail with the upgrade. Apparently protobuf does not
> > > >> guarantee source compatibility, though it guarantees wire compatibility
> > > >> between versions. Because of this behavior, a version upgrade may cause
> > > >> breakage in known and unknown (private?) downstreams.
> > >>
> > > >>    So to tackle this, we came up with the following proposal in
> > > >> HADOOP-13363.
> > >>
> > > >>    Luckily, as far as I know, no APIs, either public to users or between
> > > >> Hadoop processes, use protobuf classes directly in their signatures.
> > > >> (If any exist, please let us know.)
> > >>
> > >>    Proposal:
> > >>    ------------
> > >>
> > > >>    1. Create artifact(s) which contain shaded dependencies. All such
> > > >> shading/relocation will use the known prefix
> > > >> **org.apache.hadoop.thirdparty.**.
> > > >>    2. To start with, there will be a protobuf jar (ex:
> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf) in which all
> > > >> **com.google.protobuf** classes are relocated to
> > > >> **org.apache.hadoop.thirdparty.com.google.protobuf**.
> > > >>    3. Hadoop modules which need protobuf as a dependency will add this
> > > >> shaded artifact as a dependency (ex:
> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf).
> > > >>    4. All previous usages of "com.google.protobuf" will be changed to
> > > >> "org.apache.hadoop.thirdparty.com.google.protobuf" in the code and
> > > >> committed. Please note, this replacement is done one time, directly in
> > > >> the source code, NOT during compile and package.
> > > >>    5. Once all usages of "com.google.protobuf" are relocated, Hadoop
> > > >> does not care which version of the original "protobuf-java" is in the
> > > >> dependency tree.
> > > >>    6. Just keep "protobuf-java:2.5.0" in the dependency tree so as not to
> > > >> break the downstreams. Hadoop itself will actually be using the latest
> > > >> protobuf present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
> > >>
> > > >>    7. Coming back to the separate repo, the following are the main
> > > >> reasons for keeping the shaded dependency artifact in a separate repo
> > > >> instead of a submodule.
> > > >>
> > > >>       7a. These artifacts need not be built all the time. They need to be
> > > >> built only when there is a change in the dependency version or the build
> > > >> process.
> > > >>       7b. If added as a "submodule in Hadoop repo",
> > > >> maven-shade-plugin:shade will execute only in the package phase. That
> > > >> means "mvn compile" or "mvn test-compile" will fail, because at that
> > > >> point this artifact will not have the relocated classes; it will have the
> > > >> original classes instead, resulting in compilation failure. The
> > > >> workaround is to build the thirdparty submodule first and exclude the
> > > >> "thirdparty" submodule in other executions. This will be a complex
> > > >> process compared to keeping it in a separate repo.
> > > >>
> > > >>       7c. The separate repo will be a subproject of Hadoop, using the same
> > > >> HADOOP jira project, with different versioning prefixed with
> > > >> "thirdparty-" (ex: thirdparty-1.0.0).
> > > >>       7d. The separate repo will have the same release process as Hadoop.
> > >>
> > > >>     HADOOP-13363 (https://issues.apache.org/jira/browse/HADOOP-13363) is
> > > >> an umbrella jira tracking the changes for the protobuf upgrade.
> > > >>
> > > >>     A PR (https://github.com/apache/hadoop-thirdparty/pull/1) has been
> > > >> raised for the separate repo creation in HADOOP-16595
> > > >> (https://issues.apache.org/jira/browse/HADOOP-16595).
> > > >>
> > > >>     Please provide your inputs on the proposal and review the PR so that
> > > >> we can proceed.
> > >>
> > >>
> > > >>    -Thanks,
> > >>     Vinay
> > >>
> > >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
> > >> vinodkv@apache.org>
> > >> wrote:
> > >>
> > >> > Moving the thread to the dev lists.
> > >> >
> > >> > Thanks
> > >> > +Vinod
> > >> >
> > > >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <vinayakumarb@apache.org>
> > > >> > > wrote:
> > >> > >
> > >> > > Thanks Marton,
> > >> > >
> > > >> > > The newly created 'hadoop-thirdparty' repo is empty right now.
> > > >> > > Whether or not to use that repo for the shaded artifact will be
> > > >> > > tracked in the HADOOP-13363 umbrella jira. Please feel free to join
> > > >> > > the discussion.
> > > >> > >
> > > >> > > No existing codebase is being moved out of the hadoop repo, so I
> > > >> > > think right now we are good to go.
> > >> > >
> > >> > > -Vinay
> > >> > >
> > > >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <elek@apache.org> wrote:
> > >> > >
> > >> > >>
> > > >> > >> I am not sure whether it's defined when a vote is required.
> > >> > >>
> > >> > >> https://www.apache.org/foundation/voting.html
> > >> > >>
> > > >> > >> Personally I think it's a big enough change to send a notification to
> > > >> > >> the dev lists with a 'lazy consensus' closure.
> > >> > >>
> > >> > >> Marton
> > >> > >>
> > > >> > >> On 2019/09/23 17:46:37, Vinayakumar B <vinayakumarb@apache.org> wrote:
> > >> > >>> Hi,
> > >> > >>>
> > > >> > >>> As discussed in HADOOP-13363, the protobuf 3.x jar (and maybe more in
> > > >> > >>> future) will be kept as a shaded artifact in a separate repo, which
> > > >> > >>> will be referenced as a dependency in hadoop modules. This approach
> > > >> > >>> avoids shading every submodule during the build.
> > >> > >>>
> > > >> > >>> So the question is: is any VOTE required before asking to create a
> > > >> > >>> git repo?
> > >> > >>>
> > > >> > >>> On the self-serve platform https://gitbox.apache.org/setup/newrepo.html
> > > >> > >>> I can see that the requester should be a PMC member.
> > >> > >>>
> > >> > >>> Wanted to confirm here first.
> > >> > >>>
> > >> > >>> -Vinay
> > >> > >>>
> > >> > >>
> > >> > >>
> > >> > >>
> > >> > >>
> > >> >
> > >> >
> > >>
> > >
> >
>
