hadoop-yarn-dev mailing list archives

From Eric Yang <ey...@hortonworks.com>
Subject Re: [DISCUSS] Docker build process
Date Tue, 19 Mar 2019 17:19:28 GMT
Hi Marton,

Thank you for your input.  I agree with most of what you said, with a few exceptions.  A security
fix should result in a different version of the image instead of replacing an existing version.
The Dockerfile is most likely to change when the security fix is applied.  If the image were replaced
without a source change, the source would become unstable over time and result in non-buildable
code.  When the maven release is automated through Jenkins, this is a breeze of clicking a button.
Jenkins even increments the target version automatically, with the option to edit it.  It makes the
release manager's job easier than Homer Simpson's job.
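The automatic version increment that Jenkins performs on release can be sketched as a small shell function (the function name is hypothetical, for illustration only; Jenkins' release tooling handles this internally):

```shell
#!/usr/bin/env bash
# Sketch of the patch-version bump a release job performs.
# next_version is a hypothetical helper, not part of Hadoop or Jenkins.
next_version() {
  local major minor patch
  IFS=. read -r major minor patch <<< "$1"
  echo "${major}.${minor}.$((patch + 1))"
}

next_version 2.7.7   # prints 2.7.8
```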

If versioning is done correctly, older branches can have the same docker subproject, and Hadoop
2.7.8 can be released from older Hadoop branches.  We don't create a timeline paradox by allowing
changes to the history of Hadoop 2.7.1.  That release has passed; let it stay that way.
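Under this model, a security fix produces a new, distinctly tagged image rather than overwriting a published one. A minimal sketch of the idea (image name and tags are illustrative, not the project's actual coordinates):

```shell
# A patched release gets a new tag; the published 2.7.1 tag is never rebuilt.
docker build -t apache/hadoop:2.7.8 .
docker push apache/hadoop:2.7.8

# Never do this against a released version:
# docker build -t apache/hadoop:2.7.1 .   # would rewrite published history
```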

There is mounting evidence that the Hadoop community wants a docker profile for the developer image.
Precommit builds will not catch some build errors, because more code is allowed to slip through
with a profile-based build process.  I will make adjustments accordingly unless 7 more people come
out and say otherwise.
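For developers who want to bypass the docker step, the flags discussed later in this thread would look roughly like this (the module name is hypothetical; -DskipDocker and -pl are the mechanisms named in the original proposal):

```shell
# Default build: includes the docker image step (docker must be usable).
mvn clean install

# Skip the docker image build entirely.
mvn clean install -DskipDocker

# Or bypass the docker subproject (hypothetical module name).
mvn clean install -pl '!hadoop-docker'
```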


On 3/19/19, 1:18 AM, "Elek, Marton" <elek@apache.org> wrote:

    Thank you Eric to describe the problem.
    I have multiple small comments, trying to separate them.
    I. separated vs in-build container image creation
    > The disadvantages are:
    > 1.  Require developer to have access to docker.
    > 2.  Default build takes longer.
    These are not the only disadvantages (IMHO), as I wrote in the
    previous thread and in the issue [1]
    Using in-build container image creation doesn't enable:
    1. to modify the image later (eg. apply security fixes to the container
    itself or apply improvements for the startup scripts)
    2. create images for older releases (eg. hadoop 2.7.1)
    I think there are two kinds of images:
    a) images for released artifacts
    b) developer images
    I would prefer to manage a) with separate branch repositories, but b)
    with an (optional!) in-build process.
    II. Agree with Steve. I think it's better to make it optional as most of
    the time it's not required. I think it's better to support the default
    dev build with the default settings (=just enough to start)
    III. Maven best practices
    I think this is a good article. But it argues not against profiles but
    against creating multiple versions of the same artifact with the same
    name (eg. jdk8/jdk11). In Hadoop, profiles are used to introduce
    optional steps. I think that's fine, as the maven lifecycle/phase model
    is very static (compare it with the tree-based approach in Gradle).
    [1]: https://issues.apache.org/jira/browse/HADOOP-16091
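An optional, profile-gated docker step of the kind Marton describes would be opt-in rather than opt-out; a rough sketch (the profile name is hypothetical):

```shell
# Default dev build: no docker required, "just enough to start".
mvn clean install

# Explicitly opt in to the docker image step via a profile.
mvn clean install -Pdocker
```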
    On 3/13/19 11:24 PM, Eric Yang wrote:
    > Hi Hadoop developers,
    > In recent months, there have been various discussions on creating a docker build process
    > for Hadoop.  There was convergence on the mailing list last month toward making the docker
    > build process inline, when the Ozone team was planning a new repository for Hadoop/Ozone
    > docker images.  New feature work has started to add an inline docker image build process
    > to the Hadoop build.
    > A few lessons were learnt from making the docker build inline in YARN-7129.  The build
    > environment must have docker for a successful docker build.  BUILD.txt states that for an
    > easy build environment, use Docker.  There is logic in place to ensure that the absence of
    > docker does not trigger a docker build.  The inline process tries to be as non-disruptive
    > as possible to existing development environments, with one exception: if docker’s presence
    > is detected but the user does not have rights to run docker, the build will fail.
    > Now, some developers are pushing back on the inline docker build process because the
    > existing environment did not make the docker build process mandatory.  However, there are
    > benefits to using an inline docker build process.  The listed benefits are:
    > 1.  Source code tag, maven repository artifacts, and docker hub artifacts can all
    > be produced in one build.
    > 2.  Less manual labor to tag different source branches.
    > 3.  Reduced intermediate build caches that may exist in multi-stage builds.
    > 4.  Release engineers and developers do not need to search a maze of build flags
    > to acquire artifacts.
    > The disadvantages are:
    > 1.  Require developer to have access to docker.
    > 2.  Default build takes longer.
    > There is a workaround for the above disadvantages: use the -DskipDocker flag to avoid the
    > docker build completely, or -pl !modulename to bypass subprojects.
    > Hadoop development has not followed Maven best practice, because a full Hadoop build
    > requires a number of profile and configuration parameters.  Some evolutions work against
    > Maven's design and require forking separate source trees with different subprojects and
    > pom files.  Maven best practice (https://dzone.com/articles/maven-profile-best-practices)
    > explains that profiles should not be used to trigger different artifact builds, because
    > this pattern introduces artifact naming conflicts in the Maven repository.  Maven offers
    > flags to skip certain operations, such as -DskipTests, -Dmaven.javadoc.skip=true, -pl, or
    > -DskipDocker.  It seems worthwhile to make some corrections so the Hadoop build follows
    > best practice.
    > Some developers have advocated for a separate build process for docker images.  We need
    > consensus on the direction that will work best for the Hadoop development community.
    > Hence, my questions are:
    > Do we want to have an inline docker build process in maven?
    > If yes, it would be the developer’s responsibility to pass the -DskipDocker flag to skip
    > docker.  Docker is mandatory for the default build.
    > If no, what is the release flow for docker images going to look like?
    > Thank you for your feedback.
    > Regards,
    > Eric

To unsubscribe, e-mail: yarn-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-dev-help@hadoop.apache.org
