hadoop-mapreduce-dev mailing list archives

From "Elek, Marton" <e...@apache.org>
Subject [DISCUSS] making Ozone a separate Apache project
Date Wed, 13 May 2020 07:52:36 GMT

I would like to start a discussion about making Ozone a separate Apache project.

### HISTORY [1]

  * Apache Hadoop Ozone development started on a feature branch of the 
Hadoop repository (HDFS-7240)

  * In October 2017, a discussion was started about merging it into 
the Hadoop main branch

  * After a long discussion, it was merged into Hadoop trunk in March 2018

  * During the discussion of the merge, it was suggested multiple times 
to create a separate project for Ozone. But at that time:
     1). Ozone was tightly integrated with Hadoop/HDFS
     2). There was an active plan to use the block layer of Ozone (HDDS, or 
HDSL at that time) as the block layer of HDFS
     3). The community of Ozone was a subset of the HDFS community

  * The first beta of Ozone has just been released. This seems to be a 
good time, before the first GA release, to make a decision about the future.


  Over the last few years, Ozone has become more and more independent on 
both the community and the code side. The separation has been suggested 
again and again (for example, by Owen [2] and Vinod [3]).

  From the COMMUNITY point of view:

   * Fortunately, more and more new contributors are helping Ozone. 
Originally, the Ozone community was a subset of the HDFS project, but now 
a growing part of the community works on Ozone only.

   * It seems to be easier to _build_ the community as a separate project.

   * A new, younger project might have different practices 
(communication, committer criteria, development style) compared to an 
old, mature project

   * It's easier to communicate (and improve) these standards in a 
separate project with clean boundaries

   * A separate project/brand can help increase the adoption rate and 
attract more individual contributors (AFAIK this has been seen in 
Submarine after a similar move)

   * The contribution process can be communicated more easily, and we can 
make first-time contributions easier

  From the CODE point of view, Ozone has become more and more independent:

  * Ozone has a different release cycle

  * The code is already separated from the Hadoop code base

  * It has a separate CI (GitHub Actions)

  * Ozone uses a different (stricter) coding style (zero tolerance for 
unit test / checkstyle errors)

  * The code itself has become more and more independent from Hadoop at 
the Maven level. Originally, it was compiled together with the latest 
in-tree Hadoop snapshot. Now it depends on released Hadoop artifacts (RPC, 

  * It is starting to use multiple versions of Hadoop (on the client side)

  * The volume of resolved issues is already very high on the Ozone side 
(Ozone had slightly more resolved issues than HDFS/YARN/MAPREDUCE/COMMON 
all together in the last 2-3 months)

Summary: Before the first Ozone GA release, it seems to be a good time 
to discuss the long-term future of Ozone. Managing it as a separate 
top-level project (TLP) seems to have more benefits.

Please let me know what your opinion is...

Thanks a lot,

[1]: For more details, see: 


