helix-commits mailing list archives

From jiajunw...@apache.org
Subject [helix] branch master updated: Merge Waged rebalancer branch code to master. (#724)
Date Wed, 05 Feb 2020 21:09:14 GMT
This is an automated email from the ASF dual-hosted git repository.

jiajunwang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/helix.git


The following commit(s) were added to refs/heads/master by this push:
     new bee3ed2  Merge Waged rebalancer branch code to master. (#724)
bee3ed2 is described below

commit bee3ed273424b9614082be8c76eb80428a97ff3a
Author: Jiajun Wang <1803880+jiajunwang@users.noreply.github.com>
AuthorDate: Wed Feb 5 13:09:04 2020 -0800

    Merge Waged rebalancer branch code to master. (#724)
    
    * Define the WAGED rebalancer interfaces.
    
    This is the initial check-in for the future development of the WAGED rebalancer.
    All the components are placeholders. They will be implemented gradually.
    
    * Adding the configuration items of the WAGED rebalancer. (#348)
    
    * Adding the configuration items of the WAGED rebalancer.
    
    Including: Instance Capacity Keys, Rebalance Preferences, Instance Capacity Details, Partition Capacity (the weight) Details.
    Also adding test to cover the new configuration items.
    
    * Implement the WAGED rebalancer cluster model (#362)
    
    * Introduce the cluster model classes to support the WAGED rebalancer.
    
    Implement the cluster model classes with the minimum necessary information to support rebalance.
    Additional fields/logic may be added later once the detailed rebalance logic is implemented.
    
    Also add related tests.
    
    * Change the rebalancer assignment record to be ResourceAssignment instead of IdealState. (#398)
    
    ResourceAssignment fits the usage better, and no unnecessary information will be recorded or read during the rebalance calculation.
    
    * Convert all the internal assignment state objects to be ResourceAssignment. (#399)
    
    This is to avoid unnecessary information being recorded or read.
    
    * Implement Cluster Model Provider. (#392)
    
    * Implement Cluster Model Provider.
    
    The model provider is called in the WAGED rebalancer to generate a Cluster Model based on the current cluster status.
    The major responsibility of the provider is to parse all the assignable replicas and identify which replicas need to be reassigned. Note that if the current best possible assignment is still valid, the rebalancer won't need to recalculate the partition assignment.
    
    Also, add unit tests to verify the main logic.
    
    * Add ChangeDetector interface and ResourceChangeDetector implementation (#388)
    
    Add ChangeDetector interface and ResourceChangeDetector implementation
    
    In order to efficiently react to changes happening to the cluster in the new WAGED rebalancer, a new component called ChangeDetector was added.
    
    Changelist:
    1. Add ChangeDetector interface
    2. Implement ResourceChangeDetector
    3. Add ResourceChangeCache, a wrapper for critical cluster metadata
    4. Add an integration test, TestResourceChangeDetector
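
    As a rough sketch of the detector contract (a hypothetical illustration; the committed interface may differ in names and signatures):

        // Hypothetical sketch of a change-detector contract.
        public interface ChangeDetector {
          // Refresh the internal snapshot from the latest cluster state.
          void updateSnapshots(ResourceControllerDataProvider dataProvider);

          // Types of changes observed since the previous snapshot.
          Collection<HelixConstants.ChangeType> getChangeTypes();

          // Names of the items of the given type that changed between snapshots.
          Collection<String> getChangesByType(HelixConstants.ChangeType changeType);
        }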
    
    * Add cluster level default instance config. (#413)
    
    This config will be applied to the instance when there is no (or empty) capacity configuration in the Instance Config.
    Also add unit tests.
    
    * Redefine the hard/soft constraints (#422)
    
    * Refactor the hard/soft constraint interfaces and add a central place to keep the soft constraint weights
    
    * Refine the WAGED rebalancer related interfaces for integration (#431)
    
    * Refine the WAGED rebalancer related interfaces and initially integrate with the BestPossibleStateCalcStage.

    - Modify the BestPossibleStateCalcStage logic to plug in the WAGED rebalancer.
    - Refine ClusterModel to integrate with the ClusterDataDetector implementation.
    - Enable getting the changed details for ClusterConfig in the change detector, which is required by the WAGED rebalancer.
    
    * Resubmit the change: Refine the WAGED rebalancer related interfaces for integration (#431)
    
    * Refine the WAGED rebalancer related interfaces and initially integrate with the BestPossibleStateCalcStage.

    - Modify the BestPossibleStateCalcStage logic to plug in the WAGED rebalancer.
    - Refine ClusterModel to integrate with the ClusterDataDetector implementation.
    - Enable getting the changed details for ClusterConfig in the change detector, which is required by the WAGED rebalancer.
    
    * Bring back the interface class and algorithm placeholder class that were removed prematurely.
    
    * Revert "Refine the WAGED rebalancer related interfaces for integration (#431)" (#437)
    
    This reverts commit 08a2015c617ddd3c93525afc572081a7836f9476.
    
    * Modify the expected change type from CONFIG to CLUSTER_CONFIG in the WAGED rebalancer. (#438)
    
    CONFIG is for generic configuration items, which is too generic for the rebalancer.
    Check for CLUSTER_CONFIG instead to avoid confusion.
    
    * Add special treatment for ClusterConfig
    
    This diff allows callers to iterate over the result of getChangeType() by changing determinePropertyMapByType so that it simply returns an empty map for ClusterConfig.
    
    * Record the replica objects in the AssignableNode in addition to the partition name (#440)
    
    The replica instances are required while the rebalance algorithm generates ResourceAssignments based on the AssignableNode instances.
    Refine the methods of AssignableNode for better code style and readability.
    Also, modify the related test cases to verify state information and new methods.
    
    * Add BucketDataAccessor for large writes
    
    For the new WAGED rebalancer, it's necessary to have a data accessor that will allow writes of data exceeding 1MB. ZooKeeper's ZNode size is capped at 1MB, so BucketDataAccessor interface and ZkBucketDataAccessor help us achieve this.
    Changelist:
    1. Add BucketDataAccessor and ZkBucketDataAccessor
    2. Add necessary serializers
    3. Add an integration test against ZK
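
    For context, a minimal usage sketch of the new interface (the ZkBucketDataAccessor constructor shown is an assumption; the interface methods appear later in this diff):

        // Values are compressed and split into <=1MB buckets before writing.
        // The constructor signature is assumed for illustration only.
        String zkAddress = "localhost:2181";
        BucketDataAccessor accessor = new ZkBucketDataAccessor(zkAddress);
        HelixProperty assignment = new HelixProperty("BestPossibleAssignment");
        accessor.compressedBucketWrite("/MY-CLUSTER/ASSIGNMENT", assignment);
        // Reads reassemble and decompress the buckets into one HelixProperty.
        HelixProperty readBack =
            accessor.compressedBucketRead("/MY-CLUSTER/ASSIGNMENT", HelixProperty.class);
        accessor.compressedBucketDelete("/MY-CLUSTER/ASSIGNMENT");
        accessor.disconnect();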
    
    * Implement the basic constraint based algorithm (#381)
    
    Implement the basic constraint-based algorithm: it is greedy; for each replica, it picks the node with the best score and assigns the replica to that node. It guarantees a locally optimal result, not a globally optimal one.
    
    The algorithm is based on a given set of constraints
    
    * HardConstraint: Approves or denies an assignment given its condition; no assignment may violate any "hard constraint"
    * SoftConstraint: Evaluates an assignment by points/rewards/scores; a higher score means a better assignment
    The goal is to avoid violating any "hard constraint" while accumulating the most points (rewards) from the "soft constraints"
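
    A condensed sketch of the greedy selection described above (the class and method names follow this commit's terminology but are illustrative):

        // For one replica: drop nodes that violate any hard constraint, then
        // pick the node with the highest weighted soft-constraint score.
        AssignableNode greedyPick(AssignableReplica replica,
            List<AssignableNode> nodes, List<HardConstraint> hardConstraints,
            List<SoftConstraint> softConstraints, ClusterContext context) {
          AssignableNode bestNode = null;
          double bestScore = Double.NEGATIVE_INFINITY;
          for (AssignableNode node : nodes) {
            boolean valid = hardConstraints.stream()
                .allMatch(c -> c.isAssignmentValid(node, replica, context));
            if (!valid) {
              continue; // one hard-constraint violation disqualifies the node
            }
            double score = softConstraints.stream()
                .mapToDouble(c -> c.getWeight()
                    * c.getAssignmentNormalizedScore(node, replica, context))
                .sum();
            if (score > bestScore) {
              bestScore = score;
              bestNode = node;
            }
          }
          return bestNode; // locally optimal; null if no node passed the hard constraints
        }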
    
    * Validate the instance capacity/partition weight configuration while constructing the assignable instances (#451)
    
    Compare the configured items with the required capacity keys defined in the cluster config when building the assignable instances.
    - According to the design, all the required capacity keys must appear in the instance capacity config.
    - As for the partition weights, the corresponding weight item will be filled with value 0 if the required capacity key is not specified in the resource config.
    
    * Implement the WAGED rebalancer with the limited functionality. (#443)
    
    The implemented rebalancer supports basic rebalance logic. It does not contain the logic to support delayed rebalance and user-defined preference list.
    
    Added unit test to cover the main workflow of the WAGED rebalancer.
    
    * HardConstraints Implementation and unit tests (#433)
    
    * Implement all of the basic hard constraints:
    1. Partition count cannot exceed the instance's upper limit
    2. Fault zone aware (no identical partitions in the same zone)
    3. Partition weight cannot exceed the instance's capacity
    4. Cannot assign inactive partitions
    5. The same partition in different states cannot co-exist on one instance
    6. A replica cannot be assigned to an instance that lacks the replica's required tag
    
    * Implement AssignmentMetadataStore (#453)
    
    Implement AssignmentMetadataStore
    
    AssignmentMetadataStore is a component for the new WAGED rebalancer. It provides APIs that allow the rebalancer to read and write the baseline and best possible assignments using BucketDataAccessor.
    
    Changelist:
    1. Add AssignmentMetadataStore
    2. Add an integration test: TestAssignmentMetadataStore
    
    * Fix TestWagedRebalancer and add constructor in AssignmentMetadataStore
    
    TestWagedRebalancer was failing because it was not using a proper HelixManager to instantiate a mock version of AssignmentMetadataStore. This diff refactors the constructors in AssignmentMetadataStore and fixes the failing test.
    
    * Implement one of the soft constraints (#450)
    
    Implement the Instance Partitions Count soft constraint.
    Evaluate by the instance's current partition count versus the estimated max partition count.
    Intuitively, encourage the assignment if the instance's occupancy rate is below average;
    discourage the assignment if the instance's occupancy rate is above average.

    The final normalized score will be within [0, 1].
    The implementation depends on the cluster's current total partition count as the max score.
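
    Roughly, the scoring intuition is the following (an illustrative formula with hypothetical inputs, not the committed one):

        // Score decreases as the instance fills up relative to the estimated
        // max partition count; clamped so the result stays within [0, 1].
        double occupancy = (double) currentPartitionCount / estimatedMaxPartitionCount;
        double score = Math.max(0.0, Math.min(1.0, 1.0 - occupancy));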
    
    * Add soft constraint: ResourceTopStateAntiAffinityConstraint (#465)

    Add ResourceTopStateAntiAffinityConstraint

    The more top-state partitions are assigned to the instance, the lower the score, and vice versa.
    
    * Implement MaxCapacityUsageInstanceConstraint soft constraint (#463)
    
    The constraint evaluates the score by checking the max used capacity key out of all the capacity keys.
    The higher the maximum usage value for the capacity key, the lower the score will be, implying that it is that much less desirable to assign anything on the given node.
    It is a greedy approach since it evaluates only the most used capacity key.
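
    A small sketch of the "max used capacity key" projection (the field names are hypothetical):

        // Take the single highest utilization across all capacity keys;
        // a higher max utilization yields a lower (less desirable) score.
        double maxUtilization = 0.0;
        for (Map.Entry<String, Integer> capacity : nodeCapacity.entrySet()) {
          int used = nodeUsage.getOrDefault(capacity.getKey(), 0);
          maxUtilization =
              Math.max(maxUtilization, (double) used / capacity.getValue());
        }
        double score = 1.0 - maxUtilization; // illustrative mapping only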
    
    * Add soft constraint: ResourcePartitionAntiAffinityConstraint (#464)
    
    If the partition's resource overall has a light load on the instance, the score is higher than in the case where the resource is heavily loaded on the instance.
    
    * Improve ResourceTopStateAntiAffinityConstraint (#475)
    
    - fix the min max range to be [0,1]
    - add unit test for normalized score
    
    * Adjust the expected replica count according to fault zone count. (#476)
    
    The rebalancer should determine the expected replica count according to the fault zone count instead of the node count only.
    
    * PartitionMovementSoftConstraint Implementation (#474)
    
    Add soft constraint: partition movement constraint
    
    Evaluate the proposed assignment according to the potential partition movement cost.
    The cost is evaluated based on the difference between the old assignment and the new assignment.
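
    The cost intuition can be sketched as follows (a hypothetical helper, not the committed logic):

        // No cost when the placement matches the old assignment; a partial
        // cost for a state change on the same instance; the full cost for a
        // move to a brand-new instance. The constants are illustrative.
        private double getMovementCost(String instance, String state,
            Map<String, String> previousAssignment /* instance -> state */) {
          if (!previousAssignment.containsKey(instance)) {
            return 1.0; // the partition moves to a new instance
          }
          return state.equals(previousAssignment.get(instance)) ? 0.0 : 0.5;
        }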
    
    * Add the remaining implementation of ConstraintBasedAlgorithmFactory (#478)
    
    Implementation of ConstraintBasedAlgorithmFactory and the soft constraint weight model.
    Remove SoftConstraintWeightModel class.
    Get the rebalance preference and adjust the corresponding weight.
    Pass the preference keys instead of cluster config.
    
    * Integrate the WAGED rebalancer with all the related components. (#466)
    
    1. Integrate with the algorithm, assignment metadata store, etc. Fix several conflicting interfaces and logic so that the rebalancer runs correctly.
    2. Complete OptimalAssignment.
    3. Add integration tests to ensure the correctness of rebalancing logic.
    
    * Separate AssignableNode properties by Immutable and Mutable (#485)
    
    Separate AssignableNode properties into immutable and mutable groups
    - This helps detect any wrong usage of these properties early
    
    * Enable maintenance mode for the WAGED rebalancer.
    
    The maintenance mode rebalance logic remains the same as in the previous feature.
    Add more tests about partition migration and node swap that require maintenance mode.
    
    * Add delayed rebalance and user-defined preference list features to the WAGED rebalancer. (#456)
    
    - Add delayed rebalance and user-defined preference list features to the WAGED rebalancer.
    - Refine the delayed rebalance usage in the waged rebalancer.
    - Add the delayed rebalance scheduling logic.
    - Add the necessary tests. And fix TestMixedModeAutoRebalance and all delayed rebalance tests.
    
    * Adjust the topology processing logic for instances to ensure backward compatibility.
    
    * Load soft constraint weight from resources/properties file (#492)
    
    Load the soft constraint weights from a properties file.
    This makes it easier to adjust the weights in the future.
    
    * Add latency metric components for WAGED rebalancer (#490)
    
    Add WAGED rebalancer metric framework and latency metric implementation
    
    Changelist:
    1. Add WAGED rebalancer metric interface
    2. Implement latency-related metrics
    3. Integrate latency metrics into WAGED rebalancer
    4. Add tests
    
    * Fix the rebalance cache issue and stabilize the tests. (#510)
    
    1. Fix the DelayedAutoRebalancer cache issue where a ClusterConfig change won't trigger a rebalance. The current workaround in our code blocks the WAGED rebalancer logic, so we need to fix it while merging the WAGED rebalancer code.
    2. Refine the ResourceChangeDetector's usage in the WAGED rebalancer so as to avoid unnecessary global rebalance.
    3. Extend the StrictMatchExternalViewVerifier so it can be used to test the WAGED rebalance feature.
    
    * More strict partition weight validation while creating the cluster model. (#511)
    
    1. If any capacity key is not configured in the Resource Config (or default weight) as the partition weight, the config is invalid.
    2. If any partition weight is configured with a negative number, the config is invalid.
    Note that the rebalancer will not compute a new assignment if any capacity/weight config is invalid.
    
    * Increase parallelism for ZkBucketDataAccessor (#506)
    
    * Increase parallelism for ZkBucketDataAccessor
    
    This diff improves parallelism and throughput for ZkBucketDataAccessor. It implements the following ideas:
    1. Optimistic Concurrency Control
    2. Monotonically Increasing Version Number
    3. Garbage Collection of Stale Metadata
    4. Retrying Reads Upon Failure
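
    The write path can be sketched as follows (all helper names are hypothetical, for illustration only):

        // Versioned, lock-free write with optimistic concurrency control.
        private void versionedWrite(String path, List<byte[]> buckets)
            throws IOException {
          // 1. Next version: monotonically increasing, never reused.
          long version = readLastWriteVersion(path) + 1;
          // 2. Write the buckets under the new version; concurrent readers
          //    keep reading the previous version undisturbed.
          writeBuckets(path + "/" + version, buckets);
          // 3. Atomically flip the "last successful write" pointer; readers
          //    that see the pointer move mid-read retry their read.
          updateLastSuccessfulWriteVersion(path, version);
          // 4. Garbage-collect the metadata of stale versions.
          deleteStaleVersions(path, version);
        }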
    
    * The WAGED rebalancer returns the previously calculated assignment on calculation failure (#514)
    
    * The WAGED rebalancer returns the previously calculated assignment on calculation failure.
    
    This is to protect the cluster assignment against a rebalancing algorithm failure, for example, when the cluster is out of capacity. In this case, the rebalancer will keep using the previously calculated mapping.
    Also, refine the new metric interface, and add the RebalanceFailureCount metric for recording the failures.
    
    Modify the test cases so that DBs from different test cases have different names. This is to avoid previous test records being returned by the rebalancer on a calculation error.
    
    * Make log clearer after finishing calculateAssignment. (#531)
    
    Make log clearer after finishing calculateAssignment.
    
    * Implement monitoring mbeans for the WAGED rebalancer. (#525)
    
    Change list:
    1. GlobalBaselineCalcCounter: Counter of the global rebalance.
    2. PartialRebalanceCounter: Counter of the partial rebalance done.
    3. BaselineDivergenceGauge: Gauge of the difference at replica level between the Baseline and the Best Possible assignments.
    
    * Refine the rebalance scope calculating logic in the WAGED rebalancer. (#519)
    
    * Refine the rebalance scope calculating logic in the WAGED rebalancer.
    
    1. Ignore the IdealState mapping/listing fields if the resource is in FULL_AUTO mode.
    2. On IdealState change, the resource shall be fully rebalanced, since some filter conditions, such as instance tags, might have changed.
    3. A live instance change (a node newly connected) shall trigger a full rebalance so partitions will be re-assigned to the new node.
    4. Modify the related test cases.
    5. Add an option to the change detector so that, if it is used elsewhere, the caller can listen to any change.
    
    * Make WagedRebalancer static by creating a ThreadLocal (#540)
    
    ZkBucketDataAccessor has GC logic, but it is only valid if the ZkClient inside it is active and not closed. Currently, the WAGED rebalancer creates a new AssignmentMetadataStore every time it rebalances, which prevents the internal ZkBucketDataAccessor from garbage-collecting the assignment metadata it wrote previously.
    
    This diff makes the entire WagedRebalancer object a ThreadLocal, which has the effect of making it essentially static across different runs of the pipeline.
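
    The pattern is roughly the following (a sketch; the actual wiring may differ):

        // One WagedRebalancer (and thus one ZkBucketDataAccessor) per pipeline
        // thread, so the accessor's GC logic can see its own earlier writes.
        private static final ThreadLocal<WagedRebalancer> REBALANCER =
            new ThreadLocal<>();

        private WagedRebalancer getOrCreateRebalancer(HelixManager manager) {
          WagedRebalancer rebalancer = REBALANCER.get();
          if (rebalancer == null) {
            rebalancer = new WagedRebalancer(manager);
            REBALANCER.set(rebalancer);
          }
          return rebalancer;
        }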
    
    * Change the change detector to a regular field in the WAGED rebalancer instead of a static thread-local. (#543)

    * Change the change detector to a regular field instead of a static thread-local.

    The rebalancer has been modified to be a thread-local object, so there is no need to keep the change detector as a thread-local as well; doing so could cause potential problems.
    In addition, to avoid resource leakage, implement the finalize method of the WagedRebalancer to close all connections.
    
    * Refactor soft constraints to simplify the algorithm and fix potential issues. (#520)

    * Refactor soft constraints to simplify the algorithm and fix potential issues.

    1. Check for zero weight so as to avoid unnecessary calculations.
    2. Simplify the soft constraint interfaces and implementations. Avoid duplicate code.
    3. Adjust the partition movement constraint logic to reduce the chance of moving partitions when the baseline and best possible assignment diverge.
    4. Estimate utilization in addition to the other usage estimation. The estimation will be used as a base when calculating the capacity usage score. This is to ensure the algorithm treats clusters with different overall usage in the same way.
    5. Fix the issue that the high utilization calculation does not consider the currently proposed replica usage.
    6. Use a sigmoid to calculate the usage-based soft constraint scores, as sketched after this list. This enhances the assignment result of the algorithm.
    7. Adjust the related test cases.
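
    A sketch of the sigmoid scoring in item 6, using the commons-math3 Sigmoid added to the dependencies later in this change (the constants are illustrative, not the committed values):

        import org.apache.commons.math3.analysis.function.Sigmoid;

        // Map the ratio of proposed usage to estimated usage onto (0, 1):
        // usage below the estimate scores high, usage above it scores low.
        private static final Sigmoid SIGMOID = new Sigmoid(0.0, 2.0);

        double computeUtilizationScore(double estimatedUsage, double currentUsage) {
          if (estimatedUsage == 0) {
            return 0;
          }
          double usageRatio = currentUsage / estimatedUsage;
          // Centered at ratio 1.0; the factor 10 sharpens the transition.
          return SIGMOID.value(-(usageRatio - 1.0) * 10) / 2;
        }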
    
    * Minor fix for the constraints related tests. (#545)
    
    Minor fix for the constraints related tests.
    
    * Adjust the replica rebalance calculating ordering to avoid static order. (#535)
    
    * Adjust the replica rebalance calculating ordering to avoid static order.
    
    The problem with a static order is that the same set of replicas will always be the ones that are moved or state-transitioned during the rebalance.
    This randomization won't change the algorithm's performance, but it will help Helix eliminate very unstable partitions.
    
    * Implement increment() method in CountMetric class. (#537)
    
    Abstract method increaseCount() in CountMetric is a generic method used in inherited classes. We should implement this method in CountMetric to reduce duplicate code in inherited classes.
    Change list:
    1. Move increaseCount() to CountMetric.
    2. Change the name to increment() and implement the method.
    
    * Modify the ivy file to add the new math3 lib dependency. (#546)
    
    Modify the ivy file to add the new math3 lib dependency.
    
    * Fix a missing parameter when the WAGED rebalancer initializes the change detector. (#547)
    
    This parameter was missed during the previous change.
    
    * Add the new Rebalancer monitor domain to the active domain list. (#550)
    
    Add the new Rebalancer monitor domain to the active domain list.
    
    * Refine the ivy file config. The org attributes were not configured correctly. (#551)
    
    * Use a deep copy of the new best possible assignment for measuring baseline divergence. (#542)
    
    The new assignment is critical to the WAGED rebalancer; if it is modified while measuring baseline divergence, the rebalancer may not work correctly.
    To avoid changes to the new assignment and make it safe when used to measure baseline divergence, use a deep copy of the new assignment.
    
    * Add max capacity usage metric for instance monitor. (#548)
    
    We need to monitor an instance's max utilization in order to understand what the max capacity usage is and to know the status of the instance.
    
    Change list:
    1. Change instance monitor to extend dynamic metric, and change code logic in ClusterStatusMonitor to adapt the InstanceMonitor changes.
    2. Add APIs for get/update MaxCapacityUsage.
    3. Add an API in cluster status monitor to update max capacity usage.
    4. Add unit tests for the instance monitor and for updating max capacity usage.
    
    * Fix the incorrect formula in the comment for measuring baseline divergence. (#559)
    
    Fix incorrect formula in the comment for measuring baseline divergence.
    
    * Avoid redundant writes in AssignmentMetadataStore (#564)
    
    For the WAGED rebalancer, we persist the cluster's mapping via AssignmentMetadataStore on every pipeline run. However, if no changes were made to the new assignment relative to the old one, this write is not necessary. This diff checks whether they are equal and skips the write if the old and new assignments are the same.
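
    The check itself is simple; a sketch (the method and field names are hypothetical):

        // Skip the expensive bucketed ZK write when nothing changed.
        public void persistBestPossibleAssignment(
            Map<String, ResourceAssignment> newAssignment) throws IOException {
          if (newAssignment.equals(_lastPersistedAssignment)) {
            return; // identical to the last persisted assignment
          }
          writeAssignment(newAssignment); // hypothetical bucketed-write helper
          _lastPersistedAssignment = newAssignment;
        }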
    
    * Filter resource map with ideal states for instance capacity metrics. (#574)
    
    The ResourceToRebalance map also has resources from current states, and this causes null pointer exceptions at the parse-all-replicas stage when a resource is not in the ideal states. This diff fixes the issue by only using the resources in the ideal states to parse all replicas.
    
    * Introduce Dry-run Waged Rebalancer for the verifiers and tests. (#573)
    
    Use a dry-run rebalancer to avoid updating the persisted rebalancer status in the verifiers or tests.
    Also, refine several rebalancer related interfaces so as to simplify the dry-run rebalancer implementation.
    Convert the test cases back to use the BestPossibleExternalViewVerifier.
    
    Additional fixing:
    - Update the rebalancer preference for every rebalancer.compute() call, since the preference might be updated at runtime.
    - Fix one minor metric domain name bug in the WagedRebalancerMetricCollector.
    - Minor test case fix to make them more stable after the change.
    
    * Change ClusterConfig.setDefaultCapacityMap to be private. (#590)
    
    Change ClusterConfig.setDefaultCapacityMap to be private.
    
    * Add Java API for adding and validating resources for WAGED rebalancer (#570)
    
    Add Java API methods for adding and validating resources for the WAGED rebalancer. This is a set of convenience APIs provided through HelixAdmin that the user can use to more easily add resources and validate them for WAGED rebalance usage.
    Changelist:
    1. Add API methods in HelixAdmin
    2. Implement the said methods
    3. Add tests
    
    * Change calculation for baseline divergence. (#598)
    
    Change the calculation for baseline divergence: 0.0 means no difference, 1.0 means all are different.
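
    In other words (an illustrative calculation, not the committed code):

        // divergence = differentReplicas / totalReplicas:
        // 0.0 => the Best Possible assignment matches the Baseline exactly,
        // 1.0 => no replica placement matches.
        double divergence = totalReplicas == 0
            ? 0.0
            : (double) differentReplicas / totalReplicas;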
    
    * Improve the WAGED rebalancer performance. (#586)
    
    This change improves the rebalance speed by 2x to 5x, depending on the host capacity.

    Parallelize the loop processing whenever possible to improve performance. This does not change the logic.
    Avoid duplicate logic in the loops: put the calculation outside the loop and do it only once.
    
    * Fix the unstable test TestZeroReplicaAvoidance. (#603)
    
    Fix the unstable test TestZeroReplicaAvoidance by waiting.
    This is a temporary resolution before we fix issue #526. Marked it in a TODO comment so it is easier for us to remove the waits in batch later.
    
    * Add REST API endpoints for WAGED Rebalancer (#611)
    
    We want to make WAGED rebalancer (weight-aware) easier to use. One way to do this is to allow the user to easily add resources with weight configuration set by providing REST endpoints. This change adds the relevant REST endpoints based on the HelixAdmin APIs added in (#570).
    
    Basically, this commit uses existing REST endpoints whose hierarchy is defined by REST resources. What it does to the existing endpoints is 1) add extra commands and 2) add a WAGED command as a QueryParam so that WAGED logic can be included.
    
    This change is backward-compatible because it keeps the original behavior when no commands are provided by using @DefaultValue annotation.
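
    The endpoint pattern can be sketched as follows (JAX-RS; the path and command names are illustrative, not the exact endpoints added here):

        @PUT
        @Path("{resourceName}")
        public Response updateResource(
            @PathParam("resourceName") String resourceName,
            @QueryParam("command") @DefaultValue("update") String command) {
          if ("addWagedResource".equals(command)) {
            // WAGED-specific logic, reached only when explicitly requested.
          }
          // With no command supplied, @DefaultValue("update") preserves the
          // original behavior, keeping the change backward-compatible.
          return Response.ok().build();
        }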
    
    * Fix a potential issue in the ResourceChangeSnapshot. (#635)
    
    The trim logic in the ResourceChangeSnapshot for cleaning up the IdealState should not clear the whole map. Doing so would cause the WAGED rebalancer to ignore changes such as new partitions added to the partition list.
    Modify the test case accordingly.
    
    * Simplify and enhance the RebalanceLatencyGauge so it can be used in multiple threads. (#636)
    
    The previous design of RebalanceLatencyGauge won't support asynchronous metric data emitting. This PR adds support by using a ThreadLocal object.
    The metric logic is not changed.
    
    * Add new WAGED rebalancer config item "GLOBAL_REBALANCE_ASYNC_MODE". (#637)
    
    This option will be used by the WAGED rebalancer to determine if the global rebalance should be performed asynchronously.
    
    * Decouple the event type and the scheduled rebalance cache refresh option. (#638)
    
    The previous design is that both the on-demand and periodic rebalance scheduling tasks will request a cache refresh. This won't always be true moving forward.
    For example, the WAGED rebalancer's asynchronous baseline calculation requests a scheduled rebalance, but a cache refresh isn't necessary.
    This PR does not change any business logic. It prepares for future feature change.
    This PR ensures strict backward compatibility.
    
    * Improve the algorithm so it prioritizes the assignment to the idle nodes when the constraint evaluation results are the same (#651)
    
    This is to get rid of the randomness when the algorithm result is a tie. Usually, when the algorithm picks up the nodes with the same score, more partition movements will be triggered on a cluster change.
    
    * Refine the WAGED rebalancer to minimize the partial rebalance workload. (#639)
    
    * Refine the WAGED rebalancer to minimize the partial rebalance workload.
    
    Split the cluster model calculation method so that different rebalance logic can have different rebalance scope calculation logic.
    Also, refine the WAGED rebalancer logic to reduce duplicate code.
    
    * Refine method names and comments. (#664)

    * Refine method names and comments.
    
    * Asynchronously calculating the Baseline (#632)
    
    * Enable the Baseline calculation to be asynchronously done.
    
    This will greatly speed up rebalancing. Basically, the WAGED rebalancer will first perform a partial rebalance to recover the invalid replica allocations (for example, the ones that are on a disabled instance). It then calculates the new baseline by global rebalancing.
    
    * Reorganize the test cases so the new WAGED expand cluster tests are not skipped. (#670)

    TestNG cannot handle test class inheritance well, and some of the tests are skipped with the current design. Move the logic to a new test class so it is no longer a child of another test class. This ensures all the test cases run.
    
    * Fix the Helix rest tests by cleaning up the environment before testing. (#679)
    
    The validateWeight test methods in TestInstanceAccessor and TestPerInstanceAccessor test against the same instance config fields. There was a conflict when both test cases were executed in a certain order. This change adds cleanup logic so the shared fields will be empty before each test method starts.
    
    * Add instance capacity gauge (#557)
    
    We need to monitor instance utilization in order to understand what the instance capacity is.
    
    Change list:
    - Change instance monitor to update capacity
    - Change getAttribute to throw AttributeNotFoundException in DynamicMBeanProvider
    - Combine max usage and instance capacity update into one method in cluster status monitor
    - Add unit test
    
    * Add resource partition weight gauge (#686)
    
    We would like to monitor the usage of each capacity for the resource partitions: gauge of the average partition weight for each CAPACITY key.
    
    Change list:
    - Add partition weight gauge metric to resource monitor.
    - Add two unit tests to cover new code.
    
    * Add WAGED rebalancer reset method to clean up cached status. (#696)
    
    The reset method is for cleaning up any in-memory records within the WAGED rebalancer so we don't need to recreate one.
    
    Detailed change list:
    1. Add reset methods to all the stateful objects that are used in the WAGED rebalancer.
    2. Fix some potential race conditions in the WAGED rebalancer components.
    3. Adjust the tests accordingly. Also add new tests to cover the component reset / WAGED rebalancer reset logic.
    
    * Reset the WAGED rebalancer once the controller newly acquires leadership. (#690)
    
    This is to prevent any cached assignment information recorded during the previous session from impacting the rebalance result.
    Detailed change list:
    
    Move the stateful WAGED rebalancer to the GenericHelixController object instead of the rebalance stage. This resolves the possible race condition between the event processing thread and the leader switch handling thread.
    Add a new test regarding leadership switch to verify that the WAGED rebalancer is reset after the processing.
    
    Co-authored-by: Hunter Lee <narendly@gmail.com>
    Co-authored-by: Yi Wang <ywang4@linkedin.com>
    Co-authored-by: Huizhi Lu <hulu@linkedin.com>
---
 helix-core/helix-core-0.9.2-SNAPSHOT.ivy           |   3 +-
 helix-core/pom.xml                                 |   7 +-
 .../java/org/apache/helix/BucketDataAccessor.java  |  53 ++
 .../src/main/java/org/apache/helix/HelixAdmin.java |  47 ++
 .../org/apache/helix/HelixRebalanceException.java  |  51 ++
 .../main/java/org/apache/helix/InstanceType.java   |   6 +-
 .../java/org/apache/helix/SystemPropertyKeys.java  |   2 +
 .../helix/controller/GenericHelixController.java   | 129 +++-
 .../controller/changedetector/ChangeDetector.java  |  57 ++
 .../changedetector/ResourceChangeDetector.java     | 199 ++++++
 .../changedetector/ResourceChangeSnapshot.java     | 157 ++++
 .../ResourceControllerDataProvider.java            |  34 +-
 .../rebalancer/DelayedAutoRebalancer.java          | 203 +-----
 .../controller/rebalancer/StatefulRebalancer.java  |  37 +
 .../rebalancer/util/DelayedRebalanceUtil.java      | 267 +++++++
 .../rebalancer/util/ResourceUsageCalculator.java   | 192 +++++
 .../rebalancer/util/WagedValidationUtil.java       |  91 +++
 .../rebalancer/waged/AssignmentMetadataStore.java  | 213 ++++++
 .../rebalancer/waged/RebalanceAlgorithm.java       |  43 ++
 .../rebalancer/waged/WagedRebalancer.java          | 787 +++++++++++++++++++++
 .../constraints/ConstraintBasedAlgorithm.java      | 228 ++++++
 .../ConstraintBasedAlgorithmFactory.java           |  82 +++
 .../constraints/FaultZoneAwareConstraint.java      |  43 ++
 .../waged/constraints/HardConstraint.java          |  47 ++
 .../InstancePartitionsCountConstraint.java         |  41 ++
 .../MaxCapacityUsageInstanceConstraint.java        |  42 ++
 .../waged/constraints/NodeCapacityConstraint.java  |  50 ++
 .../NodeMaxPartitionLimitConstraint.java           |  43 ++
 .../constraints/PartitionMovementConstraint.java   |  96 +++
 .../constraints/ReplicaActivateConstraint.java}    |  44 +-
 .../ResourcePartitionAntiAffinityConstraint.java   |  43 ++
 .../ResourceTopStateAntiAffinityConstraint.java    |  44 ++
 .../SamePartitionOnInstanceConstraint.java}        |  29 +-
 .../waged/constraints/SoftConstraint.java          |  90 +++
 .../waged/constraints/UsageSoftConstraint.java     |  85 +++
 .../constraints/ValidGroupTagConstraint.java}      |  31 +-
 .../rebalancer/waged/model/AssignableNode.java     | 374 ++++++++++
 .../rebalancer/waged/model/AssignableReplica.java  | 161 +++++
 .../rebalancer/waged/model/ClusterContext.java     | 172 +++++
 .../rebalancer/waged/model/ClusterModel.java       | 132 ++++
 .../waged/model/ClusterModelProvider.java          | 532 ++++++++++++++
 .../rebalancer/waged/model/OptimalAssignment.java  |  93 +++
 .../helix/controller/stages/AttributeName.java     |   3 +-
 .../stages/BestPossibleStateCalcStage.java         | 180 ++++-
 .../stages/CurrentStateComputationStage.java       |  75 ++
 .../controller/stages/CurrentStateOutput.java      |  23 +
 .../org/apache/helix/manager/zk/ZKHelixAdmin.java  | 282 +++++++-
 .../manager/zk/ZNRecordJacksonSerializer.java      |  67 ++
 .../helix/manager/zk/ZkBucketDataAccessor.java     | 380 ++++++++++
 .../java/org/apache/helix/model/ClusterConfig.java | 228 +++++-
 .../org/apache/helix/model/InstanceConfig.java     |  52 +-
 .../org/apache/helix/model/ResourceConfig.java     | 129 +++-
 .../apache/helix/model/StateModelDefinition.java   |   4 +-
 .../monitoring/mbeans/ClusterStatusMonitor.java    |  92 ++-
 .../helix/monitoring/mbeans/InstanceMonitor.java   | 192 ++++-
 .../monitoring/mbeans/MonitorDomainNames.java      |   3 +-
 .../helix/monitoring/mbeans/ResourceMonitor.java   |  99 ++-
 .../mbeans/dynamicMBeans/DynamicMBeanProvider.java |  78 +-
 .../mbeans/dynamicMBeans/SimpleDynamicMetric.java  |   2 +-
 .../helix/monitoring/metrics/MetricCollector.java  |  99 +++
 .../metrics/WagedRebalancerMetricCollector.java    | 125 ++++
 .../implementation/BaselineDivergenceGauge.java    |  68 ++
 .../implementation/RebalanceCounter.java}          |  24 +-
 .../implementation/RebalanceFailureCount.java}     |  24 +-
 .../implementation/RebalanceLatencyGauge.java      |  89 +++
 .../monitoring/metrics/model/CountMetric.java      |  69 ++
 .../monitoring/metrics/model/LatencyMetric.java    |  67 ++
 .../model/Metric.java}                             |  31 +-
 .../model/RatioMetric.java}                        |  46 +-
 .../BestPossibleExternalViewVerifier.java          |  73 +-
 .../StrictMatchExternalViewVerifier.java           |  55 +-
 .../ClusterVerifiers/ZkHelixClusterVerifier.java   |  17 -
 .../main/java/org/apache/helix/util/HelixUtil.java |  25 +-
 .../java/org/apache/helix/util/RebalanceUtil.java  |   7 +-
 .../resources/soft-constraint-weight.properties    |  26 +
 .../java/org/apache/helix/common/ZkTestBase.java   |  16 +-
 .../changedetector/TestResourceChangeDetector.java | 441 ++++++++++++
 .../util/TestResourceUsageCalculator.java          | 103 +++
 .../waged/MockAssignmentMetadataStore.java         |  60 ++
 .../waged/TestAssignmentMetadataStore.java         | 186 +++++
 .../rebalancer/waged/TestWagedRebalancer.java      | 524 ++++++++++++++
 .../waged/TestWagedRebalancerMetrics.java          | 190 +++++
 .../waged/constraints/MockRebalanceAlgorithm.java  |  84 +++
 .../constraints/TestConstraintBasedAlgorithm.java  |  72 ++
 .../constraints/TestFaultZoneAwareConstraint.java  |  79 +++
 .../TestInstancePartitionsCountConstraint.java     |  63 ++
 .../TestMaxCapacityUsageInstanceConstraint.java    |  57 ++
 .../constraints/TestNodeCapacityConstraint.java    |  54 ++
 .../TestNodeMaxPartitionLimitConstraint.java       |  56 ++
 .../TestPartitionActivateConstraint.java           |  64 ++
 .../TestPartitionMovementConstraint.java           | 127 ++++
 ...estResourcePartitionAntiAffinityConstraint.java |  67 ++
 ...TestResourceTopStateAntiAffinityConstraint.java |  82 +++
 .../TestSamePartitionOnInstanceConstraint.java     |  59 ++
 .../TestSoftConstraintNormalizeFunction.java       |  47 ++
 .../constraints/TestValidGroupTagConstraint.java   |  66 ++
 .../waged/model/AbstractTestClusterModel.java      | 204 ++++++
 .../waged/model/ClusterModelTestHelper.java        |  40 ++
 .../rebalancer/waged/model/TestAssignableNode.java | 280 ++++++++
 .../waged/model/TestAssignableReplica.java         | 167 +++++
 .../rebalancer/waged/model/TestClusterContext.java |  93 +++
 .../rebalancer/waged/model/TestClusterModel.java   | 101 +++
 .../waged/model/TestClusterModelProvider.java      | 376 ++++++++++
 .../waged/model/TestOptimalAssignment.java         |  91 +++
 .../TestCrushAutoRebalanceNonRack.java             |   8 +-
 .../rebalancer/CrushRebalancers/TestNodeSwap.java  |   4 +-
 .../TestDelayedAutoRebalance.java                  |  49 +-
 ...stDelayedAutoRebalanceWithDisabledInstance.java |  32 +-
 .../TestDelayedAutoRebalanceWithRackaware.java     |   2 +-
 .../PartitionMigration/TestExpandCluster.java      |   4 +-
 .../TestPartitionMigrationBase.java                |  26 +-
 .../TestWagedRebalancerMigration.java              | 111 +++
 .../rebalancer/TestMixedModeAutoRebalance.java     | 181 +++--
 .../rebalancer/TestZeroReplicaAvoidance.java       |  62 +-
 .../WagedRebalancer/TestDelayedWagedRebalance.java |  89 +++
 ...tDelayedWagedRebalanceWithDisabledInstance.java |  96 +++
 .../TestDelayedWagedRebalanceWithRackaware.java    |  96 +++
 .../TestMixedModeWagedRebalance.java               |  58 ++
 .../TestWagedExpandCluster.java}                   |  47 +-
 .../WagedRebalancer/TestWagedNodeSwap.java         | 294 ++++++++
 .../WagedRebalancer/TestWagedRebalance.java        | 590 +++++++++++++++
 .../TestWagedRebalanceFaultZone.java               | 372 ++++++++++
 .../TestWagedRebalanceTopologyAware.java           | 114 +++
 .../helix/manager/zk/TestZkBucketDataAccessor.java | 189 +++++
 .../apache/helix/manager/zk/TestZkHelixAdmin.java  | 108 +++
 .../java/org/apache/helix/mock/MockHelixAdmin.java |  22 +
 .../org/apache/helix/model/TestClusterConfig.java  | 259 +++++++
 .../org/apache/helix/model/TestInstanceConfig.java |  71 +-
 .../org/apache/helix/model/TestResourceConfig.java | 186 +++++
 .../mbeans/TestClusterStatusMonitor.java           | 192 ++++-
 .../monitoring/mbeans/TestInstanceMonitor.java     |  75 ++
 .../monitoring/mbeans/TestResourceMonitor.java     | 371 ++++++----
 .../mbeans/TestRoutingTableProviderMonitor.java    |  10 +-
 .../monitoring/mbeans/TestZkClientMonitor.java     |   9 +-
 .../apache/helix/tools/TestClusterVerifier.java    |  76 +-
 ...eUsageCalculator.MeasureBaselineDivergence.json |  37 +
 ...chExternalViewVerifier.ComputeIdealMapping.json |  14 +-
 .../rest/server/resources/AbstractResource.java    |  10 +-
 .../server/resources/helix/ClusterAccessor.java    |  10 +-
 .../server/resources/helix/InstancesAccessor.java  |  88 ++-
 .../resources/helix/PerInstanceAccessor.java       |  49 +-
 .../server/resources/helix/ResourceAccessor.java   | 152 +++-
 .../helix/rest/server/AbstractTestClass.java       |   5 +-
 .../helix/rest/server/TestClusterAccessor.java     |  13 +
 .../helix/rest/server/TestInstancesAccessor.java   |  77 ++
 .../helix/rest/server/TestPerInstanceAccessor.java |  66 ++
 .../helix/rest/server/TestResourceAccessor.java    | 117 ++-
 .../rest/server/util/JerseyUriRequestBuilder.java  |   3 +-
 148 files changed, 15108 insertions(+), 999 deletions(-)

diff --git a/helix-core/helix-core-0.9.2-SNAPSHOT.ivy b/helix-core/helix-core-0.9.2-SNAPSHOT.ivy
index 2d6e298..07dd266 100644
--- a/helix-core/helix-core-0.9.2-SNAPSHOT.ivy
+++ b/helix-core/helix-core-0.9.2-SNAPSHOT.ivy
@@ -57,7 +57,8 @@ under the License.
     <dependency org="org.codehaus.jackson" name="jackson-mapper-asl" rev="1.8.5" conf="compile->compile(default);runtime->runtime(default);default->default"/>
     <dependency org="commons-io" name="commons-io" rev="1.4" conf="compile->compile(default);runtime->runtime(default);default->default"/>
     <dependency org="commons-cli" name="commons-cli" rev="1.2" conf="compile->compile(default);runtime->runtime(default);default->default"/>
-    <dependency org="commons-math" name="commons-math" rev="2.1" conf="compile->compile(default);runtime->runtime(default);default->default"/>
+    <dependency org="org.apache.commons" name="commons-math" rev="2.1" conf="compile->compile(default);runtime->runtime(default);default->default"/>
+    <dependency org="org.apache.commons" name="commons-math3" rev="3.6.1" conf="compile->compile(default);runtime->runtime(default);default->default"/>
     <dependency org="com.101tec" name="zkclient" rev="0.5" conf="compile->compile(default);runtime->runtime(default);default->default"/>
     <dependency org="com.google.guava" name="guava" rev="15.0" conf="compile->compile(default);runtime->runtime(default);default->default"/>
     <dependency org="org.yaml" name="snakeyaml" rev="1.12" conf="compile->compile(default);runtime->runtime(default);default->default"/>
diff --git a/helix-core/pom.xml b/helix-core/pom.xml
index 45b6552..1077cc0 100644
--- a/helix-core/pom.xml
+++ b/helix-core/pom.xml
@@ -37,7 +37,7 @@ under the License.
       org.I0Itec.zkclient*,
       org.apache.commons.cli*;version="[1.2,2)",
       org.apache.commons.io*;version="[1.4,2)",
-      org.apache.commons.math*;version="[2.1,3)",
+      org.apache.commons.math*;version="[2.1,4)",
       org.apache.jute*;resolution:=optional,
       org.apache.zookeeper.server.persistence*;resolution:=optional,
       org.apache.zookeeper.server.util*;resolution:=optional,
@@ -140,6 +140,11 @@ under the License.
       <version>2.1</version>
     </dependency>
     <dependency>
+      <groupId>org.apache.commons</groupId>
+      <artifactId>commons-math3</artifactId>
+      <version>3.6.1</version>
+    </dependency>
+    <dependency>
       <groupId>commons-codec</groupId>
       <artifactId>commons-codec</artifactId>
       <version>1.6</version>
diff --git a/helix-core/src/main/java/org/apache/helix/BucketDataAccessor.java b/helix-core/src/main/java/org/apache/helix/BucketDataAccessor.java
new file mode 100644
index 0000000..2008c23
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/BucketDataAccessor.java
@@ -0,0 +1,53 @@
+package org.apache.helix;
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.io.IOException;
+
+public interface BucketDataAccessor {
+
+  /**
+   * Write a HelixProperty in buckets, compressed.
+   * @param path path to which the metadata will be written to
+   * @param value HelixProperty to write
+   * @param <T>
+   * @throws IOException
+   */
+  <T extends HelixProperty> boolean compressedBucketWrite(String path, T value) throws IOException;
+
+  /**
+   * Read a HelixProperty that was written in buckets, compressed.
+   * @param path
+   * @param helixPropertySubType the subtype of HelixProperty the data was written in
+   * @param <T>
+   */
+  <T extends HelixProperty> HelixProperty compressedBucketRead(String path,
+      Class<T> helixPropertySubType);
+
+  /**
+   * Delete the HelixProperty in the given path.
+   * @param path
+   */
+  void compressedBucketDelete(String path);
+
+  /**
+   * Close the connection to the metadata store.
+   */
+  void disconnect();
+}
diff --git a/helix-core/src/main/java/org/apache/helix/HelixAdmin.java b/helix-core/src/main/java/org/apache/helix/HelixAdmin.java
index a11b235..423f879 100644
--- a/helix-core/src/main/java/org/apache/helix/HelixAdmin.java
+++ b/helix-core/src/main/java/org/apache/helix/HelixAdmin.java
@@ -31,6 +31,7 @@ import org.apache.helix.model.HelixConfigScope;
 import org.apache.helix.model.IdealState;
 import org.apache.helix.model.InstanceConfig;
 import org.apache.helix.model.MaintenanceSignal;
+import org.apache.helix.model.ResourceConfig;
 import org.apache.helix.model.StateModelDefinition;
 
 /*
@@ -579,4 +580,50 @@ public interface HelixAdmin {
   default void close() {
     System.out.println("Default close() was invoked! No operation was executed.");
   }
+
+  /**
+   * Adds a resource with IdealState and ResourceConfig to be rebalanced by WAGED rebalancer with validation.
+   * Validation includes the following:
+   * 1. Check ResourceConfig has the WEIGHT field
+   * 2. Check that all capacity keys from ClusterConfig are set up in the WEIGHT field
+   * 3. Check that all ResourceConfig's weightMap fields have all of the capacity keys
+   * @param clusterName
+   * @param idealState
+   * @param resourceConfig
+   * @return true if the resource has been added successfully. False otherwise
+   */
+  boolean addResourceWithWeight(String clusterName, IdealState idealState,
+      ResourceConfig resourceConfig);
+
+  /**
+   * Batch-enables Waged rebalance for the names of resources given.
+   * @param clusterName
+   * @param resourceNames
+   * @return
+   */
+  boolean enableWagedRebalance(String clusterName, List<String> resourceNames);
+
+  /**
+   * Validates the resources to see if their weight configs have been set properly.
+   * Validation includes the following:
+   * 1. Check ResourceConfig has the WEIGHT field
+   * 2. Check that all capacity keys from ClusterConfig are set up in the WEIGHT field
+   * 3. Check that all ResourceConfig's weightMap fields have all of the capacity keys
+   * @param resourceNames
+   * @return for each resource, true if the weight configs have been set properly, false otherwise
+   */
+  Map<String, Boolean> validateResourcesForWagedRebalance(String clusterName,
+      List<String> resourceNames);
+
+  /**
+   * Validates the instances to ensure their weights in InstanceConfigs have been set up properly.
+   * Validation includes the following:
+   * 1. If default instance capacity is not set, check that the InstanceConfigs have the CAPACITY field
+   * 2. Check that all capacity keys defined in ClusterConfig are present in the CAPACITY field
+   * @param clusterName
+   * @param instancesNames
+   * @return
+   */
+  Map<String, Boolean> validateInstancesForWagedRebalance(String clusterName,
+      List<String> instancesNames);
 }
diff --git a/helix-core/src/main/java/org/apache/helix/HelixRebalanceException.java b/helix-core/src/main/java/org/apache/helix/HelixRebalanceException.java
new file mode 100644
index 0000000..d54853f
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/HelixRebalanceException.java
@@ -0,0 +1,51 @@
+package org.apache.helix;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/**
+ * Exception thrown by Helix due to rebalance failures.
+ */
+public class HelixRebalanceException extends Exception {
+  // TODO: Adding static description or other necessary fields into the enum instances for
+  // TODO: supporting the rebalance monitor to understand the exception.
+  public enum Type {
+    INVALID_CLUSTER_STATUS,
+    INVALID_REBALANCER_STATUS,
+    FAILED_TO_CALCULATE,
+    INVALID_INPUT,
+    UNKNOWN_FAILURE
+  }
+
+  private final Type _type;
+
+  public HelixRebalanceException(String message, Type type, Throwable cause) {
+    super(String.format("%s Failure Type: %s", message, type.name()), cause);
+    _type = type;
+  }
+
+  public HelixRebalanceException(String message, Type type) {
+    super(String.format("%s Failure Type: %s", message, type.name()));
+    _type = type;
+  }
+
+  public Type getFailureType() {
+    return _type;
+  }
+}
diff --git a/helix-core/src/main/java/org/apache/helix/InstanceType.java b/helix-core/src/main/java/org/apache/helix/InstanceType.java
index 84e9d87..92b0e80 100644
--- a/helix-core/src/main/java/org/apache/helix/InstanceType.java
+++ b/helix-core/src/main/java/org/apache/helix/InstanceType.java
@@ -36,7 +36,8 @@ public enum InstanceType {
   CONTROLLER(new String[] {
       MonitorDomainNames.ClusterStatus.name(),
       MonitorDomainNames.HelixZkClient.name(),
-      MonitorDomainNames.HelixCallback.name()
+      MonitorDomainNames.HelixCallback.name(),
+      MonitorDomainNames.Rebalancer.name()
   }),
 
   PARTICIPANT(new String[] {
@@ -51,7 +52,8 @@ public enum InstanceType {
       MonitorDomainNames.HelixZkClient.name(),
       MonitorDomainNames.HelixCallback.name(),
       MonitorDomainNames.HelixThreadPoolExecutor.name(),
-      MonitorDomainNames.CLMParticipantReport.name()
+      MonitorDomainNames.CLMParticipantReport.name(),
+      MonitorDomainNames.Rebalancer.name()
   }),
 
   SPECTATOR(new String[] {
diff --git a/helix-core/src/main/java/org/apache/helix/SystemPropertyKeys.java b/helix-core/src/main/java/org/apache/helix/SystemPropertyKeys.java
index 1a6a797..d316986 100644
--- a/helix-core/src/main/java/org/apache/helix/SystemPropertyKeys.java
+++ b/helix-core/src/main/java/org/apache/helix/SystemPropertyKeys.java
@@ -6,6 +6,8 @@ public class SystemPropertyKeys {
 
   // ZKHelixManager
   public static final String CLUSTER_MANAGER_VERSION = "cluster-manager-version.properties";
+  // soft constraints weight definitions
+  public static final String SOFT_CONSTRAINT_WEIGHTS = "soft-constraint-weight.properties";
 
   public static final String FLAPPING_TIME_WINDOW = "helixmanager.flappingTimeWindow";
 
diff --git a/helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java b/helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
index 39a5ad7..e47c420 100644
--- a/helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
+++ b/helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
@@ -25,6 +25,7 @@ import java.util.Collections;
 import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
+import java.util.Optional;
 import java.util.Set;
 import java.util.Timer;
 import java.util.TimerTask;
@@ -61,6 +62,8 @@ import org.apache.helix.controller.dataproviders.WorkflowControllerDataProvider;
 import org.apache.helix.controller.pipeline.AsyncWorkerType;
 import org.apache.helix.controller.pipeline.Pipeline;
 import org.apache.helix.controller.pipeline.PipelineRegistry;
+import org.apache.helix.controller.rebalancer.StatefulRebalancer;
+import org.apache.helix.controller.rebalancer.waged.WagedRebalancer;
 import org.apache.helix.controller.stages.AttributeName;
 import org.apache.helix.controller.stages.BestPossibleStateCalcStage;
 import org.apache.helix.controller.stages.ClusterEvent;
@@ -168,7 +171,6 @@ public class GenericHelixController implements IdealStateChangeListener,
   Timer _onDemandRebalanceTimer = null;
   AtomicReference<RebalanceTask> _nextRebalanceTask = new AtomicReference<>();
 
-
   /**
    * A cache maintained across pipelines
    */
@@ -186,6 +188,17 @@ public class GenericHelixController implements IdealStateChangeListener,
 
   private HelixManager _helixManager;
 
+  // Since the stateful rebalancer needs to be lazily constructed when the HelixManager instance is
+  // ready, the GenericHelixController is not constructed with a stateful rebalancer. This wrapper
+  // is to avoid the complexity of handling a nullable value in the event handling process.
+  // TODO Create the required stateful rebalancer only when it is used by any resource.
+  private final StatefulRebalancerRef _rebalancerRef = new StatefulRebalancerRef() {
+    @Override
+    protected StatefulRebalancer createRebalancer(HelixManager helixManager) {
+      return new WagedRebalancer(helixManager);
+    }
+  };
+
   /**
    * TODO: We should get rid of this once we move to:
    *  1) ZK callback should go to ClusterDataCache and trigger data cache refresh only
@@ -221,17 +234,29 @@ public class GenericHelixController implements IdealStateChangeListener,
   class RebalanceTask extends TimerTask {
     final HelixManager _manager;
     final ClusterEventType _clusterEventType;
+    private final Optional<Boolean> _shouldRefreshCacheOption;
     private long _nextRebalanceTime;
 
     public RebalanceTask(HelixManager manager, ClusterEventType clusterEventType) {
       this(manager, clusterEventType, -1);
+    }
+
+    public RebalanceTask(HelixManager manager, ClusterEventType clusterEventType,
+        long nextRebalanceTime) {
+      this(manager, clusterEventType, nextRebalanceTime, Optional.empty());
+    }
 
+    public RebalanceTask(HelixManager manager, ClusterEventType clusterEventType,
+        long nextRebalanceTime, boolean shouldRefreshCache) {
+      this(manager, clusterEventType, nextRebalanceTime, Optional.of(shouldRefreshCache));
     }
 
-    public RebalanceTask(HelixManager manager, ClusterEventType clusterEventType, long nextRebalanceTime) {
+    private RebalanceTask(HelixManager manager, ClusterEventType clusterEventType,
+        long nextRebalanceTime, Optional<Boolean> shouldRefreshCacheOption) {
       _manager = manager;
       _clusterEventType = clusterEventType;
       _nextRebalanceTime = nextRebalanceTime;
+      _shouldRefreshCacheOption = shouldRefreshCacheOption;
     }
 
     public long getNextRebalanceTime() {
@@ -241,8 +266,9 @@ public class GenericHelixController implements IdealStateChangeListener,
     @Override
     public void run() {
       try {
-        if (_clusterEventType.equals(ClusterEventType.PeriodicalRebalance) || _clusterEventType
-            .equals(ClusterEventType.OnDemandRebalance)) {
+        if (_shouldRefreshCacheOption.orElse(
+            _clusterEventType.equals(ClusterEventType.PeriodicalRebalance) || _clusterEventType
+                .equals(ClusterEventType.OnDemandRebalance))) {
           requestDataProvidersFullRefresh();
 
           HelixDataAccessor accessor = _manager.getHelixDataAccessor();
@@ -360,7 +386,17 @@ public class GenericHelixController implements IdealStateChangeListener,
    * Schedule an on demand rebalance pipeline.
    * @param delay
    */
+  @Deprecated
   public void scheduleOnDemandRebalance(long delay) {
+    scheduleOnDemandRebalance(delay, true);
+  }
+
+  /**
+   * Schedule an on demand rebalance pipeline.
+   * @param delay
+   * @param shouldRefreshCache true if refresh the cache before scheduling a rebalance.
+   */
+  public void scheduleOnDemandRebalance(long delay, boolean shouldRefreshCache) {
     if (_helixManager == null) {
       logger.error("Failed to schedule a future pipeline run for cluster {}. Helix manager is null!",
           _clusterName);
@@ -378,7 +414,8 @@ public class GenericHelixController implements IdealStateChangeListener,
     }
 
     RebalanceTask newTask =
-        new RebalanceTask(_helixManager, ClusterEventType.OnDemandRebalance, rebalanceTime);
+        new RebalanceTask(_helixManager, ClusterEventType.OnDemandRebalance, rebalanceTime,
+            shouldRefreshCache);
 
     _onDemandRebalanceTimer.schedule(newTask, delay);
     logger.info("Scheduled instant pipeline run for cluster {}." , _helixManager.getClusterName());
@@ -601,6 +638,22 @@ public class GenericHelixController implements IdealStateChangeListener,
       return;
     }
 
+    // Event handling happens in a different thread from the onControllerChange processing thread.
+    // Thus, there are several possible conditions.
+    // 1. The event is handled after leadership is acquired, so we will have a valid rebalancer
+    // for the event processing.
+    // 2. The event is handled shortly after leadership is relinquished, and the rebalancer has
+    // not been marked as invalid yet, so the event is processed the same as in case 1.
+    // 3. The event is leftover from the previous session and is handled after the controller
+    // regains leadership. The rebalancer is reset before being used, which is the expected
+    // behavior to avoid an inconsistent rebalance result.
+    // 4. The event is handled shortly after leadership is relinquished, and the rebalancer has
+    // been marked as invalid, so we reset the rebalancer. The later isLeader() check will return
+    // false and the pipeline will not be invoked, so the reset rebalancer won't be used before
+    // the controller regains leadership.
+    event.addAttribute(AttributeName.STATEFUL_REBALANCER.name(),
+        _rebalancerRef.getRebalancer(manager));
+
     if (!manager.isLeader()) {
       logger.error("Cluster manager: " + manager.getInstanceName() + " is not leader for " + manager
           .getClusterName() + ". Pipeline will not be invoked");
@@ -997,6 +1050,12 @@ public class GenericHelixController implements IdealStateChangeListener,
       _clusterStatusMonitor.setMaintenance(_inMaintenanceMode);
     } else {
       enableClusterStatusMonitor(false);
+      // Note that onControllerChange is executed in parallel with the event processing thread. It
+      // is possible that the current WAGED rebalancer object is still in use for handling a
+      // callback. So only mark the rebalancer as invalid here instead of closing it.
+      // The invalidated WAGED rebalancer will be reset during a later event processing run if
+      // the controller becomes the leader again.
+      _rebalancerRef.invalidateRebalancer();
     }
 
     logger.info("END: GenericClusterController.onControllerChange() for cluster " + _clusterName);
@@ -1100,6 +1159,8 @@ public class GenericHelixController implements IdealStateChangeListener,
 
     enableClusterStatusMonitor(false);
 
+    _rebalancerRef.closeRebalancer();
+
     // TODO controller shouldn't be used in anyway after shutdown.
     // Need to record shutdown and throw Exception if the controller is used again.
   }
@@ -1177,7 +1238,6 @@ public class GenericHelixController implements IdealStateChangeListener,
     return statusFlag;
   }
 
-
   // TODO: refactor this to use common/ClusterEventProcessor.
   @Deprecated
   private class ClusterEventProcessor extends Thread {
@@ -1233,4 +1293,59 @@ public class GenericHelixController implements IdealStateChangeListener,
     eventThread.setDaemon(true);
     eventThread.start();
   }
-}
\ No newline at end of file
+
+  /**
+   * A wrapper class for the stateful rebalancer instance that will be tracked in the
+   * GenericHelixController.
+   */
+  private abstract class StatefulRebalancerRef<T extends StatefulRebalancer> {
+    private T _rebalancer = null;
+    private boolean _isRebalancerValid = true;
+
+    /**
+     * @param helixManager the HelixManager used to instantiate the rebalancer
+     * @return A new stateful rebalancer instance with initial state.
+     */
+    protected abstract T createRebalancer(HelixManager helixManager);
+
+    /**
+     * Mark the current rebalancer object as invalid, which indicates that it needs to be reset
+     * before the next usage.
+     */
+    synchronized void invalidateRebalancer() {
+      _isRebalancerValid = false;
+    }
+
+    /**
+     * @return A valid rebalancer object.
+     *         If the rebalancer is no longer valid, it will be reset before returning.
+     * TODO: Make the rebalancer volatile or a singleton if this method is called from multiple
+     * TODO: threads outside the controller object.
+     */
+    synchronized T getRebalancer(HelixManager helixManager) {
+      // Lazily initialize the stateful rebalancer instance, since the GenericHelixController
+      // is instantiated without the required HelixManager information.
+      if (_rebalancer == null) {
+        _rebalancer = createRebalancer(helixManager);
+        _isRebalancerValid = true;
+      }
+      // If the rebalancer exists but has been marked as invalid (due to a leadership switch), it
+      // needs to be reset before being returned.
+      if (!_isRebalancerValid) {
+        _rebalancer.reset();
+        _isRebalancerValid = true;
+      }
+      return _rebalancer;
+    }
+
+    /**
+     * Proactively close the rebalancer object to release its resources.
+     */
+    synchronized void closeRebalancer() {
+      if (_rebalancer != null) {
+        _rebalancer.close();
+        _rebalancer = null;
+      }
+    }
+  }
+}
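
For reference, the ref is wired up with an anonymous subclass, as at the top of this diff (a sketch of the same pattern, assuming the WAGED rebalancer):

    // The ref lazily creates the rebalancer on first use and resets it after
    // a leadership switch, matching the lifecycle described above.
    private final StatefulRebalancerRef<WagedRebalancer> _rebalancerRef =
        new StatefulRebalancerRef<WagedRebalancer>() {
          @Override
          protected WagedRebalancer createRebalancer(HelixManager helixManager) {
            return new WagedRebalancer(helixManager);
          }
        };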
diff --git a/helix-core/src/main/java/org/apache/helix/controller/changedetector/ChangeDetector.java b/helix-core/src/main/java/org/apache/helix/controller/changedetector/ChangeDetector.java
new file mode 100644
index 0000000..fbe4afc
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/controller/changedetector/ChangeDetector.java
@@ -0,0 +1,57 @@
+package org.apache.helix.controller.changedetector;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.util.Collection;
+import org.apache.helix.HelixConstants;
+
+/**
+ * ChangeDetector interface that will be used to track deltas in the cluster from one pipeline run
+ * to another. The interface methods are designed to be flexible for both the resource pipeline and
+ * the task pipeline.
+ * TODO: Consider splitting this up into two different ChangeDetector interfaces:
+ * TODO: PropertyBasedChangeDetector and PathBasedChangeDetector.
+ */
+public interface ChangeDetector {
+
+  /**
+   * Returns all types of changes detected.
+   * @return a collection of ChangeTypes
+   */
+  Collection<HelixConstants.ChangeType> getChangeTypes();
+
+  /**
+   * Returns the names of items that changed based on the change type given.
+   * @return a collection of names of items that changed
+   */
+  Collection<String> getChangesByType(HelixConstants.ChangeType changeType);
+
+  /**
+   * Returns the names of items that were added based on the change type given.
+   * @return a collection of names of items that were added
+   */
+  Collection<String> getAdditionsByType(HelixConstants.ChangeType changeType);
+
+  /**
+   * Returns the names of items that were removed based on the change type given.
+   * @return a collection of names of items that were removed
+   */
+  Collection<String> getRemovalsByType(HelixConstants.ChangeType changeType);
+}
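
As an illustration of the interface contract, a pipeline stage might consume a detector like this (a sketch; the helper class is hypothetical):

    import java.util.Collection;
    import org.apache.helix.HelixConstants;
    import org.apache.helix.controller.changedetector.ChangeDetector;

    final class ChangeDetectorExample {
      // Report what changed for IDEAL_STATE since the last pipeline run.
      static void report(ChangeDetector detector) {
        Collection<HelixConstants.ChangeType> types = detector.getChangeTypes();
        if (types.contains(HelixConstants.ChangeType.IDEAL_STATE)) {
          System.out.println("Changed: "
              + detector.getChangesByType(HelixConstants.ChangeType.IDEAL_STATE));
          System.out.println("Added: "
              + detector.getAdditionsByType(HelixConstants.ChangeType.IDEAL_STATE));
          System.out.println("Removed: "
              + detector.getRemovalsByType(HelixConstants.ChangeType.IDEAL_STATE));
        }
      }
    }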
diff --git a/helix-core/src/main/java/org/apache/helix/controller/changedetector/ResourceChangeDetector.java b/helix-core/src/main/java/org/apache/helix/controller/changedetector/ResourceChangeDetector.java
new file mode 100644
index 0000000..27f4c50
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/controller/changedetector/ResourceChangeDetector.java
@@ -0,0 +1,199 @@
+package org.apache.helix.controller.changedetector;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+import com.google.common.collect.Sets;
+import org.apache.helix.HelixConstants;
+import org.apache.helix.HelixProperty;
+import org.apache.helix.controller.dataproviders.ResourceControllerDataProvider;
+import org.apache.helix.model.ClusterConfig;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * ResourceChangeDetector implements ChangeDetector. It caches resource-related metadata from
+ * Helix's main resource pipeline cache (DataProvider) and the computation results of change
+ * detection.
+ * WARNING: the methods of this class are not thread-safe.
+ */
+public class ResourceChangeDetector implements ChangeDetector {
+  private static final Logger LOG = LoggerFactory.getLogger(ResourceChangeDetector.class.getName());
+
+  private final boolean _ignoreControllerGeneratedFields;
+  private ResourceChangeSnapshot _oldSnapshot; // snapshot for previous pipeline run
+  private ResourceChangeSnapshot _newSnapshot; // snapshot for this pipeline run
+
+  // The following caches the computation results
+  private Map<HelixConstants.ChangeType, Collection<String>> _changedItems = new HashMap<>();
+  private Map<HelixConstants.ChangeType, Collection<String>> _addedItems = new HashMap<>();
+  private Map<HelixConstants.ChangeType, Collection<String>> _removedItems = new HashMap<>();
+
+  public ResourceChangeDetector(boolean ignoreControllerGeneratedFields) {
+    _newSnapshot = new ResourceChangeSnapshot();
+    _ignoreControllerGeneratedFields = ignoreControllerGeneratedFields;
+  }
+
+  public ResourceChangeDetector() {
+    this(false);
+  }
+
+  /**
+   * Compare the underlying HelixProperty objects and produce a collection of names of changed
+   * properties.
+   * @return a collection of names of the properties that changed
+   */
+  private Collection<String> getChangedItems(Map<String, ? extends HelixProperty> oldPropertyMap,
+      Map<String, ? extends HelixProperty> newPropertyMap) {
+    Collection<String> changedItems = new HashSet<>();
+    oldPropertyMap.forEach((name, property) -> {
+      if (newPropertyMap.containsKey(name)
+          && !property.getRecord().equals(newPropertyMap.get(name).getRecord())) {
+        changedItems.add(name);
+      }
+    });
+    return changedItems;
+  }
+
+  /**
+   * Return a collection of names that were newly added.
+   * @return a collection of names of the properties that were added
+   */
+  private Collection<String> getAddedItems(Map<String, ? extends HelixProperty> oldPropertyMap,
+      Map<String, ? extends HelixProperty> newPropertyMap) {
+    return Sets.difference(newPropertyMap.keySet(), oldPropertyMap.keySet());
+  }
+
+  /**
+   * Return a collection of names that were removed.
+   * @return a collection of names of the properties that were removed
+   */
+  private Collection<String> getRemovedItems(Map<String, ? extends HelixProperty> oldPropertyMap,
+      Map<String, ? extends HelixProperty> newPropertyMap) {
+    return Sets.difference(oldPropertyMap.keySet(), newPropertyMap.keySet());
+  }
+
+  private void clearCachedComputation() {
+    _changedItems.clear();
+    _addedItems.clear();
+    _removedItems.clear();
+  }
+
+  /**
+   * Based on the given change type, return the matching property map from the snapshot.
+   * @param changeType the change type to look up
+   * @param snapshot the snapshot to read the property map from
+   * @return the property map for the given change type, or an empty map for unknown types
+   */
+  private Map<String, ? extends HelixProperty> determinePropertyMapByType(
+      HelixConstants.ChangeType changeType, ResourceChangeSnapshot snapshot) {
+    switch (changeType) {
+    case INSTANCE_CONFIG:
+      return snapshot.getInstanceConfigMap();
+    case IDEAL_STATE:
+      return snapshot.getIdealStateMap();
+    case RESOURCE_CONFIG:
+      return snapshot.getResourceConfigMap();
+    case LIVE_INSTANCE:
+      return snapshot.getLiveInstances();
+    case CLUSTER_CONFIG:
+      ClusterConfig config = snapshot.getClusterConfig();
+      if (config == null) {
+        return Collections.emptyMap();
+      } else {
+        return Collections.singletonMap(config.getClusterName(), config);
+      }
+    default:
+      LOG.warn(
+          "ResourceChangeDetector cannot determine propertyMap for the given ChangeType: {}. Returning an empty map.",
+          changeType);
+      return Collections.emptyMap();
+    }
+  }
+
+  /**
+   * Makes the current newSnapshot the oldSnapshot and reads in the up-to-date snapshot for change
+   * computation. To be called in the controller pipeline.
+   * @param dataProvider newly refreshed DataProvider (cache)
+   */
+  public synchronized void updateSnapshots(ResourceControllerDataProvider dataProvider) {
+    // If there are changes, update internal states
+    _oldSnapshot = new ResourceChangeSnapshot(_newSnapshot);
+    _newSnapshot = new ResourceChangeSnapshot(dataProvider, _ignoreControllerGeneratedFields);
+    dataProvider.clearRefreshedChangeTypes();
+
+    // Invalidate cached computation
+    clearCachedComputation();
+  }
+
+  public synchronized void resetSnapshots() {
+    _newSnapshot = new ResourceChangeSnapshot();
+    clearCachedComputation();
+  }
+
+  @Override
+  public synchronized Collection<HelixConstants.ChangeType> getChangeTypes() {
+    return Collections.unmodifiableSet(_newSnapshot.getChangedTypes());
+  }
+
+  @Override
+  public synchronized Collection<String> getChangesByType(HelixConstants.ChangeType changeType) {
+    return _changedItems.computeIfAbsent(changeType,
+        type -> getChangedItems(determinePropertyMapByType(changeType, _oldSnapshot),
+            determinePropertyMapByType(changeType, _newSnapshot)));
+  }
+
+  @Override
+  public synchronized Collection<String> getAdditionsByType(HelixConstants.ChangeType changeType) {
+    return _addedItems.computeIfAbsent(changeType,
+        type -> getAddedItems(determinePropertyMapByType(changeType, _oldSnapshot),
+            determinePropertyMapByType(changeType, _newSnapshot)));
+  }
+
+  @Override
+  public synchronized Collection<String> getRemovalsByType(HelixConstants.ChangeType changeType) {
+    return _removedItems.computeIfAbsent(changeType,
+        type -> getRemovedItems(determinePropertyMapByType(changeType, _oldSnapshot),
+            determinePropertyMapByType(changeType, _newSnapshot)));
+  }
+
+  /**
+   * @return A map containing all the changed items, categorized by change type.
+   */
+  public Map<HelixConstants.ChangeType, Set<String>> getAllChanges() {
+    return getChangeTypes().stream()
+        .collect(Collectors.toMap(changeType -> changeType, changeType -> {
+          Set<String> itemKeys = new HashSet<>();
+          itemKeys.addAll(getAdditionsByType(changeType));
+          itemKeys.addAll(getChangesByType(changeType));
+          itemKeys.addAll(getRemovalsByType(changeType));
+          return itemKeys;
+        })).entrySet().stream().filter(changeEntry -> !changeEntry.getValue().isEmpty())
+            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
+  }
+}
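
A sketch of the per-pipeline-run usage pattern (the helper class is hypothetical): refresh the cache, roll the snapshots forward, then read the aggregated delta.

    import java.util.Map;
    import java.util.Set;
    import org.apache.helix.HelixConstants;
    import org.apache.helix.controller.changedetector.ResourceChangeDetector;
    import org.apache.helix.controller.dataproviders.ResourceControllerDataProvider;

    final class ResourceChangeDetectorExample {
      static void onPipelineRun(ResourceChangeDetector detector,
          ResourceControllerDataProvider dataProvider) {
        // updateSnapshots also clears the data provider's refreshed change types.
        detector.updateSnapshots(dataProvider);
        Map<HelixConstants.ChangeType, Set<String>> allChanges = detector.getAllChanges();
        allChanges.forEach((type, names) -> System.out.println(type + " -> " + names));
      }
    }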
diff --git a/helix-core/src/main/java/org/apache/helix/controller/changedetector/ResourceChangeSnapshot.java b/helix-core/src/main/java/org/apache/helix/controller/changedetector/ResourceChangeSnapshot.java
new file mode 100644
index 0000000..fc8c5c4
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/controller/changedetector/ResourceChangeSnapshot.java
@@ -0,0 +1,157 @@
+package org.apache.helix.controller.changedetector;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+import org.apache.helix.HelixConstants;
+import org.apache.helix.ZNRecord;
+import org.apache.helix.controller.dataproviders.ResourceControllerDataProvider;
+import org.apache.helix.model.ClusterConfig;
+import org.apache.helix.model.IdealState;
+import org.apache.helix.model.InstanceConfig;
+import org.apache.helix.model.LiveInstance;
+import org.apache.helix.model.ResourceConfig;
+
+/**
+ * ResourceChangeSnapshot is a POJO that contains the following Helix metadata:
+ * 1. InstanceConfig
+ * 2. IdealState
+ * 3. ResourceConfig
+ * 4. LiveInstance
+ * 5. Changed property types
+ * It serves as a snapshot of the main controller cache to enable the difference (change)
+ * calculation between two rounds of the pipeline run.
+ */
+class ResourceChangeSnapshot {
+
+  private Set<HelixConstants.ChangeType> _changedTypes;
+  private Map<String, InstanceConfig> _instanceConfigMap;
+  private Map<String, IdealState> _idealStateMap;
+  private Map<String, ResourceConfig> _resourceConfigMap;
+  private Map<String, LiveInstance> _liveInstances;
+  private ClusterConfig _clusterConfig;
+
+  /**
+   * Default constructor that constructs an empty snapshot.
+   */
+  ResourceChangeSnapshot() {
+    _changedTypes = new HashSet<>();
+    _instanceConfigMap = new HashMap<>();
+    _idealStateMap = new HashMap<>();
+    _resourceConfigMap = new HashMap<>();
+    _liveInstances = new HashMap<>();
+    _clusterConfig = null;
+  }
+
+  /**
+   * Constructor using controller cache (ResourceControllerDataProvider).
+   *
+   * @param dataProvider the controller cache to snapshot
+   * @param ignoreControllerGeneratedFields if true, the snapshot won't record any fields that are
+   *                                        modified by the controller.
+   */
+  ResourceChangeSnapshot(ResourceControllerDataProvider dataProvider,
+      boolean ignoreControllerGeneratedFields) {
+    _changedTypes = new HashSet<>(dataProvider.getRefreshedChangeTypes());
+    _instanceConfigMap = new HashMap<>(dataProvider.getInstanceConfigMap());
+    _idealStateMap = new HashMap<>(dataProvider.getIdealStates());
+    if (ignoreControllerGeneratedFields && (
+        dataProvider.getClusterConfig().isPersistBestPossibleAssignment() || dataProvider
+            .getClusterConfig().isPersistIntermediateAssignment())) {
+      for (String resourceName : _idealStateMap.keySet()) {
+        _idealStateMap.put(resourceName, trimIdealState(_idealStateMap.get(resourceName)));
+      }
+    }
+    _resourceConfigMap = new HashMap<>(dataProvider.getResourceConfigMap());
+    _liveInstances = new HashMap<>(dataProvider.getLiveInstances());
+    _clusterConfig = dataProvider.getClusterConfig();
+  }
+
+  /**
+   * Copy constructor for ResourceChangeSnapshot.
+   * @param snapshot the snapshot to copy
+   */
+  ResourceChangeSnapshot(ResourceChangeSnapshot snapshot) {
+    _changedTypes = new HashSet<>(snapshot._changedTypes);
+    _instanceConfigMap = new HashMap<>(snapshot._instanceConfigMap);
+    _idealStateMap = new HashMap<>(snapshot._idealStateMap);
+    _resourceConfigMap = new HashMap<>(snapshot._resourceConfigMap);
+    _liveInstances = new HashMap<>(snapshot._liveInstances);
+    _clusterConfig = snapshot._clusterConfig;
+  }
+
+  Set<HelixConstants.ChangeType> getChangedTypes() {
+    return _changedTypes;
+  }
+
+  Map<String, InstanceConfig> getInstanceConfigMap() {
+    return _instanceConfigMap;
+  }
+
+  Map<String, IdealState> getIdealStateMap() {
+    return _idealStateMap;
+  }
+
+  Map<String, ResourceConfig> getResourceConfigMap() {
+    return _resourceConfigMap;
+  }
+
+  Map<String, LiveInstance> getLiveInstances() {
+    return _liveInstances;
+  }
+
+  ClusterConfig getClusterConfig() {
+    return _clusterConfig;
+  }
+
+  // Trim the IdealState to exclude any controller-modified information.
+  private IdealState trimIdealState(IdealState originalIdealState) {
+    // Clone the IdealState to avoid modifying the objects in the Cluster Data Cache, which might
+    // be used by the other stages in the pipeline.
+    IdealState trimmedIdealState = new IdealState(originalIdealState.getRecord());
+    ZNRecord trimmedIdealStateRecord = trimmedIdealState.getRecord();
+    switch (originalIdealState.getRebalanceMode()) {
+      // WARNING: the IdealState copy constructor is not really a deep copy, so we should not
+      // modify the values directly or the cached values will be changed.
+      case FULL_AUTO:
+        // For FULL_AUTO resources, both map fields and list fields are not considered as data input
+        // for the controller. The controller will write to these two types of fields for persisting
+        // the assignment mapping.
+        trimmedIdealStateRecord.setListFields(trimmedIdealStateRecord.getListFields().keySet().stream().collect(
+            Collectors.toMap(partition -> partition, partition -> Collections.emptyList())));
+        // Fall through to also clean up the map fields, as in the SEMI_AUTO case.
+      case SEMI_AUTO:
+        // For SEMI_AUTO resources, map fields are not considered as data input for the controller.
+        // The controller will write to the map fields for persisting the assignment mapping.
+        trimmedIdealStateRecord.setMapFields(trimmedIdealStateRecord.getMapFields().keySet().stream().collect(
+            Collectors.toMap(partition -> partition, partition -> Collections.emptyMap())));
+        break;
+      default:
+        break;
+    }
+    return trimmedIdealState;
+  }
+}
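
To make the trimming behavior concrete, a small sketch (hypothetical resource and instance names): for a FULL_AUTO resource, both the per-partition preference lists and the state maps are blanked out in the snapshot, so assignments written back by the controller do not register as input changes.

    import java.util.Arrays;
    import org.apache.helix.model.IdealState;

    final class TrimIdealStateExample {
      public static void main(String[] args) {
        IdealState idealState = new IdealState("testDB");
        idealState.setRebalanceMode(IdealState.RebalanceMode.FULL_AUTO);
        idealState.getRecord().setListField("testDB_0", Arrays.asList("host1", "host2"));
        // After trimIdealState, the snapshot keeps the partition key "testDB_0"
        // but with an empty preference list (and an empty state map), leaving
        // only the fields that are genuine input to the controller.
      }
    }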
diff --git a/helix-core/src/main/java/org/apache/helix/controller/dataproviders/ResourceControllerDataProvider.java b/helix-core/src/main/java/org/apache/helix/controller/dataproviders/ResourceControllerDataProvider.java
index b1dc215..1631d50 100644
--- a/helix-core/src/main/java/org/apache/helix/controller/dataproviders/ResourceControllerDataProvider.java
+++ b/helix-core/src/main/java/org/apache/helix/controller/dataproviders/ResourceControllerDataProvider.java
@@ -25,6 +25,7 @@ import java.util.List;
 import java.util.Map;
 import java.util.Set;
 
+import java.util.concurrent.ConcurrentHashMap;
 import org.apache.helix.HelixConstants;
 import org.apache.helix.HelixDataAccessor;
 import org.apache.helix.PropertyKey;
@@ -64,6 +65,9 @@ public class ResourceControllerDataProvider extends BaseControllerDataProvider {
   private Map<String, Map<String, MissingTopStateRecord>> _missingTopStateMap;
   private Map<String, Map<String, String>> _lastTopStateLocationMap;
 
+  // Maintain a set of all ChangeTypes for change detection
+  private Set<HelixConstants.ChangeType> _refreshedChangeTypes;
+
   public ResourceControllerDataProvider() {
     this(AbstractDataCache.UNKNOWN_CLUSTER);
   }
@@ -106,19 +110,22 @@ public class ResourceControllerDataProvider extends BaseControllerDataProvider {
     _idealMappingCache = new HashMap<>();
     _missingTopStateMap = new HashMap<>();
     _lastTopStateLocationMap = new HashMap<>();
+    _refreshedChangeTypes = ConcurrentHashMap.newKeySet();
   }
 
   public synchronized void refresh(HelixDataAccessor accessor) {
     long startTime = System.currentTimeMillis();
 
     // Refresh base
-    Set<HelixConstants.ChangeType> propertyRefreshed = super.doRefresh(accessor);
+    Set<HelixConstants.ChangeType> changedTypes = super.doRefresh(accessor);
+    _refreshedChangeTypes.addAll(changedTypes);
 
     // Invalidate cached information if any of the important data has been refreshed
-    if (propertyRefreshed.contains(HelixConstants.ChangeType.IDEAL_STATE)
-        || propertyRefreshed.contains(HelixConstants.ChangeType.LIVE_INSTANCE)
-        || propertyRefreshed.contains(HelixConstants.ChangeType.INSTANCE_CONFIG)
-        || propertyRefreshed.contains(HelixConstants.ChangeType.RESOURCE_CONFIG)) {
+    if (changedTypes.contains(HelixConstants.ChangeType.IDEAL_STATE)
+        || changedTypes.contains(HelixConstants.ChangeType.LIVE_INSTANCE)
+        || changedTypes.contains(HelixConstants.ChangeType.INSTANCE_CONFIG)
+        || changedTypes.contains(HelixConstants.ChangeType.RESOURCE_CONFIG)
+        || changedTypes.contains(HelixConstants.ChangeType.CLUSTER_CONFIG)) {
       clearCachedResourceAssignments();
     }
 
@@ -261,6 +268,23 @@ public class ResourceControllerDataProvider extends BaseControllerDataProvider {
     _idealMappingCache.put(resource, mapping);
   }
 
+  /**
+   * Return the set of all ChangeTypes that changed prior to this round of rebalance. The caller
+   * should clear this set by calling {@link #clearRefreshedChangeTypes()}.
+   * @return the set of ChangeTypes that have been refreshed since the last clear
+   */
+  public Set<HelixConstants.ChangeType> getRefreshedChangeTypes() {
+    return _refreshedChangeTypes;
+  }
+
+  /**
+   * Clears the set of all ChangeTypes that changed. To be called after the caller has consumed
+   * all change types via {@link #getRefreshedChangeTypes()}.
+   */
+  public void clearRefreshedChangeTypes() {
+    _refreshedChangeTypes.clear();
+  }
+
   public void clearCachedResourceAssignments() {
     _resourceAssignmentCache.clear();
     _idealMappingCache.clear();
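
A sketch of the consume-then-clear contract for the new change-tracking accessors (the helper class is hypothetical; in this commit, ResourceChangeDetector.updateSnapshots performs the clearing):

    import java.util.Set;
    import org.apache.helix.HelixConstants;
    import org.apache.helix.controller.dataproviders.ResourceControllerDataProvider;

    final class RefreshedChangeTypesExample {
      // Read the change types accumulated across refresh() calls, act on them,
      // then clear so the next round only observes new changes.
      static void consume(ResourceControllerDataProvider dataProvider) {
        Set<HelixConstants.ChangeType> changed = dataProvider.getRefreshedChangeTypes();
        if (changed.contains(HelixConstants.ChangeType.CLUSTER_CONFIG)) {
          // e.g. recompute anything derived from the ClusterConfig here.
        }
        dataProvider.clearRefreshedChangeTypes();
      }
    }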
diff --git a/helix-core/src/main/java/org/apache/helix/controller/rebalancer/DelayedAutoRebalancer.java b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/DelayedAutoRebalancer.java
index 6ae7076..63870ec 100644
--- a/helix-core/src/main/java/org/apache/helix/controller/rebalancer/DelayedAutoRebalancer.java
+++ b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/DelayedAutoRebalancer.java
@@ -33,11 +33,10 @@ import org.apache.helix.HelixDefinedState;
 import org.apache.helix.ZNRecord;
 import org.apache.helix.api.config.StateTransitionThrottleConfig;
 import org.apache.helix.controller.dataproviders.ResourceControllerDataProvider;
-import org.apache.helix.controller.rebalancer.util.RebalanceScheduler;
+import org.apache.helix.controller.rebalancer.util.DelayedRebalanceUtil;
 import org.apache.helix.controller.stages.CurrentStateOutput;
 import org.apache.helix.model.ClusterConfig;
 import org.apache.helix.model.IdealState;
-import org.apache.helix.model.InstanceConfig;
 import org.apache.helix.model.Partition;
 import org.apache.helix.model.Resource;
 import org.apache.helix.model.ResourceAssignment;
@@ -51,7 +50,6 @@ import org.slf4j.LoggerFactory;
  */
 public class DelayedAutoRebalancer extends AbstractRebalancer<ResourceControllerDataProvider> {
   private static final Logger LOG = LoggerFactory.getLogger(DelayedAutoRebalancer.class);
-  private static RebalanceScheduler _rebalanceScheduler = new RebalanceScheduler();
 
   @Override
   public IdealState computeNewIdealState(String resourceName,
@@ -80,7 +78,8 @@ public class DelayedAutoRebalancer extends AbstractRebalancer<ResourceController
 
     ClusterConfig clusterConfig = clusterData.getClusterConfig();
     ResourceConfig resourceConfig = clusterData.getResourceConfig(resourceName);
-    boolean delayRebalanceEnabled = isDelayRebalanceEnabled(currentIdealState, clusterConfig);
+    boolean delayRebalanceEnabled =
+        DelayedRebalanceUtil.isDelayRebalanceEnabled(currentIdealState, clusterConfig);
 
     if (resourceConfig != null) {
       userDefinedPreferenceList = resourceConfig.getPreferenceLists();
@@ -111,16 +110,18 @@ public class DelayedAutoRebalancer extends AbstractRebalancer<ResourceController
 
     Set<String> activeNodes = liveEnabledNodes;
     if (delayRebalanceEnabled) {
-      long delay = getRebalanceDelay(currentIdealState, clusterConfig);
-      activeNodes = getActiveInstances(allNodes, currentIdealState, liveEnabledNodes,
-          clusterData.getInstanceOfflineTimeMap(), clusterData.getLiveInstances().keySet(),
-          clusterData.getInstanceConfigMap(), delay, clusterConfig);
+      long delay = DelayedRebalanceUtil.getRebalanceDelay(currentIdealState, clusterConfig);
+      activeNodes = DelayedRebalanceUtil
+          .getActiveNodes(allNodes, currentIdealState, liveEnabledNodes,
+              clusterData.getInstanceOfflineTimeMap(), clusterData.getLiveInstances().keySet(),
+              clusterData.getInstanceConfigMap(), delay, clusterConfig);
 
       Set<String> offlineOrDisabledInstances = new HashSet<>(activeNodes);
       offlineOrDisabledInstances.removeAll(liveEnabledNodes);
-      setRebalanceScheduler(currentIdealState, offlineOrDisabledInstances,
-          clusterData.getInstanceOfflineTimeMap(), clusterData.getLiveInstances().keySet(),
-          clusterData.getInstanceConfigMap(), delay, clusterConfig);
+      DelayedRebalanceUtil.setRebalanceScheduler(currentIdealState.getResourceName(), true,
+          offlineOrDisabledInstances, clusterData.getInstanceOfflineTimeMap(),
+          clusterData.getLiveInstances().keySet(), clusterData.getInstanceConfigMap(), delay,
+          clusterConfig, _manager);
     }
 
     if (allNodes.isEmpty() || activeNodes.isEmpty()) {
@@ -163,16 +164,16 @@ public class DelayedAutoRebalancer extends AbstractRebalancer<ResourceController
         .computePartitionAssignment(allNodeList, liveEnabledNodeList, currentMapping, clusterData);
     ZNRecord finalMapping = newIdealMapping;
 
-    if (isDelayRebalanceEnabled(currentIdealState, clusterConfig)) {
+    if (DelayedRebalanceUtil.isDelayRebalanceEnabled(currentIdealState, clusterConfig)) {
       List<String> activeNodeList = new ArrayList<>(activeNodes);
       Collections.sort(activeNodeList);
-      int minActiveReplicas = getMinActiveReplica(currentIdealState, replicaCount);
+      int minActiveReplicas =
+          DelayedRebalanceUtil.getMinActiveReplica(currentIdealState, replicaCount);
 
       ZNRecord newActiveMapping = _rebalanceStrategy
           .computePartitionAssignment(allNodeList, activeNodeList, currentMapping, clusterData);
-      finalMapping =
-          getFinalDelayedMapping(currentIdealState, newIdealMapping, newActiveMapping, liveEnabledNodes,
-              replicaCount, minActiveReplicas);
+      finalMapping = getFinalDelayedMapping(currentIdealState, newIdealMapping, newActiveMapping,
+          liveEnabledNodes, replicaCount, minActiveReplicas);
     }
 
     finalMapping.getListFields().putAll(userDefinedPreferenceList);
@@ -203,162 +204,15 @@ public class DelayedAutoRebalancer extends AbstractRebalancer<ResourceController
     return newIdealState;
   }
 
-  /* get all active instances (live instances plus offline-yet-active instances */
-  private Set<String> getActiveInstances(Set<String> allNodes, IdealState idealState,
-      Set<String> liveEnabledNodes, Map<String, Long> instanceOfflineTimeMap, Set<String> liveNodes,
-      Map<String, InstanceConfig> instanceConfigMap, long delay, ClusterConfig clusterConfig) {
-    Set<String> activeInstances = new HashSet<>(liveEnabledNodes);
-
-    if (!isDelayRebalanceEnabled(idealState, clusterConfig)) {
-      return activeInstances;
-    }
-
-    Set<String> offlineOrDisabledInstances = new HashSet<>(allNodes);
-    offlineOrDisabledInstances.removeAll(liveEnabledNodes);
-
-    long currentTime = System.currentTimeMillis();
-    for (String ins : offlineOrDisabledInstances) {
-      long inactiveTime = getInactiveTime(ins, liveNodes, instanceOfflineTimeMap.get(ins), delay,
-          instanceConfigMap.get(ins), clusterConfig);
-      InstanceConfig instanceConfig = instanceConfigMap.get(ins);
-      if (inactiveTime > currentTime && instanceConfig != null && instanceConfig
-          .isDelayRebalanceEnabled()) {
-        activeInstances.add(ins);
-      }
-    }
-
-    return activeInstances;
-  }
-
-  /* Set a rebalance scheduler for the closest future rebalance time. */
-  private void setRebalanceScheduler(IdealState idealState, Set<String> offlineOrDisabledInstances,
-      Map<String, Long> instanceOfflineTimeMap, Set<String> liveNodes,
-      Map<String, InstanceConfig> instanceConfigMap,  long delay,
-      ClusterConfig clusterConfig) {
-    String resourceName = idealState.getResourceName();
-    if (!isDelayRebalanceEnabled(idealState, clusterConfig)) {
-      _rebalanceScheduler.removeScheduledRebalance(resourceName);
-      return;
-    }
-
-    long currentTime = System.currentTimeMillis();
-    long nextRebalanceTime = Long.MAX_VALUE;
-    // calculate the closest future rebalance time
-    for (String ins : offlineOrDisabledInstances) {
-      long inactiveTime = getInactiveTime(ins, liveNodes, instanceOfflineTimeMap.get(ins), delay,
-          instanceConfigMap.get(ins), clusterConfig);
-      if (inactiveTime != -1 && inactiveTime > currentTime && inactiveTime < nextRebalanceTime) {
-        nextRebalanceTime = inactiveTime;
-      }
-    }
-
-    if (nextRebalanceTime == Long.MAX_VALUE) {
-      long startTime = _rebalanceScheduler.removeScheduledRebalance(resourceName);
-      if (LOG.isDebugEnabled()) {
-        LOG.debug(String
-            .format("Remove exist rebalance timer for resource %s at %d\n", resourceName, startTime));
-      }
-    } else {
-      long currentScheduledTime = _rebalanceScheduler.getRebalanceTime(resourceName);
-      if (currentScheduledTime < 0 || currentScheduledTime > nextRebalanceTime) {
-        _rebalanceScheduler.scheduleRebalance(_manager, resourceName, nextRebalanceTime);
-        if (LOG.isDebugEnabled()) {
-          LOG.debug(String
-              .format("Set next rebalance time for resource %s at time %d\n", resourceName,
-                  nextRebalanceTime));
-        }
-      }
-    }
-  }
-
-  /**
-   * The time when an offline or disabled instance should be treated as inactive. return -1 if it is
-   * inactive now.
-   *
-   * @return
-   */
-  private long getInactiveTime(String instance, Set<String> liveInstances, Long offlineTime,
-      long delay, InstanceConfig instanceConfig, ClusterConfig clusterConfig) {
-    long inactiveTime = Long.MAX_VALUE;
-
-    // check the time instance went offline.
-    if (!liveInstances.contains(instance)) {
-      if (offlineTime != null && offlineTime > 0 && offlineTime + delay < inactiveTime) {
-        inactiveTime = offlineTime + delay;
-      }
-    }
-
-    // check the time instance got disabled.
-    if (!instanceConfig.getInstanceEnabled() || (clusterConfig.getDisabledInstances() != null
-        && clusterConfig.getDisabledInstances().containsKey(instance))) {
-      long disabledTime = instanceConfig.getInstanceEnabledTime();
-      if (clusterConfig.getDisabledInstances() != null && clusterConfig.getDisabledInstances()
-          .containsKey(instance)) {
-        // Update batch disable time
-        long batchDisableTime = Long.parseLong(clusterConfig.getDisabledInstances().get(instance));
-        if (disabledTime == -1 || disabledTime > batchDisableTime) {
-          disabledTime = batchDisableTime;
-        }
-      }
-      if (disabledTime > 0 && disabledTime + delay < inactiveTime) {
-        inactiveTime = disabledTime + delay;
-      }
-    }
-
-    if (inactiveTime == Long.MAX_VALUE) {
-      return -1;
-    }
-
-    return inactiveTime;
-  }
-
-  private long getRebalanceDelay(IdealState idealState, ClusterConfig clusterConfig) {
-    long delayTime = idealState.getRebalanceDelay();
-    if (delayTime < 0) {
-      delayTime = clusterConfig.getRebalanceDelayTime();
-    }
-    return delayTime;
-  }
-
-  private boolean isDelayRebalanceEnabled(IdealState idealState, ClusterConfig clusterConfig) {
-    long delay = getRebalanceDelay(idealState, clusterConfig);
-    return (delay > 0 && idealState.isDelayRebalanceEnabled() && clusterConfig
-        . isDelayRebalaceEnabled());
-  }
-
   private ZNRecord getFinalDelayedMapping(IdealState idealState, ZNRecord newIdealMapping,
       ZNRecord newActiveMapping, Set<String> liveInstances, int numReplica, int minActiveReplica) {
     if (minActiveReplica >= numReplica) {
       return newIdealMapping;
     }
     ZNRecord finalMapping = new ZNRecord(idealState.getResourceName());
-    for (String partition : newIdealMapping.getListFields().keySet()) {
-      List<String> idealList = newIdealMapping.getListField(partition);
-      List<String> activeList = newActiveMapping.getListField(partition);
-
-      List<String> liveList = new ArrayList<>();
-      int activeReplica = 0;
-      for (String ins : activeList) {
-        if (liveInstances.contains(ins)) {
-          activeReplica++;
-          liveList.add(ins);
-        }
-      }
-
-      if (activeReplica >= minActiveReplica) {
-        finalMapping.setListField(partition, activeList);
-      } else {
-        List<String> candidates = new ArrayList<String>(idealList);
-        candidates.removeAll(activeList);
-        for (String liveIns : candidates) {
-          liveList.add(liveIns);
-          if (liveList.size() >= minActiveReplica) {
-            break;
-          }
-        }
-        finalMapping.setListField(partition, liveList);
-      }
-    }
+    finalMapping.setListFields(DelayedRebalanceUtil
+        .getFinalDelayedMapping(newIdealMapping.getListFields(), newActiveMapping.getListFields(),
+            liveInstances, minActiveReplica));
     return finalMapping;
   }
 
@@ -392,10 +246,11 @@ public class DelayedAutoRebalancer extends AbstractRebalancer<ResourceController
     Set<String> liveNodes = cache.getLiveInstances().keySet();
 
     ClusterConfig clusterConfig = cache.getClusterConfig();
-    long delayTime = getRebalanceDelay(idealState, clusterConfig);
-    Set<String> activeNodes = getActiveInstances(allNodes, idealState, liveNodes,
-        cache.getInstanceOfflineTimeMap(), cache.getLiveInstances().keySet(),
-        cache.getInstanceConfigMap(), delayTime, clusterConfig);
+    long delayTime = DelayedRebalanceUtil.getRebalanceDelay(idealState, clusterConfig);
+    Set<String> activeNodes = DelayedRebalanceUtil
+        .getActiveNodes(allNodes, idealState, liveNodes, cache.getInstanceOfflineTimeMap(),
+            cache.getLiveInstances().keySet(), cache.getInstanceConfigMap(), delayTime,
+            clusterConfig);
 
     String stateModelDefName = idealState.getStateModelDefRef();
     StateModelDefinition stateModelDef = cache.getStateModelDef(stateModelDefName);
@@ -420,14 +275,6 @@ public class DelayedAutoRebalancer extends AbstractRebalancer<ResourceController
     return partitionMapping;
   }
 
-  private int getMinActiveReplica(IdealState idealState, int replicaCount) {
-    int minActiveReplicas = idealState.getMinActiveReplicas();
-    if (minActiveReplicas < 0) {
-      minActiveReplicas = replicaCount;
-    }
-    return minActiveReplicas;
-  }
-
   /**
    * compute best state for resource in AUTO ideal state mode
    * @param liveInstances
diff --git a/helix-core/src/main/java/org/apache/helix/controller/rebalancer/StatefulRebalancer.java b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/StatefulRebalancer.java
new file mode 100644
index 0000000..94567bb
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/StatefulRebalancer.java
@@ -0,0 +1,37 @@
+package org.apache.helix.controller.rebalancer;
+
+import java.util.Map;
+
+import org.apache.helix.HelixRebalanceException;
+import org.apache.helix.controller.dataproviders.BaseControllerDataProvider;
+import org.apache.helix.controller.stages.CurrentStateOutput;
+import org.apache.helix.model.IdealState;
+import org.apache.helix.model.Resource;
+
+
+/**
+ * Allows one to come up with a custom implementation of a stateful rebalancer.<br/>
+ */
+public interface StatefulRebalancer<T extends BaseControllerDataProvider> {
+
+  /**
+   * Reset the rebalancer to the initial state.
+   */
+  void reset();
+
+  /**
+   * Release all the resources and clean up all the rebalancer state.
+   */
+  void close();
+
+  /**
+   * Compute the new IdealStates for all the input resources. The IdealStates include both the new
+   * partition assignment (in the listFields) and the new replica state mapping (in the mapFields).
+   * @param clusterData The cluster status data provider.
+   * @param resourceMap A map containing all the rebalancing resources.
+   * @param currentStateOutput The present Current States of the resources.
+   * @return A map of the new IdealStates with the resource name as key.
+   */
+  Map<String, IdealState> computeNewIdealStates(T clusterData, Map<String, Resource> resourceMap,
+      final CurrentStateOutput currentStateOutput) throws HelixRebalanceException;
+}
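
A minimal sketch of what a custom implementation could look like (hypothetical; the WAGED rebalancer is the real implementation in this branch). Any state kept across pipeline runs must be dropped in reset() and released in close():

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.helix.HelixRebalanceException;
    import org.apache.helix.controller.dataproviders.ResourceControllerDataProvider;
    import org.apache.helix.controller.rebalancer.StatefulRebalancer;
    import org.apache.helix.controller.stages.CurrentStateOutput;
    import org.apache.helix.model.IdealState;
    import org.apache.helix.model.Resource;

    class CachingStatefulRebalancer implements StatefulRebalancer<ResourceControllerDataProvider> {
      private final Map<String, IdealState> _lastResult = new HashMap<>();

      @Override
      public void reset() {
        _lastResult.clear(); // back to the initial state, e.g. after a leadership switch
      }

      @Override
      public void close() {
        _lastResult.clear(); // release any held resources
      }

      @Override
      public Map<String, IdealState> computeNewIdealStates(
          ResourceControllerDataProvider clusterData, Map<String, Resource> resourceMap,
          CurrentStateOutput currentStateOutput) throws HelixRebalanceException {
        // A real implementation computes new assignments here; this sketch
        // simply returns a copy of the previous result.
        return new HashMap<>(_lastResult);
      }
    }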
diff --git a/helix-core/src/main/java/org/apache/helix/controller/rebalancer/util/DelayedRebalanceUtil.java b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/util/DelayedRebalanceUtil.java
new file mode 100644
index 0000000..1342860
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/util/DelayedRebalanceUtil.java
@@ -0,0 +1,267 @@
+package org.apache.helix.controller.rebalancer.util;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import org.apache.helix.HelixManager;
+import org.apache.helix.model.ClusterConfig;
+import org.apache.helix.model.IdealState;
+import org.apache.helix.model.InstanceConfig;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+/**
+ * The util for supporting delayed rebalance logic.
+ */
+public class DelayedRebalanceUtil {
+  private static final Logger LOG = LoggerFactory.getLogger(DelayedRebalanceUtil.class);
+
+  private static final RebalanceScheduler REBALANCE_SCHEDULER = new RebalanceScheduler();
+
+  /**
+   * @return true if delay rebalance is configured and enabled in the ClusterConfig configurations.
+   */
+  public static boolean isDelayRebalanceEnabled(ClusterConfig clusterConfig) {
+    long delay = clusterConfig.getRebalanceDelayTime();
+    return (delay > 0 && clusterConfig.isDelayRebalaceEnabled());
+  }
+
+  /**
+   * @return true if delay rebalance is configured and enabled in Resource IdealState and the
+   * ClusterConfig configurations.
+   */
+  public static boolean isDelayRebalanceEnabled(IdealState idealState,
+      ClusterConfig clusterConfig) {
+    long delay = getRebalanceDelay(idealState, clusterConfig);
+    return (delay > 0 && idealState.isDelayRebalanceEnabled() && clusterConfig
+        .isDelayRebalaceEnabled());
+  }
+
+  /**
+   * @return the rebalance delay based on Resource IdealState and the ClusterConfig configurations.
+   */
+  public static long getRebalanceDelay(IdealState idealState, ClusterConfig clusterConfig) {
+    long delayTime = idealState.getRebalanceDelay();
+    if (delayTime < 0) {
+      delayTime = clusterConfig.getRebalanceDelayTime();
+    }
+    return delayTime;
+  }
+
+  /**
+   * @return all active nodes (live nodes plus offline-yet-active nodes) while considering cluster
+   * delay rebalance configurations.
+   */
+  public static Set<String> getActiveNodes(Set<String> allNodes, Set<String> liveEnabledNodes,
+      Map<String, Long> instanceOfflineTimeMap, Set<String> liveNodes,
+      Map<String, InstanceConfig> instanceConfigMap, ClusterConfig clusterConfig) {
+    if (!isDelayRebalanceEnabled(clusterConfig)) {
+      return new HashSet<>(liveEnabledNodes);
+    }
+    return getActiveNodes(allNodes, liveEnabledNodes, instanceOfflineTimeMap, liveNodes,
+        instanceConfigMap, clusterConfig.getRebalanceDelayTime(), clusterConfig);
+  }
+
+  /**
+   * @return all active nodes (live nodes plus offline-yet-active nodes) while considering cluster
+   * and the resource delay rebalance configurations.
+   */
+  public static Set<String> getActiveNodes(Set<String> allNodes, IdealState idealState,
+      Set<String> liveEnabledNodes, Map<String, Long> instanceOfflineTimeMap, Set<String> liveNodes,
+      Map<String, InstanceConfig> instanceConfigMap, long delay, ClusterConfig clusterConfig) {
+    if (!isDelayRebalanceEnabled(idealState, clusterConfig)) {
+      return new HashSet<>(liveEnabledNodes);
+    }
+    return getActiveNodes(allNodes, liveEnabledNodes, instanceOfflineTimeMap, liveNodes,
+        instanceConfigMap, delay, clusterConfig);
+  }
+
+  private static Set<String> getActiveNodes(Set<String> allNodes, Set<String> liveEnabledNodes,
+      Map<String, Long> instanceOfflineTimeMap, Set<String> liveNodes,
+      Map<String, InstanceConfig> instanceConfigMap, long delay, ClusterConfig clusterConfig) {
+    Set<String> activeNodes = new HashSet<>(liveEnabledNodes);
+    Set<String> offlineOrDisabledInstances = new HashSet<>(allNodes);
+    offlineOrDisabledInstances.removeAll(liveEnabledNodes);
+    long currentTime = System.currentTimeMillis();
+    for (String ins : offlineOrDisabledInstances) {
+      long inactiveTime = getInactiveTime(ins, liveNodes, instanceOfflineTimeMap.get(ins), delay,
+          instanceConfigMap.get(ins), clusterConfig);
+      InstanceConfig instanceConfig = instanceConfigMap.get(ins);
+      if (inactiveTime > currentTime && instanceConfig != null && instanceConfig
+          .isDelayRebalanceEnabled()) {
+        activeNodes.add(ins);
+      }
+    }
+    return activeNodes;
+  }
+
+  /**
+   * @return The time when an offline or disabled instance should be treated as inactive.
+   * Return -1 if it is inactive now.
+   */
+  private static long getInactiveTime(String instance, Set<String> liveInstances, Long offlineTime,
+      long delay, InstanceConfig instanceConfig, ClusterConfig clusterConfig) {
+    long inactiveTime = Long.MAX_VALUE;
+
+    // check the time instance went offline.
+    if (!liveInstances.contains(instance)) {
+      if (offlineTime != null && offlineTime > 0 && offlineTime + delay < inactiveTime) {
+        inactiveTime = offlineTime + delay;
+      }
+    }
+
+    // check the time instance got disabled.
+    if (!instanceConfig.getInstanceEnabled() || (clusterConfig.getDisabledInstances() != null
+        && clusterConfig.getDisabledInstances().containsKey(instance))) {
+      long disabledTime = instanceConfig.getInstanceEnabledTime();
+      if (clusterConfig.getDisabledInstances() != null && clusterConfig.getDisabledInstances()
+          .containsKey(instance)) {
+        // Update batch disable time
+        long batchDisableTime = Long.parseLong(clusterConfig.getDisabledInstances().get(instance));
+        if (disabledTime == -1 || disabledTime > batchDisableTime) {
+          disabledTime = batchDisableTime;
+        }
+      }
+      if (disabledTime > 0 && disabledTime + delay < inactiveTime) {
+        inactiveTime = disabledTime + delay;
+      }
+    }
+
+    if (inactiveTime == Long.MAX_VALUE) {
+      return -1;
+    }
+
+    return inactiveTime;
+  }
+
+  /**
+   * Merge the new ideal preference list with the delayed mapping that is calculated based on the
+   * delayed rebalance configurations.
+   * The method will prioritize the "active" preference list so as to avoid unnecessary transient
+   * state transitions.
+   *
+   * @param newIdealPreferenceList  the ideal mapping that was calculated based on the current
+   *                                instance status
+   * @param newDelayedPreferenceList the delayed mapping that was calculated based on the delayed
+   *                                 instance status
+   * @param liveEnabledInstances    list of all the nodes that are both alive and enabled.
+   * @param minActiveReplica        the minimum replica count to ensure a valid mapping.
+   *                                If the active list does not have enough replica assignment,
+   *                                this method will fill the list with the new ideal mapping until
+   *                                the replica count satisfies the minimum requirement.
+   * @return the merged state mapping.
+   */
+  public static Map<String, List<String>> getFinalDelayedMapping(
+      Map<String, List<String>> newIdealPreferenceList,
+      Map<String, List<String>> newDelayedPreferenceList, Set<String> liveEnabledInstances,
+      int minActiveReplica) {
+    Map<String, List<String>> finalPreferenceList = new HashMap<>();
+    for (String partition : newIdealPreferenceList.keySet()) {
+      List<String> idealList = newIdealPreferenceList.get(partition);
+      List<String> delayedIdealList = newDelayedPreferenceList.get(partition);
+
+      List<String> liveList = new ArrayList<>();
+      for (String ins : delayedIdealList) {
+        if (liveEnabledInstances.contains(ins)) {
+          liveList.add(ins);
+        }
+      }
+
+      if (liveList.size() >= minActiveReplica) {
+        finalPreferenceList.put(partition, delayedIdealList);
+      } else {
+        List<String> candidates = new ArrayList<>(idealList);
+        candidates.removeAll(delayedIdealList);
+        for (String liveIns : candidates) {
+          liveList.add(liveIns);
+          if (liveList.size() >= minActiveReplica) {
+            break;
+          }
+        }
+        finalPreferenceList.put(partition, liveList);
+      }
+    }
+    return finalPreferenceList;
+  }
+
+  /**
+   * Get the minimum active replica count threshold that allows delayed rebalance.
+   *
+   * @param idealState   the resource IdealState
+   * @param replicaCount the full replica count, used as the fallback when the minimum active
+   *                     replica count is not configured in the IdealState
+   * @return the minimum active replica count that is required
+   */
+  public static int getMinActiveReplica(IdealState idealState, int replicaCount) {
+    int minActiveReplicas = idealState.getMinActiveReplicas();
+    if (minActiveReplicas < 0) {
+      minActiveReplicas = replicaCount;
+    }
+    return minActiveReplicas;
+  }
+
+  /**
+   * Set a rebalance scheduler for the closest future rebalance time.
+   */
+  public static void setRebalanceScheduler(String resourceName, boolean isDelayedRebalanceEnabled,
+      Set<String> offlineOrDisabledInstances, Map<String, Long> instanceOfflineTimeMap,
+      Set<String> liveNodes, Map<String, InstanceConfig> instanceConfigMap, long delay,
+      ClusterConfig clusterConfig, HelixManager manager) {
+    if (!isDelayedRebalanceEnabled) {
+      REBALANCE_SCHEDULER.removeScheduledRebalance(resourceName);
+      return;
+    }
+
+    long currentTime = System.currentTimeMillis();
+    long nextRebalanceTime = Long.MAX_VALUE;
+    // calculate the closest future rebalance time
+    for (String ins : offlineOrDisabledInstances) {
+      long inactiveTime = getInactiveTime(ins, liveNodes, instanceOfflineTimeMap.get(ins), delay,
+          instanceConfigMap.get(ins), clusterConfig);
+      if (inactiveTime != -1 && inactiveTime > currentTime && inactiveTime < nextRebalanceTime) {
+        nextRebalanceTime = inactiveTime;
+      }
+    }
+
+    if (nextRebalanceTime == Long.MAX_VALUE) {
+      long startTime = REBALANCE_SCHEDULER.removeScheduledRebalance(resourceName);
+      if (LOG.isDebugEnabled()) {
+        LOG.debug(String
+            .format("Remove exist rebalance timer for resource %s at %d\n", resourceName,
+                startTime));
+      }
+    } else {
+      long currentScheduledTime = REBALANCE_SCHEDULER.getRebalanceTime(resourceName);
+      if (currentScheduledTime < 0 || currentScheduledTime > nextRebalanceTime) {
+        REBALANCE_SCHEDULER.scheduleRebalance(manager, resourceName, nextRebalanceTime);
+        if (LOG.isDebugEnabled()) {
+          LOG.debug(String
+              .format("Set next rebalance time for resource %s at time %d\n", resourceName,
+                  nextRebalanceTime));
+        }
+      }
+    }
+  }
+}
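
A worked example of DelayedRebalanceUtil.getFinalDelayedMapping above (hypothetical partition and instance names):

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import org.apache.helix.controller.rebalancer.util.DelayedRebalanceUtil;

    final class DelayedMappingExample {
      public static void main(String[] args) {
        Map<String, List<String>> ideal = new HashMap<>();
        ideal.put("p0", Arrays.asList("n1", "n2", "n3"));
        // n3 is offline but still within the delay window, so the delayed
        // mapping prefers it; among its replicas only n1 is live and enabled.
        Map<String, List<String>> delayed = new HashMap<>();
        delayed.put("p0", Arrays.asList("n1", "n3"));

        Map<String, List<String>> merged = DelayedRebalanceUtil.getFinalDelayedMapping(
            ideal, delayed, new HashSet<>(Arrays.asList("n1", "n2")), 2 /* minActiveReplica */);
        // The delayed list has only one live replica (n1), below
        // minActiveReplica = 2, so it is topped up from the ideal mapping:
        System.out.println(merged); // {p0=[n1, n2]}
      }
    }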
diff --git a/helix-core/src/main/java/org/apache/helix/controller/rebalancer/util/ResourceUsageCalculator.java b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/util/ResourceUsageCalculator.java
index c2d472a..e7a1b94 100644
--- a/helix-core/src/main/java/org/apache/helix/controller/rebalancer/util/ResourceUsageCalculator.java
+++ b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/util/ResourceUsageCalculator.java
@@ -1,11 +1,31 @@
 package org.apache.helix.controller.rebalancer.util;
 
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
 import java.util.HashMap;
 import java.util.Map;
 
 import org.apache.helix.api.rebalancer.constraint.dataprovider.PartitionWeightProvider;
 import org.apache.helix.controller.common.ResourcesStateMap;
 import org.apache.helix.model.Partition;
+import org.apache.helix.model.ResourceAssignment;
 
 public class ResourceUsageCalculator {
   /**
@@ -33,4 +53,176 @@ public class ResourceUsageCalculator {
     }
     return newParticipantUsage;
   }
+
+  /**
+   * Measure baseline divergence between baseline assignment and best possible assignment at
+   * replica level. For example:
+   * baseline =
+   * {
+   *    resource1={
+   *       partition1={
+   *          instance1=master,
+   *          instance2=slave
+   *       },
+   *       partition2={
+   *          instance2=slave
+   *       }
+   *    }
+   * }
+   * bestPossible =
+   * {
+   *    resource1={
+   *       partition1={
+   *          instance1=master,  <--- matched
+   *          instance3=slave    <--- doesn't match
+   *       },
+   *       partition2={
+   *          instance3=master   <--- doesn't match
+   *       }
+   *    }
+   * }
+   * baseline divergence = (doesn't match: 2) / (total(matched + doesn't match): 3) = 2/3 ~= 0.66667
+   * If divergence == 1.0, all replicas are different (no match); if divergence == 0.0, there is
+   * no difference.
+   *
+   * @param baseline baseline assignment
+   * @param bestPossibleAssignment best possible assignment
+   * @return double value range at [0.0, 1.0]
+   */
+  public static double measureBaselineDivergence(Map<String, ResourceAssignment> baseline,
+      Map<String, ResourceAssignment> bestPossibleAssignment) {
+    int numMatchedReplicas = 0;
+    int numTotalBestPossibleReplicas = 0;
+
+    // 1. Check resource assignment names.
+    for (Map.Entry<String, ResourceAssignment> resourceEntry : bestPossibleAssignment.entrySet()) {
+      String resourceKey = resourceEntry.getKey();
+      if (!baseline.containsKey(resourceKey)) {
+        continue;
+      }
+
+      // Resource assignment names are matched.
+      // 2. check partitions.
+      Map<String, Map<String, String>> bestPossiblePartitions =
+          resourceEntry.getValue().getRecord().getMapFields();
+      Map<String, Map<String, String>> baselinePartitions =
+          baseline.get(resourceKey).getRecord().getMapFields();
+
+      for (Map.Entry<String, Map<String, String>> partitionEntry
+          : bestPossiblePartitions.entrySet()) {
+        String partitionName = partitionEntry.getKey();
+        if (!baselinePartitions.containsKey(partitionName)) {
+          continue;
+        }
+
+        // Partition names are matched.
+        // 3. Check replicas.
+        Map<String, String> bestPossibleReplicas = partitionEntry.getValue();
+        Map<String, String> baselineReplicas = baselinePartitions.get(partitionName);
+
+        for (Map.Entry<String, String> replicaEntry : bestPossibleReplicas.entrySet()) {
+          String replicaName = replicaEntry.getKey();
+          if (!baselineReplicas.containsKey(replicaName)) {
+            continue;
+          }
+
+          // Replica names are matched.
+          // 4. Check replica values.
+          String bestPossibleReplica = replicaEntry.getValue();
+          String baselineReplica = baselineReplicas.get(replicaName);
+          if (bestPossibleReplica.equals(baselineReplica)) {
+            numMatchedReplicas++;
+          }
+        }
+
+        // Count total best possible replicas.
+        numTotalBestPossibleReplicas += bestPossibleReplicas.size();
+      }
+    }
+
+    return numTotalBestPossibleReplicas == 0 ? 1.0d
+        : (1.0d - (double) numMatchedReplicas / (double) numTotalBestPossibleReplicas);
+  }
+
+  /**
+   * Calculates the average partition weight per capacity key for a resource config. Example:
+   * Input =
+   * {
+   *   "partition1": {
+   *     "capacity1": 20,
+   *     "capacity2": 40
+   *   },
+   *   "partition2": {
+   *     "capacity1": 30,
+   *     "capacity2": 50
+   *   },
+   *   "partition3": {
+   *     "capacity1": 16,
+   *     "capacity2": 30
+   *   }
+   * }
+   *
+   * Total weight for key "capacity1" = 20 + 30 + 16 = 66;
+   * Total weight for key "capacity2" = 40 + 50 + 30 = 120;
+   * Total partitions = 3;
+   * Average partition weight for "capacity1" = 66 / 3 = 22;
+   * Average partition weight for "capacity2" = 120 / 3 = 40;
+   *
+   * Output =
+   * {
+   *   "capacity1": 22,
+   *   "capacity2": 40
+   * }
+   *
+   * @param partitionCapacityMap A map of partition capacity:
+   *        <PartitionName or DEFAULT_PARTITION_KEY, <Capacity Key, Capacity Number>>
+   * @return A map of partition weight: capacity key -> average partition weight
+   */
+  public static Map<String, Integer> calculateAveragePartitionWeight(
+      Map<String, Map<String, Integer>> partitionCapacityMap) {
+    // capacity key -> [number of partitions, total weight per capacity key]
+    Map<String, PartitionWeightCounterEntry> countPartitionWeightMap = new HashMap<>();
+
+    // Aggregates partition weight for each capacity key.
+    partitionCapacityMap.values().forEach(partitionCapacityEntry ->
+        partitionCapacityEntry.forEach((capacityKey, weight) -> countPartitionWeightMap
+            .computeIfAbsent(capacityKey, key -> new PartitionWeightCounterEntry())
+            .increase(1, weight)));
+
+    // capacity key -> average partition weight
+    Map<String, Integer> averagePartitionWeightMap = new HashMap<>();
+
+    // Calculate average partition weight for each capacity key.
+    // Per capacity key level:
+    // average partition weight = (total partition weight) / (number of partitions)
+    for (Map.Entry<String, PartitionWeightCounterEntry> entry
+        : countPartitionWeightMap.entrySet()) {
+      String capacityKey = entry.getKey();
+      PartitionWeightCounterEntry weightEntry = entry.getValue();
+      int averageWeight = (int) (weightEntry.getWeight() / weightEntry.getPartitions());
+      averagePartitionWeightMap.put(capacityKey, averageWeight);
+    }
+
+    return averagePartitionWeightMap;
+  }
+
+  /*
+   * Represents total number of partitions and total partition weight for a capacity key.
+   */
+  private static class PartitionWeightCounterEntry {
+    private int partitions;
+    private long weight;
+
+    private int getPartitions() {
+      return partitions;
+    }
+
+    private long getWeight() {
+      return weight;
+    }
+
+    private void increase(int partitions, int weight) {
+      this.partitions += partitions;
+      this.weight += weight;
+    }
+  }
 }
diff --git a/helix-core/src/main/java/org/apache/helix/controller/rebalancer/util/WagedValidationUtil.java b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/util/WagedValidationUtil.java
new file mode 100644
index 0000000..e9f86e7
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/util/WagedValidationUtil.java
@@ -0,0 +1,91 @@
+package org.apache.helix.controller.rebalancer.util;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.helix.HelixException;
+import org.apache.helix.model.ClusterConfig;
+import org.apache.helix.model.InstanceConfig;
+import org.apache.helix.model.ResourceConfig;
+
+
+/**
+ * A util class that contains validation-related static methods for WAGED rebalancer.
+ */
+public class WagedValidationUtil {
+  /**
+   * Validates and returns instance capacities. The validation logic ensures that all required capacity keys (in ClusterConfig) are present in InstanceConfig.
+   * @param clusterConfig the cluster config that defines the required capacity keys and the default instance capacity
+   * @param instanceConfig the instance config that may override the default capacities
+   * @return a map of capacity key to capacity value for the instance
+   */
+  public static Map<String, Integer> validateAndGetInstanceCapacity(ClusterConfig clusterConfig,
+      InstanceConfig instanceConfig) {
+    // Fetch the instance capacity from 2 possible sources, in the following priority order.
+    // 1. The instance capacity that is configured in the instance config.
+    // 2. If the default instance capacity that is configured in the cluster config contains
+    //    additional capacity keys, fill the capacity map with those values.
+    Map<String, Integer> instanceCapacity =
+        new HashMap<>(clusterConfig.getDefaultInstanceCapacityMap());
+    instanceCapacity.putAll(instanceConfig.getInstanceCapacityMap());
+
+    List<String> requiredCapacityKeys = clusterConfig.getInstanceCapacityKeys();
+    // All the required keys must exist in the merged capacity map.
+    if (!instanceCapacity.keySet().containsAll(requiredCapacityKeys)) {
+      throw new HelixException(String.format(
+          "The required capacity keys: %s are not fully configured in the instance: %s, capacity map: %s.",
+          requiredCapacityKeys.toString(), instanceConfig.getInstanceName(),
+          instanceCapacity.toString()));
+    }
+    return instanceCapacity;
+  }
+
+  /**
+   * Validates and returns partition capacities. The validation logic ensures that all required capacity keys (from ClusterConfig) are present in the ResourceConfig for the partition.
+   * @param partitionName the name of the partition to look up
+   * @param resourceConfig the resource config that the partition belongs to
+   * @param capacityMap the parsed partition capacity map of the resource
+   * @param clusterConfig the cluster config that defines the required capacity keys and the default partition weight
+   * @return a map of capacity key to capacity value for the partition
+   */
+  public static Map<String, Integer> validateAndGetPartitionCapacity(String partitionName,
+      ResourceConfig resourceConfig, Map<String, Map<String, Integer>> capacityMap,
+      ClusterConfig clusterConfig) {
+    // Fetch the partition capacity from 3 possible sources, in the following priority order.
+    // 1. The partition capacity that is explicitly configured in the resource config.
+    // 2. Or, the default partition capacity that is configured under partition name DEFAULT_PARTITION_KEY in the resource config.
+    // 3. If the default partition capacity that is configured in the cluster config contains more capacity keys, fill the capacity map with those additional values.
+    Map<String, Integer> partitionCapacity =
+        new HashMap<>(clusterConfig.getDefaultPartitionWeightMap());
+    partitionCapacity.putAll(capacityMap.getOrDefault(partitionName,
+        capacityMap.getOrDefault(ResourceConfig.DEFAULT_PARTITION_KEY, new HashMap<>())));
+
+    List<String> requiredCapacityKeys = clusterConfig.getInstanceCapacityKeys();
+    // If any required capacity key is not configured in the resource config, fail the model creation.
+    if (!partitionCapacity.keySet().containsAll(requiredCapacityKeys)) {
+      throw new HelixException(String.format(
+          "The required capacity keys: %s are not fully configured in the resource: %s, partition: %s, weight map: %s.",
+          requiredCapacityKeys.toString(), resourceConfig.getResourceName(), partitionName,
+          partitionCapacity.toString()));
+    }
+    return partitionCapacity;
+  }
+}
diff --git a/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/AssignmentMetadataStore.java b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/AssignmentMetadataStore.java
new file mode 100644
index 0000000..afd0187
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/AssignmentMetadataStore.java
@@ -0,0 +1,213 @@
+package org.apache.helix.controller.rebalancer.waged;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.io.IOException;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
+
+import org.I0Itec.zkclient.exception.ZkNoNodeException;
+import org.I0Itec.zkclient.serialize.ZkSerializer;
+import org.apache.helix.BucketDataAccessor;
+import org.apache.helix.HelixException;
+import org.apache.helix.HelixProperty;
+import org.apache.helix.ZNRecord;
+import org.apache.helix.manager.zk.ZNRecordJacksonSerializer;
+import org.apache.helix.manager.zk.ZkBucketDataAccessor;
+import org.apache.helix.model.ResourceAssignment;
+
+
+/**
+ * A placeholder before we have the real assignment metadata store.
+ */
+public class AssignmentMetadataStore {
+  private static final String ASSIGNMENT_METADATA_KEY = "ASSIGNMENT_METADATA";
+  private static final String BASELINE_TEMPLATE = "/%s/%s/BASELINE";
+  private static final String BEST_POSSIBLE_TEMPLATE = "/%s/%s/BEST_POSSIBLE";
+  private static final String BASELINE_KEY = "BASELINE";
+  private static final String BEST_POSSIBLE_KEY = "BEST_POSSIBLE";
+  private static final ZkSerializer SERIALIZER = new ZNRecordJacksonSerializer();
+
+  private BucketDataAccessor _dataAccessor;
+  private String _baselinePath;
+  private String _bestPossiblePath;
+  protected Map<String, ResourceAssignment> _globalBaseline;
+  protected Map<String, ResourceAssignment> _bestPossibleAssignment;
+
+  AssignmentMetadataStore(String metadataStoreAddrs, String clusterName) {
+    this(new ZkBucketDataAccessor(metadataStoreAddrs), clusterName);
+  }
+
+  protected AssignmentMetadataStore(BucketDataAccessor bucketDataAccessor, String clusterName) {
+    _dataAccessor = bucketDataAccessor;
+    _baselinePath = String.format(BASELINE_TEMPLATE, clusterName, ASSIGNMENT_METADATA_KEY);
+    _bestPossiblePath = String.format(BEST_POSSIBLE_TEMPLATE, clusterName, ASSIGNMENT_METADATA_KEY);
+  }
+
+  public synchronized Map<String, ResourceAssignment> getBaseline() {
+    // Return the in-memory baseline. If null, read from ZK. This is to minimize reads from ZK
+    if (_globalBaseline == null) {
+      try {
+        HelixProperty baseline =
+            _dataAccessor.compressedBucketRead(_baselinePath, HelixProperty.class);
+        _globalBaseline = splitAssignments(baseline);
+      } catch (ZkNoNodeException ex) {
+        // Metadata does not exist, so return an empty map
+        _globalBaseline = Collections.emptyMap();
+      }
+    }
+    return _globalBaseline;
+  }
+
+  public synchronized Map<String, ResourceAssignment> getBestPossibleAssignment() {
+    // Return the in-memory best possible assignment. If null, read from ZK. This is to minimize reads from ZK
+    if (_bestPossibleAssignment == null) {
+      try {
+        HelixProperty bestPossible =
+            _dataAccessor.compressedBucketRead(_bestPossiblePath, HelixProperty.class);
+        _bestPossibleAssignment = splitAssignments(bestPossible);
+      } catch (ZkNoNodeException ex) {
+        // Metadata does not exist, so return an empty map
+        _bestPossibleAssignment = Collections.emptyMap();
+      }
+    }
+    return _bestPossibleAssignment;
+  }
+
+  /**
+   * @return true if a new baseline was persisted.
+   * @throws HelixException if the method failed to persist the baseline.
+   */
+  // TODO: Enhance the return value so it is more intuitive to understand when the persist fails and
+  // TODO: when it is skipped.
+  public synchronized boolean persistBaseline(Map<String, ResourceAssignment> globalBaseline) {
+    // TODO: Make the write async?
+    // If baseline hasn't changed, skip writing to metadata store
+    if (compareAssignments(_globalBaseline, globalBaseline)) {
+      return false;
+    }
+    // Persist to ZK
+    HelixProperty combinedAssignments = combineAssignments(BASELINE_KEY, globalBaseline);
+    try {
+      _dataAccessor.compressedBucketWrite(_baselinePath, combinedAssignments);
+    } catch (IOException e) {
+      // TODO: Improve failure handling
+      throw new HelixException("Failed to persist baseline!", e);
+    }
+
+    // Update the in-memory reference
+    _globalBaseline = globalBaseline;
+    return true;
+  }
+
+  /**
+   * @return true if a new best possible assignment was persisted.
+   * @throws HelixException if the method failed to persist the best possible assignment.
+   */
+  // TODO: Enhance the return value so it is more intuitive to understand when the persist fails and
+  // TODO: when it is skipped.
+  public synchronized boolean persistBestPossibleAssignment(
+      Map<String, ResourceAssignment> bestPossibleAssignment) {
+    // TODO: Make the write async?
+    // If bestPossibleAssignment hasn't changed, skip writing to metadata store
+    if (compareAssignments(_bestPossibleAssignment, bestPossibleAssignment)) {
+      return false;
+    }
+    // Persist to ZK
+    HelixProperty combinedAssignments =
+        combineAssignments(BEST_POSSIBLE_KEY, bestPossibleAssignment);
+    try {
+      _dataAccessor.compressedBucketWrite(_bestPossiblePath, combinedAssignments);
+    } catch (IOException e) {
+      // TODO: Improve failure handling
+      throw new HelixException("Failed to persist BestPossibleAssignment!", e);
+    }
+
+    // Update the in-memory reference
+    _bestPossibleAssignment = bestPossibleAssignment;
+    return true;
+  }
+
+  protected synchronized void reset() {
+    if (_bestPossibleAssignment != null) {
+      _bestPossibleAssignment.clear();
+      _bestPossibleAssignment = null;
+    }
+    if (_globalBaseline != null) {
+      _globalBaseline.clear();
+      _globalBaseline = null;
+    }
+  }
+
+  protected void finalize() {
+    // To ensure all resources are released.
+    close();
+  }
+
+  // Close the store to release all the resources.
+  public void close() {
+    _dataAccessor.disconnect();
+  }
+
+  /**
+   * Produces one HelixProperty that contains all assignment data.
+   * @param name the record name of the combined HelixProperty
+   * @param assignmentMap a map of resource name to ResourceAssignment
+   * @return a single HelixProperty that contains all the serialized assignments
+   */
+  private HelixProperty combineAssignments(String name,
+      Map<String, ResourceAssignment> assignmentMap) {
+    HelixProperty property = new HelixProperty(name);
+    // Add each resource's assignment as a simple field in one ZNRecord
+    // Note: do not use Arrays.toString() to convert the record; deserialization would fail.
+    assignmentMap.forEach((resource, assignment) -> property.getRecord()
+        .setSimpleField(resource, new String(SERIALIZER.serialize(assignment.getRecord()))));
+    return property;
+  }
+
+  /**
+   * Returns a Map of (ResourceName, ResourceAssignment) pairs.
+   * @param property the combined HelixProperty produced by combineAssignments()
+   * @return a map of resource name to ResourceAssignment
+   */
+  private Map<String, ResourceAssignment> splitAssignments(HelixProperty property) {
+    Map<String, ResourceAssignment> assignmentMap = new HashMap<>();
+    // Convert each resource's assignment String into a ResourceAssignment object and put it in a
+    // map
+    property.getRecord().getSimpleFields().forEach((resource, assignmentStr) -> assignmentMap
+        .put(resource,
+            new ResourceAssignment((ZNRecord) SERIALIZER.deserialize(assignmentStr.getBytes()))));
+    return assignmentMap;
+  }
+
+  /**
+   * Returns whether two assignments are the same.
+   * @param oldAssignment the previously known assignment
+   * @param newAssignment the newly calculated assignment
+   * @return true if they are the same; false otherwise, or if oldAssignment is null
+   */
+  protected boolean compareAssignments(Map<String, ResourceAssignment> oldAssignment,
+      Map<String, ResourceAssignment> newAssignment) {
+    // If oldAssignment is null, that means that we haven't read from/written to
+    // the metadata store yet. In that case, we return false so that we write to metadata store.
+    return oldAssignment != null && oldAssignment.equals(newAssignment);
+  }
+}
diff --git a/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/RebalanceAlgorithm.java b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/RebalanceAlgorithm.java
new file mode 100644
index 0000000..1374162
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/RebalanceAlgorithm.java
@@ -0,0 +1,43 @@
+package org.apache.helix.controller.rebalancer.waged;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import org.apache.helix.HelixRebalanceException;
+import org.apache.helix.controller.rebalancer.waged.model.ClusterModel;
+import org.apache.helix.controller.rebalancer.waged.model.OptimalAssignment;
+
+/**
+ * A generic interface to generate the optimal assignment given the runtime cluster environment.
+ *
+ * <pre>
+ * @see <a href="https://github.com/apache/helix/wiki/
+ * Design-Proposal---Weight-Aware-Globally-Even-Distribute-Rebalancer
+ * #rebalance-algorithm-adapter">Rebalance Algorithm</a>
+ * </pre>
+ */
+public interface RebalanceAlgorithm {
+
+  /**
+   * Rebalance the Helix resource partitions based on the input cluster model.
+   * @param clusterModel the runtime cluster model that contains all necessary information
+   * @return An instance of {@link OptimalAssignment}
+   */
+  OptimalAssignment calculate(ClusterModel clusterModel) throws HelixRebalanceException;
+}
diff --git a/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/WagedRebalancer.java b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/WagedRebalancer.java
new file mode 100644
index 0000000..8a21bbb
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/WagedRebalancer.java
@@ -0,0 +1,787 @@
+package org.apache.helix.controller.rebalancer.waged;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.stream.Collectors;
+
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.ImmutableSet;
+import org.apache.helix.HelixConstants;
+import org.apache.helix.HelixManager;
+import org.apache.helix.HelixRebalanceException;
+import org.apache.helix.controller.changedetector.ResourceChangeDetector;
+import org.apache.helix.controller.dataproviders.ResourceControllerDataProvider;
+import org.apache.helix.controller.rebalancer.DelayedAutoRebalancer;
+import org.apache.helix.controller.rebalancer.StatefulRebalancer;
+import org.apache.helix.controller.rebalancer.internal.MappingCalculator;
+import org.apache.helix.controller.rebalancer.util.DelayedRebalanceUtil;
+import org.apache.helix.controller.rebalancer.waged.constraints.ConstraintBasedAlgorithmFactory;
+import org.apache.helix.controller.rebalancer.waged.model.ClusterModel;
+import org.apache.helix.controller.rebalancer.waged.model.ClusterModelProvider;
+import org.apache.helix.controller.rebalancer.waged.model.OptimalAssignment;
+import org.apache.helix.controller.stages.CurrentStateOutput;
+import org.apache.helix.model.ClusterConfig;
+import org.apache.helix.model.IdealState;
+import org.apache.helix.model.Partition;
+import org.apache.helix.model.Resource;
+import org.apache.helix.model.ResourceAssignment;
+import org.apache.helix.model.ResourceConfig;
+import org.apache.helix.monitoring.metrics.MetricCollector;
+import org.apache.helix.monitoring.metrics.WagedRebalancerMetricCollector;
+import org.apache.helix.monitoring.metrics.implementation.BaselineDivergenceGauge;
+import org.apache.helix.monitoring.metrics.model.CountMetric;
+import org.apache.helix.monitoring.metrics.model.LatencyMetric;
+import org.apache.helix.util.RebalanceUtil;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Weight-Aware Globally-Even Distribute Rebalancer.
+ * @see <a
+ *      href="https://github.com/apache/helix/wiki/Design-Proposal---Weight-Aware-Globally-Even-Distribute-Rebalancer">
+ *      Design Document
+ *      </a>
+ */
+public class WagedRebalancer implements StatefulRebalancer<ResourceControllerDataProvider> {
+  private static final Logger LOG = LoggerFactory.getLogger(WagedRebalancer.class);
+
+  // When any of the following changes happens, the rebalancer needs to do a global rebalance,
+  // which consists of 1. a baseline recalculation, and 2. a partial rebalance based on the new baseline.
+  private static final Set<HelixConstants.ChangeType> GLOBAL_REBALANCE_REQUIRED_CHANGE_TYPES =
+      ImmutableSet
+          .of(HelixConstants.ChangeType.RESOURCE_CONFIG, HelixConstants.ChangeType.IDEAL_STATE,
+              HelixConstants.ChangeType.CLUSTER_CONFIG, HelixConstants.ChangeType.INSTANCE_CONFIG);
+  // Sentinel value to identify whether the preference has been configured.
+  private static final Map<ClusterConfig.GlobalRebalancePreferenceKey, Integer>
+      NOT_CONFIGURED_PREFERENCE = ImmutableMap
+      .of(ClusterConfig.GlobalRebalancePreferenceKey.EVENNESS, -1,
+          ClusterConfig.GlobalRebalancePreferenceKey.LESS_MOVEMENT, -1);
+  // The default algorithm to use when there is no preference configured.
+  private static final RebalanceAlgorithm DEFAULT_REBALANCE_ALGORITHM =
+      ConstraintBasedAlgorithmFactory
+          .getInstance(ClusterConfig.DEFAULT_GLOBAL_REBALANCE_PREFERENCE);
+
+  // To calculate the baseline asynchronously
+  private final ExecutorService _baselineCalculateExecutor;
+  private final ResourceChangeDetector _changeDetector;
+  private final HelixManager _manager;
+  private final MappingCalculator<ResourceControllerDataProvider> _mappingCalculator;
+  private final AssignmentMetadataStore _assignmentMetadataStore;
+
+  private final MetricCollector _metricCollector;
+  private final CountMetric _rebalanceFailureCount;
+  private final CountMetric _baselineCalcCounter;
+  private final LatencyMetric _baselineCalcLatency;
+  private final LatencyMetric _writeLatency;
+  private final CountMetric _partialRebalanceCounter;
+  private final LatencyMetric _partialRebalanceLatency;
+  private final LatencyMetric _stateReadLatency;
+  private final BaselineDivergenceGauge _baselineDivergenceGauge;
+
+  private boolean _asyncGlobalRebalanceEnabled;
+
+  // Note that the rebalance algorithm field is mutable, so it should not be referenced directly
+  // except in the public method computeNewIdealStates.
+  private RebalanceAlgorithm _rebalanceAlgorithm;
+  private Map<ClusterConfig.GlobalRebalancePreferenceKey, Integer> _preference =
+      NOT_CONFIGURED_PREFERENCE;
+
+  private static AssignmentMetadataStore constructAssignmentStore(String metadataStoreAddrs,
+      String clusterName) {
+    if (metadataStoreAddrs != null && clusterName != null) {
+      return new AssignmentMetadataStore(metadataStoreAddrs, clusterName);
+    }
+    return null;
+  }
+
+  public WagedRebalancer(HelixManager helixManager) {
+    this(helixManager == null ? null
+            : constructAssignmentStore(helixManager.getMetadataStoreConnectionString(),
+                helixManager.getClusterName()),
+        DEFAULT_REBALANCE_ALGORITHM,
+        // Use DelayedAutoRebalancer as the mapping calculator for the final assignment output.
+        // Mapping calculator will translate the best possible assignment into the applicable state
+        // mapping based on the current states.
+        // TODO abstract and separate the main mapping calculator logic from DelayedAutoRebalancer
+        new DelayedAutoRebalancer(),
+        // Helix Manager is required for the rebalancer scheduler
+        helixManager,
+        // If HelixManager is null, we just pass in a non-functioning WagedRebalancerMetricCollector
+        // that will not be registered to MBean.
+        // This is to handle two cases: 1. HelixManager is null for non-testing cases. In this case,
+        // WagedRebalancer will not read/write to metadata store and just use CurrentState-based
+        // rebalancing. 2. Tests that require instrumenting the rebalancer for verifying whether the
+        // cluster has converged.
+        helixManager == null ? new WagedRebalancerMetricCollector()
+            : new WagedRebalancerMetricCollector(helixManager.getClusterName()),
+        ClusterConfig.DEFAULT_GLOBAL_REBALANCE_ASYNC_MODE_ENABLED);
+    _preference = ImmutableMap.copyOf(ClusterConfig.DEFAULT_GLOBAL_REBALANCE_PREFERENCE);
+  }
+
+  /**
+   * This constructor will use null for HelixManager. With a null HelixManager, the rebalancer
+   * will not schedule a future delayed rebalance.
+   * @param assignmentMetadataStore the store for persisting and reading assignments
+   * @param algorithm the rebalance algorithm to use
+   * @param metricCollectorOptional an optional metric collector; a non-registering collector is used if absent
+   */
+  protected WagedRebalancer(AssignmentMetadataStore assignmentMetadataStore,
+      RebalanceAlgorithm algorithm, Optional<MetricCollector> metricCollectorOptional) {
+    this(assignmentMetadataStore, algorithm, new DelayedAutoRebalancer(), null,
+        // If metricCollector is not provided, instantiate a version that does not register metrics
+        // in order to allow rebalancer to proceed
+        metricCollectorOptional.orElse(new WagedRebalancerMetricCollector()),
+        false);
+  }
+
+  private WagedRebalancer(AssignmentMetadataStore assignmentMetadataStore,
+      RebalanceAlgorithm algorithm, MappingCalculator mappingCalculator, HelixManager manager,
+      MetricCollector metricCollector, boolean isAsyncGlobalRebalanceEnabled) {
+    if (assignmentMetadataStore == null) {
+      LOG.warn("Assignment Metadata Store is not configured properly."
+          + " The rebalancer will not access the assignment store during the rebalance.");
+    }
+    _assignmentMetadataStore = assignmentMetadataStore;
+    _rebalanceAlgorithm = algorithm;
+    _mappingCalculator = mappingCalculator;
+    if (manager == null) {
+      LOG.warn("HelixManager is not provided. The rebalancer is not going to schedule for a future "
+          + "rebalance even when delayed rebalance is enabled.");
+    }
+    _manager = manager;
+
+    _metricCollector = metricCollector;
+    _rebalanceFailureCount = _metricCollector.getMetric(
+        WagedRebalancerMetricCollector.WagedRebalancerMetricNames.RebalanceFailureCounter.name(),
+        CountMetric.class);
+    _baselineCalcCounter = _metricCollector.getMetric(
+        WagedRebalancerMetricCollector.WagedRebalancerMetricNames.GlobalBaselineCalcCounter.name(),
+        CountMetric.class);
+    _baselineCalcLatency = _metricCollector.getMetric(
+        WagedRebalancerMetricCollector.WagedRebalancerMetricNames.GlobalBaselineCalcLatencyGauge
+            .name(),
+        LatencyMetric.class);
+    _partialRebalanceCounter = _metricCollector.getMetric(
+        WagedRebalancerMetricCollector.WagedRebalancerMetricNames.PartialRebalanceCounter.name(),
+        CountMetric.class);
+    _partialRebalanceLatency = _metricCollector.getMetric(
+        WagedRebalancerMetricCollector.WagedRebalancerMetricNames.PartialRebalanceLatencyGauge
+            .name(),
+        LatencyMetric.class);
+    _writeLatency = _metricCollector.getMetric(
+        WagedRebalancerMetricCollector.WagedRebalancerMetricNames.StateWriteLatencyGauge.name(),
+        LatencyMetric.class);
+    _stateReadLatency = _metricCollector.getMetric(
+        WagedRebalancerMetricCollector.WagedRebalancerMetricNames.StateReadLatencyGauge.name(),
+        LatencyMetric.class);
+    _baselineDivergenceGauge = _metricCollector.getMetric(
+        WagedRebalancerMetricCollector.WagedRebalancerMetricNames.BaselineDivergenceGauge.name(),
+        BaselineDivergenceGauge.class);
+
+    _changeDetector = new ResourceChangeDetector(true);
+
+    _baselineCalculateExecutor = Executors.newSingleThreadExecutor();
+    _asyncGlobalRebalanceEnabled = isAsyncGlobalRebalanceEnabled;
+  }
+
+  // Update the global rebalance mode to be asynchronous or synchronous
+  public void setGlobalRebalanceAsyncMode(boolean isAsyncGlobalRebalanceEnabled) {
+    _asyncGlobalRebalanceEnabled = isAsyncGlobalRebalanceEnabled;
+  }
+
+  // Update the rebalancer preference if the new options are different from the current preference.
+  public synchronized void updateRebalancePreference(
+      Map<ClusterConfig.GlobalRebalancePreferenceKey, Integer> newPreference) {
+    // 1. If the preference was not configured during construction, there is no need to update.
+    // 2. If the current preference equals the new preference, there is no need to update.
+    if (!_preference.equals(NOT_CONFIGURED_PREFERENCE) && !_preference.equals(newPreference)) {
+      _rebalanceAlgorithm = ConstraintBasedAlgorithmFactory.getInstance(newPreference);
+      _preference = ImmutableMap.copyOf(newPreference);
+    }
+  }
+
+  @Override
+  public void reset() {
+    if (_assignmentMetadataStore != null) {
+      _assignmentMetadataStore.reset();
+    }
+    _changeDetector.resetSnapshots();
+  }
+
+  // TODO the rebalancer should reject any other computing request after being closed.
+  @Override
+  public void close() {
+    if (_baselineCalculateExecutor != null) {
+      _baselineCalculateExecutor.shutdownNow();
+    }
+    if (_assignmentMetadataStore != null) {
+      _assignmentMetadataStore.close();
+    }
+    _metricCollector.unregister();
+  }
+
+  @Override
+  public Map<String, IdealState> computeNewIdealStates(ResourceControllerDataProvider clusterData,
+      Map<String, Resource> resourceMap, final CurrentStateOutput currentStateOutput)
+      throws HelixRebalanceException {
+    if (resourceMap.isEmpty()) {
+      LOG.warn("There is no resource to be rebalanced by {}", this.getClass().getSimpleName());
+      return Collections.emptyMap();
+    }
+
+    LOG.info("Start computing new ideal states for resources: {}", resourceMap.keySet().toString());
+    validateInput(clusterData, resourceMap);
+
+    Map<String, IdealState> newIdealStates;
+    try {
+      // Calculate the target assignment based on the current cluster status.
+      newIdealStates = computeBestPossibleStates(clusterData, resourceMap, currentStateOutput,
+          _rebalanceAlgorithm);
+    } catch (HelixRebalanceException ex) {
+      LOG.error("Failed to calculate the new assignments.", ex);
+      // Record the failure in metrics.
+      _rebalanceFailureCount.increment(1L);
+
+      HelixRebalanceException.Type failureType = ex.getFailureType();
+      if (failureType.equals(HelixRebalanceException.Type.INVALID_REBALANCER_STATUS) || failureType
+          .equals(HelixRebalanceException.Type.UNKNOWN_FAILURE)) {
+        // If the failure is unknown or because of assignment store access failure, throw the
+        // rebalance exception.
+        throw ex;
+      } else { // return the previously calculated assignment.
+        LOG.warn(
+            "Returning the last known-good best possible assignment from metadata store due to "
+                + "rebalance failure of type: {}", failureType);
+        // Note that this fallback logic does not return an assignment based on the current state
+        // if there is no previously calculated result.
+        Map<String, ResourceAssignment> assignmentRecord =
+            getBestPossibleAssignment(_assignmentMetadataStore, new CurrentStateOutput(),
+                resourceMap.keySet());
+        newIdealStates = convertResourceAssignment(clusterData, assignmentRecord);
+      }
+    }
+
+    // Construct the new best possible states according to the current state and target assignment.
+    // Note that the new ideal state might be an intermediate state between the current state and
+    // the target assignment.
+    newIdealStates.values().parallelStream().forEach(idealState -> {
+      String resourceName = idealState.getResourceName();
+      // Adjust the states according to the current state.
+      ResourceAssignment finalAssignment = _mappingCalculator
+          .computeBestPossiblePartitionState(clusterData, idealState, resourceMap.get(resourceName),
+              currentStateOutput);
+
+      // Clean up the state mapping fields. Use the final assignment that is calculated by the
+      // mapping calculator to replace them.
+      idealState.getRecord().getMapFields().clear();
+      for (Partition partition : finalAssignment.getMappedPartitions()) {
+        Map<String, String> newStateMap = finalAssignment.getReplicaMap(partition);
+        // If the final states cannot be generated, override the best possible state with an empty map.
+        idealState.setInstanceStateMap(partition.getPartitionName(),
+            newStateMap == null ? Collections.emptyMap() : newStateMap);
+      }
+    });
+    LOG.info("Finish computing new ideal states for resources: {}",
+        resourceMap.keySet().toString());
+    return newIdealStates;
+  }
+
+  // Coordinate global rebalance and partial rebalance according to the cluster changes.
+  private Map<String, IdealState> computeBestPossibleStates(
+      ResourceControllerDataProvider clusterData, Map<String, Resource> resourceMap,
+      final CurrentStateOutput currentStateOutput, RebalanceAlgorithm algorithm)
+      throws HelixRebalanceException {
+    Set<String> activeNodes = DelayedRebalanceUtil
+        .getActiveNodes(clusterData.getAllInstances(), clusterData.getEnabledLiveInstances(),
+            clusterData.getInstanceOfflineTimeMap(), clusterData.getLiveInstances().keySet(),
+            clusterData.getInstanceConfigMap(), clusterData.getClusterConfig());
+
+    // Schedule (or unschedule) delayed rebalance according to the delayed rebalance config.
+    delayedRebalanceSchedule(clusterData, activeNodes, resourceMap.keySet());
+
+    Map<String, IdealState> newIdealStates = convertResourceAssignment(clusterData,
+        computeBestPossibleAssignment(clusterData, resourceMap, activeNodes, currentStateOutput,
+            algorithm));
+
+    // The additional rebalance overwrite is required since the calculated mapping may contain
+    // some assignments produced by the delayed rebalance logic.
+    if (!activeNodes.equals(clusterData.getEnabledLiveInstances())) {
+      applyRebalanceOverwrite(newIdealStates, clusterData, resourceMap,
+          getBaselineAssignment(_assignmentMetadataStore, currentStateOutput, resourceMap.keySet()),
+          algorithm);
+    }
+    // Replace the assignment if a user-defined preference list is configured.
+    // Note the user-defined list is intentionally applied to the final mapping after calculation.
+    // This is to avoid persisting it into the assignment store, which would impact the long-term
+    // assignment evenness and partition movements.
+    newIdealStates.entrySet().stream().forEach(idealStateEntry -> applyUserDefinedPreferenceList(
+        clusterData.getResourceConfig(idealStateEntry.getKey()), idealStateEntry.getValue()));
+
+    return newIdealStates;
+  }
+
+  // Coordinate global rebalance and partial rebalance according to the cluster changes.
+  protected Map<String, ResourceAssignment> computeBestPossibleAssignment(
+      ResourceControllerDataProvider clusterData, Map<String, Resource> resourceMap,
+      Set<String> activeNodes, final CurrentStateOutput currentStateOutput,
+      RebalanceAlgorithm algorithm)
+      throws HelixRebalanceException {
+    // Perform global rebalance for a new baseline assignment
+    globalRebalance(clusterData, resourceMap, currentStateOutput, algorithm);
+    // Perform partial rebalance for a new best possible assignment
+    Map<String, ResourceAssignment> newAssignment =
+        partialRebalance(clusterData, resourceMap, activeNodes, currentStateOutput, algorithm);
+    return newAssignment;
+  }
+
+  /**
+   * Convert the resource assignment map into an IdealState map.
+   */
+  private Map<String, IdealState> convertResourceAssignment(
+      ResourceControllerDataProvider clusterData, Map<String, ResourceAssignment> assignments)
+      throws HelixRebalanceException {
+    // Convert the assignments into IdealState for the following state mapping calculation.
+    Map<String, IdealState> finalIdealStateMap = new HashMap<>();
+    for (String resourceName : assignments.keySet()) {
+      try {
+        IdealState currentIdealState = clusterData.getIdealState(resourceName);
+        Map<String, Integer> statePriorityMap =
+            clusterData.getStateModelDef(currentIdealState.getStateModelDefRef())
+                .getStatePriorityMap();
+        // Create a new IdealState instance which contains the new calculated assignment in the
+        // preference list.
+        IdealState newIdealState = new IdealState(resourceName);
+        // Copy the simple fields
+        newIdealState.getRecord().setSimpleFields(currentIdealState.getRecord().getSimpleFields());
+        // Sort the preference list according to state priority.
+        newIdealState.setPreferenceLists(
+            getPreferenceLists(assignments.get(resourceName), statePriorityMap));
+        // Note the state mapping in the new assignment won't directly propagate to the map fields.
+        // The rebalancer will calculate for the final state mapping considering the current states.
+        finalIdealStateMap.put(resourceName, newIdealState);
+      } catch (Exception ex) {
+        throw new HelixRebalanceException(
+            "Failed to calculate the new IdealState for resource: " + resourceName,
+            HelixRebalanceException.Type.INVALID_CLUSTER_STATUS, ex);
+      }
+    }
+    return finalIdealStateMap;
+  }
+
+  /**
+   * Global rebalance calculates for a new baseline assignment.
+   * The new baseline assignment will be persisted and leveraged by the partial rebalance.
+   * @param clusterData the cluster data cache
+   * @param resourceMap the map of resources to be rebalanced
+   * @param currentStateOutput the current state output of the pipeline
+   * @param algorithm the rebalance algorithm to use
+   * @throws HelixRebalanceException if the baseline calculation fails
+   */
+  private void globalRebalance(ResourceControllerDataProvider clusterData,
+      Map<String, Resource> resourceMap, final CurrentStateOutput currentStateOutput,
+      RebalanceAlgorithm algorithm)
+      throws HelixRebalanceException {
+    _changeDetector.updateSnapshots(clusterData);
+    // Get all the changed items' information. Filter for the items that have content changed.
+    final Map<HelixConstants.ChangeType, Set<String>> clusterChanges =
+        _changeDetector.getAllChanges();
+
+    if (clusterChanges.keySet().stream()
+        .anyMatch(GLOBAL_REBALANCE_REQUIRED_CHANGE_TYPES::contains)) {
+      // Build the cluster model for rebalance calculation.
+      // Note, for a Baseline calculation,
+      // 1. Ignore node status (disable/offline).
+      // 2. Use the previous Baseline as the only parameter about the previous assignment.
+      Map<String, ResourceAssignment> currentBaseline =
+          getBaselineAssignment(_assignmentMetadataStore, currentStateOutput, resourceMap.keySet());
+      ClusterModel clusterModel;
+      try {
+        clusterModel = ClusterModelProvider
+            .generateClusterModelForBaseline(clusterData, resourceMap,
+                clusterData.getAllInstances(), clusterChanges, currentBaseline);
+      } catch (Exception ex) {
+        throw new HelixRebalanceException("Failed to generate cluster model for global rebalance.",
+            HelixRebalanceException.Type.INVALID_CLUSTER_STATUS, ex);
+      }
+
+      final boolean waitForGlobalRebalance = !_asyncGlobalRebalanceEnabled;
+      final String clusterName = clusterData.getClusterName();
+      // Calculate the Baseline assignment for global rebalance.
+      Future<Boolean> result = _baselineCalculateExecutor.submit(() -> {
+        try {
+          // Note that we should schedule a new partial rebalance for a future rebalance pipeline if
+          // the planned partial rebalance in the current rebalance pipeline won't wait for the new
+          // baseline to be calculated.
+          // So set shouldSchedulePartialRebalance to !waitForGlobalRebalance.
+          calculateAndUpdateBaseline(clusterModel, algorithm, !waitForGlobalRebalance, clusterName);
+        } catch (HelixRebalanceException e) {
+          LOG.error("Failed to calculate baseline assignment!", e);
+          return false;
+        }
+        return true;
+      });
+      if (waitForGlobalRebalance) {
+        try {
+          if (!result.get()) {
+            throw new HelixRebalanceException("Failed to calculate for the new Baseline.",
+                HelixRebalanceException.Type.FAILED_TO_CALCULATE);
+          }
+        } catch (InterruptedException | ExecutionException e) {
+          throw new HelixRebalanceException("Failed to execute new Baseline calculation.",
+              HelixRebalanceException.Type.FAILED_TO_CALCULATE, e);
+        }
+      }
+    }
+  }
+
+  /**
+   * Calculate and update the Baseline assignment
+   * @param clusterModel the cluster model built for the baseline calculation
+   * @param algorithm the rebalance algorithm to use
+   * @param shouldSchedulePartialRebalance true if the call should trigger a following partial
+   *                                       rebalance so the new Baseline can be applied to the cluster
+   * @param clusterName the name of the cluster being rebalanced
+   * @throws HelixRebalanceException if the baseline cannot be calculated or persisted
+   */
+  private void calculateAndUpdateBaseline(ClusterModel clusterModel, RebalanceAlgorithm algorithm,
+      boolean shouldSchedulePartialRebalance, String clusterName)
+      throws HelixRebalanceException {
+    LOG.info("Start calculating the new baseline.");
+    _baselineCalcCounter.increment(1L);
+    _baselineCalcLatency.startMeasuringLatency();
+
+    boolean isBaselineChanged = false;
+    Map<String, ResourceAssignment> newBaseline = calculateAssignment(clusterModel, algorithm);
+    // Write the new baseline to metadata store
+    if (_assignmentMetadataStore != null) {
+      try {
+        _writeLatency.startMeasuringLatency();
+        isBaselineChanged = _assignmentMetadataStore.persistBaseline(newBaseline);
+        _writeLatency.endMeasuringLatency();
+      } catch (Exception ex) {
+        throw new HelixRebalanceException("Failed to persist the new baseline assignment.",
+            HelixRebalanceException.Type.INVALID_REBALANCER_STATUS, ex);
+      }
+    } else {
+      LOG.debug("Assignment Metadata Store is null. Skip persisting the baseline assignment.");
+    }
+    _baselineCalcLatency.endMeasuringLatency();
+    LOG.info("Global baseline calculation completed and has been persisted into metadata store.");
+
+    if (isBaselineChanged && shouldSchedulePartialRebalance) {
+      LOG.info("Schedule a new rebalance after the new baseline calculation has finished.");
+      RebalanceUtil.scheduleOnDemandPipeline(clusterName, 0L, false);
+    }
+  }
+
+  private Map<String, ResourceAssignment> partialRebalance(
+      ResourceControllerDataProvider clusterData, Map<String, Resource> resourceMap,
+      Set<String> activeNodes, final CurrentStateOutput currentStateOutput,
+      RebalanceAlgorithm algorithm)
+      throws HelixRebalanceException {
+    LOG.info("Start calculating the new best possible assignment.");
+    _partialRebalanceCounter.increment(1L);
+    _partialRebalanceLatency.startMeasuringLatency();
+    // TODO: Consider combining the metrics for both baseline/best possible?
+    // Read the baseline from metadata store
+    Map<String, ResourceAssignment> currentBaseline =
+        getBaselineAssignment(_assignmentMetadataStore, currentStateOutput, resourceMap.keySet());
+
+    // Read the best possible assignment from metadata store
+    Map<String, ResourceAssignment> currentBestPossibleAssignment =
+        getBestPossibleAssignment(_assignmentMetadataStore, currentStateOutput,
+            resourceMap.keySet());
+    ClusterModel clusterModel;
+    try {
+      clusterModel = ClusterModelProvider
+          .generateClusterModelForPartialRebalance(clusterData, resourceMap, activeNodes,
+              currentBaseline, currentBestPossibleAssignment);
+    } catch (Exception ex) {
+      throw new HelixRebalanceException("Failed to generate cluster model for partial rebalance.",
+          HelixRebalanceException.Type.INVALID_CLUSTER_STATUS, ex);
+    }
+    Map<String, ResourceAssignment> newAssignment = calculateAssignment(clusterModel, algorithm);
+
+    // Asynchronously report the baseline divergence metric before persisting to the metadata
+    // store, so that even if persisting fails, we still have the metric.
+    // To avoid changes of the new assignment and make it safe when being used to measure baseline
+    // divergence, use a deep copy of the new assignment.
+    Map<String, ResourceAssignment> newAssignmentCopy = new HashMap<>();
+    for (Map.Entry<String, ResourceAssignment> entry : newAssignment.entrySet()) {
+      newAssignmentCopy.put(entry.getKey(), new ResourceAssignment(entry.getValue().getRecord()));
+    }
+
+    _baselineDivergenceGauge.asyncMeasureAndUpdateValue(clusterData.getAsyncTasksThreadPool(),
+        currentBaseline, newAssignmentCopy);
+
+    if (_assignmentMetadataStore != null) {
+      try {
+        _writeLatency.startMeasuringLatency();
+        _assignmentMetadataStore.persistBestPossibleAssignment(newAssignment);
+        _writeLatency.endMeasuringLatency();
+      } catch (Exception ex) {
+        throw new HelixRebalanceException("Failed to persist the new best possible assignment.",
+            HelixRebalanceException.Type.INVALID_REBALANCER_STATUS, ex);
+      }
+    } else {
+      LOG.debug("Assignment Metadata Store is null. Skip persisting the baseline assignment.");
+    }
+    _partialRebalanceLatency.endMeasuringLatency();
+    LOG.info("Finish calculating the new best possible assignment.");
+    return newAssignment;
+  }
+
+  /**
+   * @param clusterModel the cluster model that contains all the cluster status for the purpose of
+   *                     rebalancing.
+   * @return the new optimal assignment for the resources.
+   */
+  private Map<String, ResourceAssignment> calculateAssignment(ClusterModel clusterModel,
+      RebalanceAlgorithm algorithm) throws HelixRebalanceException {
+    long startTime = System.currentTimeMillis();
+    LOG.info("Start calculating for an assignment with algorithm {}",
+        algorithm.getClass().getSimpleName());
+    OptimalAssignment optimalAssignment = algorithm.calculate(clusterModel);
+    Map<String, ResourceAssignment> newAssignment =
+        optimalAssignment.getOptimalResourceAssignment();
+    LOG.info("Finish calculating an assignment with algorithm {}. Took: {} ms.",
+        algorithm.getClass().getSimpleName(), System.currentTimeMillis() - startTime);
+    return newAssignment;
+  }
+
+  // Generate the preference lists from the state mapping based on state priority.
+  private Map<String, List<String>> getPreferenceLists(ResourceAssignment newAssignment,
+      Map<String, Integer> statePriorityMap) {
+    Map<String, List<String>> preferenceList = new HashMap<>();
+    for (Partition partition : newAssignment.getMappedPartitions()) {
+      List<String> nodes = new ArrayList<>(newAssignment.getReplicaMap(partition).keySet());
+      // To ensure backward compatibility, sort the preference list according to state priority.
+      nodes.sort((node1, node2) -> {
+        int statePriority1 =
+            statePriorityMap.get(newAssignment.getReplicaMap(partition).get(node1));
+        int statePriority2 =
+            statePriorityMap.get(newAssignment.getReplicaMap(partition).get(node2));
+        if (statePriority1 == statePriority2) {
+          return node1.compareTo(node2);
+        } else {
+          return statePriority1 - statePriority2;
+        }
+      });
+      preferenceList.put(partition.getPartitionName(), nodes);
+    }
+    return preferenceList;
+  }
+
+  private void validateInput(ResourceControllerDataProvider clusterData,
+      Map<String, Resource> resourceMap) throws HelixRebalanceException {
+    Set<String> nonCompatibleResources = resourceMap.entrySet().stream().filter(resourceEntry -> {
+      IdealState is = clusterData.getIdealState(resourceEntry.getKey());
+      return is == null || !is.getRebalanceMode().equals(IdealState.RebalanceMode.FULL_AUTO)
+          || !WagedRebalancer.class.getName().equals(is.getRebalancerClassName());
+    }).map(Map.Entry::getKey).collect(Collectors.toSet());
+    if (!nonCompatibleResources.isEmpty()) {
+      throw new HelixRebalanceException(String.format(
+          "Input contains invalid resource(s) that cannot be rebalanced by the WAGED rebalancer. %s",
+          nonCompatibleResources.toString()), HelixRebalanceException.Type.INVALID_INPUT);
+    }
+  }
+
+  /**
+   * @param assignmentMetadataStore the store that persists the baseline assignment
+   * @param currentStateOutput the current state output of the pipeline
+   * @param resources the names of the resources being rebalanced
+   * @return the current baseline assignment. If the record does not exist in the
+   *         assignmentMetadataStore, return the current state assignment.
+   * @throws HelixRebalanceException if reading from the store fails unexpectedly
+   */
+  private Map<String, ResourceAssignment> getBaselineAssignment(
+      AssignmentMetadataStore assignmentMetadataStore, CurrentStateOutput currentStateOutput,
+      Set<String> resources) throws HelixRebalanceException {
+    Map<String, ResourceAssignment> currentBaseline = Collections.emptyMap();
+    if (assignmentMetadataStore != null) {
+      try {
+        _stateReadLatency.startMeasuringLatency();
+        currentBaseline = assignmentMetadataStore.getBaseline();
+        _stateReadLatency.endMeasuringLatency();
+      } catch (Exception ex) {
+        throw new HelixRebalanceException(
+            "Failed to get the current baseline assignment because of unexpected error.",
+            HelixRebalanceException.Type.INVALID_REBALANCER_STATUS, ex);
+      }
+    }
+    if (currentBaseline.isEmpty()) {
+      LOG.warn("The current baseline assignment record is empty. Use the current states instead.");
+      currentBaseline = currentStateOutput.getAssignment(resources);
+    }
+    currentBaseline.keySet().retainAll(resources);
+    return currentBaseline;
+  }
+
+  /**
+   * @param assignmentMetadataStore the store that persists the best possible assignment
+   * @param currentStateOutput the current state output of the pipeline
+   * @param resources the names of the resources being rebalanced
+   * @return the current best possible assignment. If the record does not exist in the
+   *         assignmentMetadataStore, return the current state assignment.
+   * @throws HelixRebalanceException if reading from the store fails unexpectedly
+   */
+  protected Map<String, ResourceAssignment> getBestPossibleAssignment(
+      AssignmentMetadataStore assignmentMetadataStore, CurrentStateOutput currentStateOutput,
+      Set<String> resources) throws HelixRebalanceException {
+    Map<String, ResourceAssignment> currentBestAssignment = Collections.emptyMap();
+    if (assignmentMetadataStore != null) {
+      try {
+        _stateReadLatency.startMeasuringLatency();
+        currentBestAssignment = assignmentMetadataStore.getBestPossibleAssignment();
+        _stateReadLatency.endMeasuringLatency();
+      } catch (Exception ex) {
+        throw new HelixRebalanceException(
+            "Failed to get the current best possible assignment because of unexpected error.",
+            HelixRebalanceException.Type.INVALID_REBALANCER_STATUS, ex);
+      }
+    }
+    if (currentBestAssignment.isEmpty()) {
+      LOG.warn(
+          "The current best possible assignment record is empty. Use the current states instead.");
+      currentBestAssignment = currentStateOutput.getAssignment(resources);
+    }
+    currentBestAssignment.keySet().retainAll(resources);
+    return currentBestAssignment;
+  }
+
+  /**
+   * Schedule rebalance according to the delayed rebalance logic.
+   * @param clusterData the current cluster data cache
+   * @param delayedActiveNodes the active nodes set that is calculated with the delay time window
+   * @param resourceSet the rebalanced resourceSet
+   */
+  private void delayedRebalanceSchedule(ResourceControllerDataProvider clusterData,
+      Set<String> delayedActiveNodes, Set<String> resourceSet) {
+    if (_manager != null) {
+      // Schedule for the next delayed rebalance in case no cluster change event happens.
+      ClusterConfig clusterConfig = clusterData.getClusterConfig();
+      boolean delayedRebalanceEnabled = DelayedRebalanceUtil.isDelayRebalanceEnabled(clusterConfig);
+      Set<String> offlineOrDisabledInstances = new HashSet<>(delayedActiveNodes);
+      offlineOrDisabledInstances.removeAll(clusterData.getEnabledLiveInstances());
+      for (String resource : resourceSet) {
+        DelayedRebalanceUtil
+            .setRebalanceScheduler(resource, delayedRebalanceEnabled, offlineOrDisabledInstances,
+                clusterData.getInstanceOfflineTimeMap(), clusterData.getLiveInstances().keySet(),
+                clusterData.getInstanceConfigMap(), clusterConfig.getRebalanceDelayTime(),
+                clusterConfig, _manager);
+      }
+    } else {
+      LOG.warn("Skip scheduling a delayed rebalancer since HelixManager is not specified.");
+    }
+  }
+
+  /**
+   * Update the rebalanced ideal states according to the real active nodes.
+   * Since the rebalancing might be done with the delayed logic, the rebalanced ideal states
+   * might include inactive nodes.
+   * This overwrite will adjust the final mapping, so as to ensure the result is completely valid.
+   * @param idealStateMap the calculated ideal states.
+   * @param clusterData the cluster data cache.
+   * @param resourceMap the rebalanced resource map.
+   * @param baseline the baseline assignment.
+   * @param algorithm the rebalance algorithm.
+   */
+  private void applyRebalanceOverwrite(Map<String, IdealState> idealStateMap,
+      ResourceControllerDataProvider clusterData, Map<String, Resource> resourceMap,
+      Map<String, ResourceAssignment> baseline, RebalanceAlgorithm algorithm)
+      throws HelixRebalanceException {
+    ClusterModel clusterModel;
+    try {
+      // Note this calculation uses the baseline as the best possible assignment input here.
+      // This is for minimizing unnecessary partition movement.
+      clusterModel = ClusterModelProvider
+          .generateClusterModelFromExistingAssignment(clusterData, resourceMap, baseline);
+    } catch (Exception ex) {
+      throw new HelixRebalanceException(
+          "Failed to generate cluster model for delayed rebalance overwrite.",
+          HelixRebalanceException.Type.INVALID_CLUSTER_STATUS, ex);
+    }
+    Map<String, IdealState> activeIdealStates =
+        convertResourceAssignment(clusterData, calculateAssignment(clusterModel, algorithm));
+    for (String resourceName : idealStateMap.keySet()) {
+      // The new calculated ideal state before overwrite
+      IdealState newIdealState = idealStateMap.get(resourceName);
+      if (!activeIdealStates.containsKey(resourceName)) {
+        throw new HelixRebalanceException(
+            "Failed to calculate the complete partition assignment with all active nodes. Cannot find the resource assignment for "
+                + resourceName, HelixRebalanceException.Type.FAILED_TO_CALCULATE);
+      }
+      // The ideal state that is calculated based on the real alive/enabled instances list
+      IdealState newActiveIdealState = activeIdealStates.get(resourceName);
+      // The current ideal state that exists in the IdealState znode
+      IdealState currentIdealState = clusterData.getIdealState(resourceName);
+      Set<String> enabledLiveInstances = clusterData.getEnabledLiveInstances();
+      int numReplica = currentIdealState.getReplicaCount(enabledLiveInstances.size());
+      int minActiveReplica =
+          DelayedRebalanceUtil.getMinActiveReplica(currentIdealState, numReplica);
+      Map<String, List<String>> finalPreferenceLists = DelayedRebalanceUtil
+          .getFinalDelayedMapping(newActiveIdealState.getPreferenceLists(),
+              newIdealState.getPreferenceLists(), enabledLiveInstances,
+              Math.min(minActiveReplica, numReplica));
+
+      newIdealState.setPreferenceLists(finalPreferenceLists);
+    }
+  }
+
+  private void applyUserDefinedPreferenceList(ResourceConfig resourceConfig,
+      IdealState idealState) {
+    if (resourceConfig != null) {
+      Map<String, List<String>> userDefinedPreferenceList = resourceConfig.getPreferenceLists();
+      if (!userDefinedPreferenceList.isEmpty()) {
+        LOG.info("Using user defined preference list for partitions.");
+        for (String partition : userDefinedPreferenceList.keySet()) {
+          idealState.setPreferenceList(partition, userDefinedPreferenceList.get(partition));
+        }
+      }
+    }
+  }
+
+  protected AssignmentMetadataStore getAssignmentMetadataStore() {
+    return _assignmentMetadataStore;
+  }
+
+  protected MetricCollector getMetricCollector() {
+    return _metricCollector;
+  }
+
+  @Override
+  protected void finalize()
+      throws Throwable {
+    super.finalize();
+    close();
+  }
+}
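
The two loader methods above (getBaselineAssignment and getBestPossibleAssignment) follow the same read-with-fallback pattern: read the persisted record from the metadata store, fall back to the current states when the record is empty, then retain only the resources in the rebalance scope. Below is a minimal, self-contained sketch of that pattern; the names (loadAssignment, persistedReader, currentStates) are hypothetical stand-ins, not Helix APIs.

import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

public class AssignmentFallbackSketch {
  static Map<String, String> loadAssignment(Supplier<Map<String, String>> persistedReader,
      Map<String, String> currentStates, Set<String> resources) {
    Map<String, String> assignment;
    try {
      assignment = persistedReader.get(); // e.g. the metadata store read
    } catch (Exception ex) {
      throw new IllegalStateException("Failed to read the persisted assignment.", ex);
    }
    if (assignment.isEmpty()) {
      // Fall back to the current states when no record has been persisted yet.
      assignment = new HashMap<>(currentStates);
    }
    // Drop any resources that are not part of this rebalance scope.
    assignment.keySet().retainAll(resources);
    return assignment;
  }
}
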
diff --git a/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/ConstraintBasedAlgorithm.java b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/ConstraintBasedAlgorithm.java
new file mode 100644
index 0000000..dcadff6
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/ConstraintBasedAlgorithm.java
@@ -0,0 +1,228 @@
+package org.apache.helix.controller.rebalancer.waged.constraints;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.stream.Collectors;
+
+import com.google.common.collect.Maps;
+import org.apache.helix.HelixRebalanceException;
+import org.apache.helix.controller.rebalancer.waged.RebalanceAlgorithm;
+import org.apache.helix.controller.rebalancer.waged.model.AssignableNode;
+import org.apache.helix.controller.rebalancer.waged.model.AssignableReplica;
+import org.apache.helix.controller.rebalancer.waged.model.ClusterContext;
+import org.apache.helix.controller.rebalancer.waged.model.ClusterModel;
+import org.apache.helix.controller.rebalancer.waged.model.OptimalAssignment;
+import org.apache.helix.model.ResourceAssignment;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * The algorithm is based on a given set of constraints:
+ * - HardConstraint: Approves or denies the assignment given its condition; no assignment can
+ * bypass any "hard constraint".
+ * - SoftConstraint: Evaluates the assignment by points/rewards/scores; a higher score means a
+ * better assignment.
+ * The goal is to accumulate the most points (rewards) from the "soft constraints" while not
+ * violating any "hard constraint".
+ */
+class ConstraintBasedAlgorithm implements RebalanceAlgorithm {
+  private static final Logger LOG = LoggerFactory.getLogger(ConstraintBasedAlgorithm.class);
+  private final List<HardConstraint> _hardConstraints;
+  private final Map<SoftConstraint, Float> _softConstraints;
+
+  ConstraintBasedAlgorithm(List<HardConstraint> hardConstraints,
+      Map<SoftConstraint, Float> softConstraints) {
+    _hardConstraints = hardConstraints;
+    _softConstraints = softConstraints;
+  }
+
+  @Override
+  public OptimalAssignment calculate(ClusterModel clusterModel)
+      throws HelixRebalanceException {
+    OptimalAssignment optimalAssignment = new OptimalAssignment();
+    List<AssignableNode> nodes = new ArrayList<>(clusterModel.getAssignableNodes().values());
+    Set<String> busyInstances =
+        getBusyInstances(clusterModel.getContext().getBestPossibleAssignment().values());
+    // Sort the replicas so the input is stable for the greedy algorithm.
+    // For other algorithm implementations, this sorting could be unnecessary.
+    for (AssignableReplica replica : getOrderedAssignableReplica(clusterModel)) {
+      Optional<AssignableNode> maybeBestNode =
+          getNodeWithHighestPoints(replica, nodes, clusterModel.getContext(), busyInstances,
+              optimalAssignment);
+      // Stop immediately if any replica cannot find a best assignable node.
+      if (optimalAssignment.hasAnyFailure()) {
+        String errorMessage = String
+            .format("Unable to find any available candidate node for partition %s; Fail reasons: %s",
+            replica.getPartitionName(), optimalAssignment.getFailures());
+        throw new HelixRebalanceException(errorMessage,
+            HelixRebalanceException.Type.FAILED_TO_CALCULATE);
+      }
+      maybeBestNode.ifPresent(node -> clusterModel
+          .assign(replica.getResourceName(), replica.getPartitionName(), replica.getReplicaState(),
+              node.getInstanceName()));
+    }
+    optimalAssignment.updateAssignments(clusterModel);
+    return optimalAssignment;
+  }
+
+  private Optional<AssignableNode> getNodeWithHighestPoints(AssignableReplica replica,
+      List<AssignableNode> assignableNodes, ClusterContext clusterContext,
+      Set<String> busyInstances, OptimalAssignment optimalAssignment) {
+    Map<AssignableNode, List<HardConstraint>> hardConstraintFailures = new ConcurrentHashMap<>();
+    List<AssignableNode> candidateNodes = assignableNodes.parallelStream().filter(candidateNode -> {
+      boolean isValid = true;
+      // Record all the failure reasons; this gives us the ability to debug/fix the runtime
+      // cluster environment.
+      for (HardConstraint hardConstraint : _hardConstraints) {
+        if (!hardConstraint.isAssignmentValid(candidateNode, replica, clusterContext)) {
+          hardConstraintFailures.computeIfAbsent(candidateNode, node -> new ArrayList<>())
+              .add(hardConstraint);
+          isValid = false;
+        }
+      }
+      return isValid;
+    }).collect(Collectors.toList());
+
+    if (candidateNodes.isEmpty()) {
+      optimalAssignment.recordAssignmentFailure(replica,
+          Maps.transformValues(hardConstraintFailures, this::convertFailureReasons));
+      return Optional.empty();
+    }
+
+    return candidateNodes.parallelStream().map(node -> new HashMap.SimpleEntry<>(node,
+        getAssignmentNormalizedScore(node, replica, clusterContext)))
+        .max((nodeEntry1, nodeEntry2) -> {
+          int scoreCompareResult = nodeEntry1.getValue().compareTo(nodeEntry2.getValue());
+          if (scoreCompareResult == 0) {
+            // If the evaluation scores of 2 nodes are the same, the algorithm assigns the replica
+            // to the idle node first.
+            int idleScore1 = busyInstances.contains(nodeEntry1.getKey().getInstanceName()) ? 0 : 1;
+            int idleScore2 = busyInstances.contains(nodeEntry2.getKey().getInstanceName()) ? 0 : 1;
+            return idleScore1 - idleScore2;
+          } else {
+            return scoreCompareResult;
+          }
+        }).map(Map.Entry::getKey);
+  }
+
+  private double getAssignmentNormalizedScore(AssignableNode node, AssignableReplica replica,
+      ClusterContext clusterContext) {
+    double sum = 0;
+    for (Map.Entry<SoftConstraint, Float> softConstraintEntry : _softConstraints.entrySet()) {
+      SoftConstraint softConstraint = softConstraintEntry.getKey();
+      float weight = softConstraintEntry.getValue();
+      if (weight != 0) {
+        // Skip calculating zero weighted constraints.
+        sum += weight * softConstraint.getAssignmentNormalizedScore(node, replica, clusterContext);
+      }
+    }
+    return sum;
+  }
+
+  private List<String> convertFailureReasons(List<HardConstraint> hardConstraints) {
+    return hardConstraints.stream().map(HardConstraint::getDescription)
+        .collect(Collectors.toList());
+  }
+
+  private List<AssignableReplica> getOrderedAssignableReplica(ClusterModel clusterModel) {
+    Map<String, Set<AssignableReplica>> replicasByResource = clusterModel.getAssignableReplicaMap();
+    List<AssignableReplica> orderedAssignableReplicas =
+        replicasByResource.values().stream().flatMap(replicas -> replicas.stream())
+            .collect(Collectors.toList());
+
+    Map<String, ResourceAssignment> bestPossibleAssignment =
+        clusterModel.getContext().getBestPossibleAssignment();
+    Map<String, ResourceAssignment> baselineAssignment =
+        clusterModel.getContext().getBaselineAssignment();
+
+    Map<String, Integer> replicaHashCodeMap = orderedAssignableReplicas.parallelStream().collect(
+        Collectors.toMap(AssignableReplica::toString,
+            replica -> Objects.hash(replica.toString(), clusterModel.getAssignableNodes().keySet()),
+            (hash1, hash2) -> hash2));
+
+    // 1. Sort according to whether the assignment exists in the best possible and/or baseline assignment.
+    // 2. Sort according to the state priority. Note that prioritizing the top state is required.
+    // Otherwise, the greedy algorithm will unnecessarily shuffle the states between replicas.
+    // 3. Sort according to the resource/partition name.
+    orderedAssignableReplicas.sort((replica1, replica2) -> {
+      String resourceName1 = replica1.getResourceName();
+      String resourceName2 = replica2.getResourceName();
+      if (bestPossibleAssignment.containsKey(resourceName1) == bestPossibleAssignment
+          .containsKey(resourceName2)) {
+        if (baselineAssignment.containsKey(resourceName1) == baselineAssignment
+            .containsKey(resourceName2)) {
+          // If the resource is equally present (or absent) in both assignments,
+          // compare additional dimensions.
+          int statePriority1 = replica1.getStatePriority();
+          int statePriority2 = replica2.getStatePriority();
+          if (statePriority1 == statePriority2) {
+            // If state priorities are the same, try to randomize the replica order. Otherwise,
+            // the same replicas might always be moved in each rebalancing. This is because their
+            // placement calculation would always happen at the critical moment when the cluster is
+            // close to the expected utilization.
+            //
+            // Note that to ensure the algorithm is deterministic with the same inputs, do not use
+            // Random functions here. Using a hashcode based on the cluster topology information to
+            // get a controlled randomized order is good enough.
+            Integer replicaHash1 = replicaHashCodeMap.get(replica1.toString());
+            Integer replicaHash2 = replicaHashCodeMap.get(replica2.toString());
+            if (!replicaHash1.equals(replicaHash2)) {
+              return replicaHash1.compareTo(replicaHash2);
+            } else {
+              // In case of hash collision, return order according to the name.
+              return replica1.toString().compareTo(replica2.toString());
+            }
+          } else {
+            // Note that we shall prioritize the replica with a higher state priority;
+            // a smaller priority number means higher priority.
+            return statePriority1 - statePriority2;
+          }
+        } else {
+          // If the baseline assignment contains the assignment, prioritize the replica.
+          return baselineAssignment.containsKey(resourceName1) ? -1 : 1;
+        }
+      } else {
+        // If the best possible assignment contains the assignment, prioritize the replica.
+        return bestPossibleAssignment.containsKey(resourceName1) ? -1 : 1;
+      }
+    });
+    return orderedAssignableReplicas;
+  }
+
+  /**
+   * @param assignments A collection of resource replica assignments.
+   * @return A set of instance names that have at least one replica assigned in the input assignments.
+   */
+  private Set<String> getBusyInstances(Collection<ResourceAssignment> assignments) {
+    return assignments.stream().flatMap(
+        resourceAssignment -> resourceAssignment.getRecord().getMapFields().values().stream()
+            .flatMap(instanceStateMap -> instanceStateMap.keySet().stream())
+            .collect(Collectors.toSet()).stream()).collect(Collectors.toSet());
+  }
+}
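
For illustration, the greedy selection in getNodeWithHighestPoints() can be reduced to two steps: filter the candidate nodes through every hard constraint, then pick the node with the highest weighted sum of soft-constraint scores. The sketch below shows that shape with plain JDK types; the Predicate and BiFunction stand-ins are hypothetical simplifications of HardConstraint and SoftConstraint, not the Helix classes themselves.

import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.BiFunction;
import java.util.function.Predicate;

public class GreedyPickSketch {
  static Optional<String> pickNode(List<String> nodes, String replica,
      List<Predicate<String>> hardConstraints,
      Map<BiFunction<String, String, Double>, Float> softConstraints) {
    return nodes.stream()
        // Hard constraints: every one of them must approve the candidate node.
        .filter(node -> hardConstraints.stream().allMatch(c -> c.test(node)))
        // Soft constraints: sum of weight * normalized score; higher is better.
        .max(Comparator.comparingDouble((String node) -> softConstraints.entrySet().stream()
            .mapToDouble(e -> e.getValue() * e.getKey().apply(node, replica)).sum()));
  }
}
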
diff --git a/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/ConstraintBasedAlgorithmFactory.java b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/ConstraintBasedAlgorithmFactory.java
new file mode 100644
index 0000000..934bfa7
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/ConstraintBasedAlgorithmFactory.java
@@ -0,0 +1,82 @@
+package org.apache.helix.controller.rebalancer.waged.constraints;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.Maps;
+import org.apache.helix.HelixManagerProperties;
+import org.apache.helix.SystemPropertyKeys;
+import org.apache.helix.controller.rebalancer.waged.RebalanceAlgorithm;
+import org.apache.helix.model.ClusterConfig;
+
+/**
+ * The factory class to create an instance of {@link ConstraintBasedAlgorithm}
+ */
+public class ConstraintBasedAlgorithmFactory {
+  private static final Map<String, Float> MODEL = new HashMap<String, Float>() {
+    {
+      // The default setting
+      put(PartitionMovementConstraint.class.getSimpleName(), 2f);
+      put(InstancePartitionsCountConstraint.class.getSimpleName(), 1f);
+      put(ResourcePartitionAntiAffinityConstraint.class.getSimpleName(), 1f);
+      put(ResourceTopStateAntiAffinityConstraint.class.getSimpleName(), 3f);
+      put(MaxCapacityUsageInstanceConstraint.class.getSimpleName(), 5f);
+    }
+  };
+
+  static {
+    Properties properties =
+        new HelixManagerProperties(SystemPropertyKeys.SOFT_CONSTRAINT_WEIGHTS).getProperties();
+    // overwrite the default value with data load from property file
+    properties.forEach((constraintName, weight) -> MODEL.put(String.valueOf(constraintName),
+        Float.valueOf(String.valueOf(weight))));
+  }
+
+  public static RebalanceAlgorithm getInstance(
+      Map<ClusterConfig.GlobalRebalancePreferenceKey, Integer> preferences) {
+    List<HardConstraint> hardConstraints =
+        ImmutableList.of(new FaultZoneAwareConstraint(), new NodeCapacityConstraint(),
+            new ReplicaActivateConstraint(), new NodeMaxPartitionLimitConstraint(),
+            new ValidGroupTagConstraint(), new SamePartitionOnInstanceConstraint());
+
+    int evennessPreference =
+        preferences.getOrDefault(ClusterConfig.GlobalRebalancePreferenceKey.EVENNESS, 1);
+    int movementPreference =
+        preferences.getOrDefault(ClusterConfig.GlobalRebalancePreferenceKey.LESS_MOVEMENT, 1);
+
+    List<SoftConstraint> softConstraints = ImmutableList
+        .of(new PartitionMovementConstraint(), new InstancePartitionsCountConstraint(),
+            new ResourcePartitionAntiAffinityConstraint(),
+            new ResourceTopStateAntiAffinityConstraint(), new MaxCapacityUsageInstanceConstraint());
+    Map<SoftConstraint, Float> softConstraintsWithWeight = Maps.toMap(softConstraints, key -> {
+      String name = key.getClass().getSimpleName();
+      float weight = MODEL.get(name);
+      return name.equals(PartitionMovementConstraint.class.getSimpleName()) ?
+          movementPreference * weight : evennessPreference * weight;
+    });
+
+    return new ConstraintBasedAlgorithm(hardConstraints, softConstraintsWithWeight);
+  }
+}
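
To make the weight arithmetic in getInstance() concrete: the LESS_MOVEMENT preference multiplies only PartitionMovementConstraint's weight, while the EVENNESS preference multiplies every other soft constraint's weight. The sketch below replays that scaling with the MODEL defaults from this file; it is illustrative only and does not cover the property-file override path.

import java.util.HashMap;
import java.util.Map;

public class WeightScalingSketch {
  public static void main(String[] args) {
    int evenness = 1;       // EVENNESS preference, default 1
    int lessMovement = 1;   // LESS_MOVEMENT preference, default 1
    Map<String, Float> base = new HashMap<>();
    base.put("PartitionMovementConstraint", 2f);
    base.put("InstancePartitionsCountConstraint", 1f);
    base.put("ResourcePartitionAntiAffinityConstraint", 1f);
    base.put("ResourceTopStateAntiAffinityConstraint", 3f);
    base.put("MaxCapacityUsageInstanceConstraint", 5f);
    base.forEach((name, weight) -> {
      // Only the movement constraint is scaled by the movement preference;
      // all other soft constraints are scaled by the evenness preference.
      float scaled = name.equals("PartitionMovementConstraint")
          ? lessMovement * weight : evenness * weight;
      System.out.println(name + " -> " + scaled);
    });
  }
}
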
diff --git a/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/FaultZoneAwareConstraint.java b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/FaultZoneAwareConstraint.java
new file mode 100644
index 0000000..c33419e
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/FaultZoneAwareConstraint.java
@@ -0,0 +1,43 @@
+package org.apache.helix.controller.rebalancer.waged.constraints;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import org.apache.helix.controller.rebalancer.waged.model.AssignableNode;
+import org.apache.helix.controller.rebalancer.waged.model.AssignableReplica;
+import org.apache.helix.controller.rebalancer.waged.model.ClusterContext;
+
+class FaultZoneAwareConstraint extends HardConstraint {
+
+  @Override
+  boolean isAssignmentValid(AssignableNode node, AssignableReplica replica,
+      ClusterContext clusterContext) {
+    if (!node.hasFaultZone()) {
+      return true;
+    }
+    return !clusterContext
+        .getPartitionsForResourceAndFaultZone(replica.getResourceName(), node.getFaultZone())
+        .contains(replica.getPartitionName());
+  }
+
+  @Override
+  String getDescription() {
+    return "A fault zone cannot contain more than 1 replica of same partition";
+  }
+}
diff --git a/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/HardConstraint.java b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/HardConstraint.java
new file mode 100644
index 0000000..f544d4b
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/HardConstraint.java
@@ -0,0 +1,47 @@
+package org.apache.helix.controller.rebalancer.waged.constraints;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import org.apache.helix.controller.rebalancer.waged.model.AssignableNode;
+import org.apache.helix.controller.rebalancer.waged.model.AssignableReplica;
+import org.apache.helix.controller.rebalancer.waged.model.ClusterContext;
+
+/**
+ * Evaluate a partition allocation proposal and return YES or NO based on the cluster context.
+ * Any proposal that fails one or more hard constraints will be rejected.
+ */
+abstract class HardConstraint {
+
+  /**
+   * Check if the replica could be assigned to the node
+   * @return True if the proposed assignment is valid; False otherwise
+   */
+  abstract boolean isAssignmentValid(AssignableNode node, AssignableReplica replica,
+      ClusterContext clusterContext);
+
+  /**
+   * Return the class name as the description by default. If that is not explanatory enough,
+   * a child class can override this method and provide a more detailed description.
+   * @return The detailed description of the hard constraint
+   */
+  String getDescription() {
+    return getClass().getName();
+  }
+}
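
As a hedged illustration of the contract above, the sketch below defines a toy hard constraint: a boolean gate plus a human-readable description used in failure reporting. The Node and Replica types and the disk-based rule are invented stand-ins for Helix's model classes, not part of this patch.

public class HardConstraintSketch {
  interface Node { int remainingDisk(); }
  interface Replica { int requiredDisk(); }

  static abstract class ToyHardConstraint {
    abstract boolean isAssignmentValid(Node node, Replica replica);
    String getDescription() { return getClass().getName(); } // class name by default, as above
  }

  // A hypothetical constraint: reject nodes without enough remaining disk.
  static class DiskConstraint extends ToyHardConstraint {
    @Override
    boolean isAssignmentValid(Node node, Replica replica) {
      return node.remainingDisk() >= replica.requiredDisk();
    }
    @Override
    String getDescription() { return "Node has insufficient disk"; }
  }

  public static void main(String[] args) {
    ToyHardConstraint constraint = new DiskConstraint();
    Node node = () -> 10;
    Replica replica = () -> 20;
    System.out.println(constraint.isAssignmentValid(node, replica)); // false
    System.out.println(constraint.getDescription());
  }
}
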
diff --git a/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/InstancePartitionsCountConstraint.java b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/InstancePartitionsCountConstraint.java
new file mode 100644
index 0000000..948a7d0
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/InstancePartitionsCountConstraint.java
@@ -0,0 +1,41 @@
+package org.apache.helix.controller.rebalancer.waged.constraints;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import org.apache.helix.controller.rebalancer.waged.model.AssignableNode;
+import org.apache.helix.controller.rebalancer.waged.model.AssignableReplica;
+import org.apache.helix.controller.rebalancer.waged.model.ClusterContext;
+
+/**
+ * Evaluate the assignment by the instance's current partition count versus the estimated max
+ * partition count. Intuitively, encourage the assignment if the instance's occupancy rate is
+ * below average, and discourage the assignment if the occupancy rate is above average.
+ * The normalized score will be within [0, 1].
+ */
+class InstancePartitionsCountConstraint extends UsageSoftConstraint {
+
+  @Override
+  protected double getAssignmentScore(AssignableNode node, AssignableReplica replica,
+      ClusterContext clusterContext) {
+    int estimatedMaxPartitionCount = clusterContext.getEstimatedMaxPartitionCount();
+    int currentPartitionCount = node.getAssignedReplicaCount();
+    return computeUtilizationScore(estimatedMaxPartitionCount, currentPartitionCount);
+  }
+}
diff --git a/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/MaxCapacityUsageInstanceConstraint.java b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/MaxCapacityUsageInstanceConstraint.java
new file mode 100644
index 0000000..8f41f5e
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/MaxCapacityUsageInstanceConstraint.java
@@ -0,0 +1,42 @@
+package org.apache.helix.controller.rebalancer.waged.constraints;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import org.apache.helix.controller.rebalancer.waged.model.AssignableNode;
+import org.apache.helix.controller.rebalancer.waged.model.AssignableReplica;
+import org.apache.helix.controller.rebalancer.waged.model.ClusterContext;
+
+/**
+ * The constraint evaluates the score by checking the most used capacity key out of all the
+ * capacity keys.
+ * The higher the maximum usage value for that capacity key, the lower the score will be, implying
+ * that it is that much less desirable to assign anything to the given node.
+ * It is a greedy approach since it evaluates only the most used capacity key.
+ */
+class MaxCapacityUsageInstanceConstraint extends UsageSoftConstraint {
+
+  @Override
+  protected double getAssignmentScore(AssignableNode node, AssignableReplica replica,
+      ClusterContext clusterContext) {
+    float estimatedMaxUtilization = clusterContext.getEstimatedMaxUtilization();
+    float projectedHighestUtilization = node.getProjectedHighestUtilization(replica.getCapacity());
+    return computeUtilizationScore(estimatedMaxUtilization, projectedHighestUtilization);
+  }
+}
diff --git a/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/NodeCapacityConstraint.java b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/NodeCapacityConstraint.java
new file mode 100644
index 0000000..827d6ce
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/NodeCapacityConstraint.java
@@ -0,0 +1,50 @@
+package org.apache.helix.controller.rebalancer.waged.constraints;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.util.Map;
+
+import org.apache.helix.controller.rebalancer.waged.model.AssignableNode;
+import org.apache.helix.controller.rebalancer.waged.model.AssignableReplica;
+import org.apache.helix.controller.rebalancer.waged.model.ClusterContext;
+
+class NodeCapacityConstraint extends HardConstraint {
+
+  @Override
+  boolean isAssignmentValid(AssignableNode node, AssignableReplica replica,
+      ClusterContext clusterContext) {
+    Map<String, Integer> nodeCapacity = node.getRemainingCapacity();
+    Map<String, Integer> replicaCapacity = replica.getCapacity();
+
+    for (String key : replicaCapacity.keySet()) {
+      if (nodeCapacity.containsKey(key)) {
+        if (nodeCapacity.get(key) < replicaCapacity.get(key)) {
+          return false;
+        }
+      }
+    }
+    return true;
+  }
+
+  @Override
+  String getDescription() {
+    return "Node has insufficient capacity";
+  }
+}
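
Note the check's semantics: a capacity key is enforced only when the node defines it; replica requirements for keys absent from the node's capacity map pass through. A small standalone sketch (with made-up keys and numbers) demonstrating that behavior:

import java.util.HashMap;
import java.util.Map;

public class CapacityCheckSketch {
  static boolean fits(Map<String, Integer> nodeRemaining, Map<String, Integer> replicaNeeds) {
    for (Map.Entry<String, Integer> need : replicaNeeds.entrySet()) {
      Integer remaining = nodeRemaining.get(need.getKey());
      if (remaining != null && remaining < need.getValue()) {
        return false; // a key the node defines is over capacity
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Map<String, Integer> node = new HashMap<>();
    node.put("DISK", 100);
    Map<String, Integer> replica = new HashMap<>();
    replica.put("DISK", 40);
    replica.put("QPS", 500); // QPS is not defined on the node, so it is not checked
    System.out.println(fits(node, replica)); // true
  }
}
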
diff --git a/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/NodeMaxPartitionLimitConstraint.java b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/NodeMaxPartitionLimitConstraint.java
new file mode 100644
index 0000000..cda5329
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/NodeMaxPartitionLimitConstraint.java
@@ -0,0 +1,43 @@
+package org.apache.helix.controller.rebalancer.waged.constraints;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import org.apache.helix.controller.rebalancer.waged.model.AssignableNode;
+import org.apache.helix.controller.rebalancer.waged.model.AssignableReplica;
+import org.apache.helix.controller.rebalancer.waged.model.ClusterContext;
+
+class NodeMaxPartitionLimitConstraint extends HardConstraint {
+
+  @Override
+  boolean isAssignmentValid(AssignableNode node, AssignableReplica replica,
+      ClusterContext clusterContext) {
+    boolean withinNodeMaxPartitionLimit =
+        node.getMaxPartition() < 0 || node.getAssignedReplicaCount() < node.getMaxPartition();
+    boolean withinResourceMaxPartitionLimit = replica.getResourceMaxPartitionsPerInstance() < 0
+        || node.getAssignedPartitionsByResource(replica.getResourceName()).size() < replica
+        .getResourceMaxPartitionsPerInstance();
+    return withinNodeMaxPartitionLimit && withinResourceMaxPartitionLimit;
+  }
+
+  @Override
+  String getDescription() {
+    return "Cannot exceed the maximum number of partitions limitation on node";
+  }
+}
diff --git a/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/PartitionMovementConstraint.java b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/PartitionMovementConstraint.java
new file mode 100644
index 0000000..dc19c19
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/PartitionMovementConstraint.java
@@ -0,0 +1,96 @@
+package org.apache.helix.controller.rebalancer.waged.constraints;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.util.Collections;
+import java.util.Map;
+
+import org.apache.helix.controller.rebalancer.waged.model.AssignableNode;
+import org.apache.helix.controller.rebalancer.waged.model.AssignableReplica;
+import org.apache.helix.controller.rebalancer.waged.model.ClusterContext;
+import org.apache.helix.model.Partition;
+import org.apache.helix.model.ResourceAssignment;
+
+/**
+ * Evaluate the proposed assignment according to the potential partition movements cost.
+ * The cost is evaluated based on the difference between the old assignment and the new assignment.
+ * In detail, we consider the following two previous assignments as the base.
+ * - Baseline assignment that is calculated regardless of the node state (online/offline).
+ * - Previous Best Possible assignment.
+ * Any change from these two assignments will increase the partition movement cost, so the
+ * evaluated score will be lower.
+ */
+class PartitionMovementConstraint extends SoftConstraint {
+  private static final double MAX_SCORE = 1f;
+  private static final double MIN_SCORE = 0f;
+  //TODO: these factors will be tuned based on user's preference
+  // This factor indicates the default score that is evaluated if only partition allocation matches
+  // (states are different).
+  private static final double ALLOCATION_MATCH_FACTOR = 0.5;
+
+  PartitionMovementConstraint() {
+    super(MAX_SCORE, MIN_SCORE);
+  }
+
+  @Override
+  protected double getAssignmentScore(AssignableNode node, AssignableReplica replica,
+      ClusterContext clusterContext) {
+    // Prioritize the previous Best Possible assignment
+    Map<String, String> bestPossibleAssignment =
+        getStateMap(replica, clusterContext.getBestPossibleAssignment());
+    if (!bestPossibleAssignment.isEmpty()) {
+      return calculateAssignmentScale(node, replica, bestPossibleAssignment);
+    }
+    // else, compare the baseline only if the best possible assignment does not contain the replica
+    Map<String, String> baselineAssignment =
+        getStateMap(replica, clusterContext.getBaselineAssignment());
+    if (!baselineAssignment.isEmpty()) {
+      return calculateAssignmentScale(node, replica, baselineAssignment);
+    }
+    return 0;
+  }
+
+  private Map<String, String> getStateMap(AssignableReplica replica,
+      Map<String, ResourceAssignment> assignment) {
+    String resourceName = replica.getResourceName();
+    String partitionName = replica.getPartitionName();
+    if (assignment == null || !assignment.containsKey(resourceName)) {
+      return Collections.emptyMap();
+    }
+    return assignment.get(resourceName).getReplicaMap(new Partition(partitionName));
+  }
+
+  private double calculateAssignmentScale(AssignableNode node, AssignableReplica replica,
+      Map<String, String> instanceToStateMap) {
+    String instanceName = node.getInstanceName();
+    if (!instanceToStateMap.containsKey(instanceName)) {
+      return 0;
+    } else {
+      return (instanceToStateMap.get(instanceName).equals(replica.getReplicaState()) ? 1 :
+          ALLOCATION_MATCH_FACTOR);
+    }
+  }
+
+  @Override
+  protected NormalizeFunction getNormalizeFunction() {
+    // PartitionMovementConstraint already scales the score properly.
+    return (score) -> score;
+  }
+}
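
The scoring rule in calculateAssignmentScale() reduces to a pure function: 1.0 when the replica stays on the same instance in the same state, ALLOCATION_MATCH_FACTOR (0.5) when only the instance matches, and 0 when the replica would move. A sketch with plain strings as illustrative inputs:

public class MovementScoreSketch {
  static double score(String previousInstance, String previousState,
      String candidateInstance, String candidateState) {
    if (!candidateInstance.equals(previousInstance)) {
      return 0; // moving the replica costs the full score
    }
    return candidateState.equals(previousState) ? 1.0 : 0.5;
  }

  public static void main(String[] args) {
    System.out.println(score("host1", "MASTER", "host1", "MASTER")); // 1.0
    System.out.println(score("host1", "MASTER", "host1", "SLAVE"));  // 0.5
    System.out.println(score("host1", "MASTER", "host2", "MASTER")); // 0.0
  }
}
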
diff --git a/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/InstanceMonitorMBean.java b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/ReplicaActivateConstraint.java
similarity index 50%
copy from helix-core/src/main/java/org/apache/helix/monitoring/mbeans/InstanceMonitorMBean.java
copy to helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/ReplicaActivateConstraint.java
index a3221d8..9152efe 100644
--- a/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/InstanceMonitorMBean.java
+++ b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/ReplicaActivateConstraint.java
@@ -1,4 +1,4 @@
-package org.apache.helix.monitoring.mbeans;
+package org.apache.helix.controller.rebalancer.waged.constraints;
 
 /*
  * Licensed to the Apache Software Foundation (ASF) under one
@@ -19,33 +19,23 @@ package org.apache.helix.monitoring.mbeans;
  * under the License.
  */
 
-import org.apache.helix.monitoring.SensorNameProvider;
+import java.util.List;
 
-/**
- * A basic bean describing the status of a single instance
- */
-public interface InstanceMonitorMBean extends SensorNameProvider {
-  /**
-   * Check if this instance is live
-   * @return 1 if running, 0 otherwise
-   */
-  public long getOnline();
-
-  /**
-   * Check if this instance is enabled
-   * @return 1 if enabled, 0 if disabled
-   */
-  public long getEnabled();
+import org.apache.helix.controller.rebalancer.waged.model.AssignableNode;
+import org.apache.helix.controller.rebalancer.waged.model.AssignableReplica;
+import org.apache.helix.controller.rebalancer.waged.model.ClusterContext;
 
-  /**
-   * Get total message received for this instances
-   * @return The total number of messages sent to this instance
-   */
-  public long getTotalMessageReceived();
+class ReplicaActivateConstraint extends HardConstraint {
+  @Override
+  boolean isAssignmentValid(AssignableNode node, AssignableReplica replica,
+      ClusterContext clusterContext) {
+    List<String> disabledPartitions =
+        node.getDisabledPartitionsMap().get(replica.getResourceName());
+    return disabledPartitions == null || !disabledPartitions.contains(replica.getPartitionName());
+  }
 
-  /**
-   * Get the total disabled partitions number for this instance
-   * @return The total number of disabled partitions
-   */
-  public long getDisabledPartitions();
+  @Override
+  String getDescription() {
+    return "Cannot assign the inactive replica";
+  }
 }
diff --git a/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/ResourcePartitionAntiAffinityConstraint.java b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/ResourcePartitionAntiAffinityConstraint.java
new file mode 100644
index 0000000..a3b701f
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/ResourcePartitionAntiAffinityConstraint.java
@@ -0,0 +1,43 @@
+package org.apache.helix.controller.rebalancer.waged.constraints;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import org.apache.helix.controller.rebalancer.waged.model.AssignableNode;
+import org.apache.helix.controller.rebalancer.waged.model.AssignableReplica;
+import org.apache.helix.controller.rebalancer.waged.model.ClusterContext;
+
+/**
+ * This constraint exists to spread partitions belonging to the same resource as far from each
+ * other as possible. Having many partitions of the same resource assigned to the same node is
+ * undesirable because it magnifies the impact of a node failure.
+ * The fewer partitions of the same resource on the node, the higher the score.
+ */
+class ResourcePartitionAntiAffinityConstraint extends UsageSoftConstraint {
+  @Override
+  protected double getAssignmentScore(AssignableNode node, AssignableReplica replica,
+      ClusterContext clusterContext) {
+    String resource = replica.getResourceName();
+    int curPartitionCountForResource = node.getAssignedPartitionsByResource(resource).size();
+    int estimatedMaxPartitionCountForResource =
+        clusterContext.getEstimatedMaxPartitionByResource(resource);
+    return computeUtilizationScore(estimatedMaxPartitionCountForResource,
+        curPartitionCountForResource);
+  }
+}
diff --git a/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/ResourceTopStateAntiAffinityConstraint.java b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/ResourceTopStateAntiAffinityConstraint.java
new file mode 100644
index 0000000..f0f9e13
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/ResourceTopStateAntiAffinityConstraint.java
@@ -0,0 +1,44 @@
+package org.apache.helix.controller.rebalancer.waged.constraints;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import org.apache.helix.controller.rebalancer.waged.model.AssignableNode;
+import org.apache.helix.controller.rebalancer.waged.model.AssignableReplica;
+import org.apache.helix.controller.rebalancer.waged.model.ClusterContext;
+
+/**
+ * Evaluate the proposed assignment according to the top state replication count on the instance.
+ * The higher the number of top state partitions assigned to the instance, the lower the
+ * score, and vice versa.
+ */
+class ResourceTopStateAntiAffinityConstraint extends UsageSoftConstraint {
+  @Override
+  protected double getAssignmentScore(AssignableNode node, AssignableReplica replica,
+      ClusterContext clusterContext) {
+    if (!replica.isReplicaTopState()) {
+      // For a non-top-state replica, this constraint is not applicable,
+      // so return zero for any assignable node candidate.
+      return 0;
+    }
+    int curTopPartitionCountForResource = node.getAssignedTopStatePartitionsCount();
+    int estimatedMaxTopStateCount = clusterContext.getEstimatedMaxTopStateCount();
+    return computeUtilizationScore(estimatedMaxTopStateCount, curTopPartitionCountForResource);
+  }
+}
diff --git a/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/MonitorDomainNames.java b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/SamePartitionOnInstanceConstraint.java
similarity index 52%
copy from helix-core/src/main/java/org/apache/helix/monitoring/mbeans/MonitorDomainNames.java
copy to helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/SamePartitionOnInstanceConstraint.java
index 73bf057..202e49a 100644
--- a/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/MonitorDomainNames.java
+++ b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/SamePartitionOnInstanceConstraint.java
@@ -1,4 +1,4 @@
-package org.apache.helix.monitoring.mbeans;
+package org.apache.helix.controller.rebalancer.waged.constraints;
 
 /*
  * Licensed to the Apache Software Foundation (ASF) under one
@@ -19,14 +19,21 @@ package org.apache.helix.monitoring.mbeans;
  * under the License.
  */
 
-/**
- * This enum defines all of domain names used with various Helix monitor mbeans.
- */
-public enum MonitorDomainNames {
-  ClusterStatus,
-  HelixZkClient,
-  HelixThreadPoolExecutor,
-  HelixCallback,
-  RoutingTableProvider,
-  CLMParticipantReport
+import org.apache.helix.controller.rebalancer.waged.model.AssignableNode;
+import org.apache.helix.controller.rebalancer.waged.model.AssignableReplica;
+import org.apache.helix.controller.rebalancer.waged.model.ClusterContext;
+
+class SamePartitionOnInstanceConstraint extends HardConstraint {
+
+  @Override
+  boolean isAssignmentValid(AssignableNode node, AssignableReplica replica,
+      ClusterContext clusterContext) {
+    return !node.getAssignedPartitionsByResource(replica.getResourceName())
+        .contains(replica.getPartitionName());
+  }
+
+  @Override
+  String getDescription() {
+    return "Same partition of different states cannot co-exist in one instance";
+  }
 }
diff --git a/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/SoftConstraint.java b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/SoftConstraint.java
new file mode 100644
index 0000000..21bed84
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/SoftConstraint.java
@@ -0,0 +1,90 @@
+package org.apache.helix.controller.rebalancer.waged.constraints;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import org.apache.helix.controller.rebalancer.waged.model.AssignableNode;
+import org.apache.helix.controller.rebalancer.waged.model.AssignableReplica;
+import org.apache.helix.controller.rebalancer.waged.model.ClusterContext;
+
+/**
+ * The "soft" constraint evaluates the optimality of an assignment by giving it a score of a scale
+ * of [minScore, maxScore]
+ * The higher the score, the better the assignment; Intuitively, the assignment is encouraged.
+ * The lower score the score, the worse the assignment; Intuitively, the assignment is penalized.
+ */
+abstract class SoftConstraint {
+  private final double _maxScore;
+  private final double _minScore;
+
+  interface NormalizeFunction {
+    /**
+     * Scale the origin score to a normalized range (0, 1).
+     * The purpose is to compare scores between different soft constraints.
+     * @param originScore The origin score
+     * @return The normalized value between (0, 1)
+     */
+    double scale(double originScore);
+  }
+
+  /**
+   * A child class customizes the min/max score on its own.
+   * @param maxScore The max score
+   * @param minScore The min score
+   */
+  SoftConstraint(double maxScore, double minScore) {
+    _maxScore = maxScore;
+    _minScore = minScore;
+  }
+
+  protected double getMaxScore() {
+    return _maxScore;
+  }
+
+  protected double getMinScore() {
+    return _minScore;
+  }
+
+  /**
+   * Evaluate and give a score for a potential assignment of partition -> instance.
+   * A child class only needs to care about how the score is computed.
+   * @return The score of the assignment as a double value
+   */
+  protected abstract double getAssignmentScore(AssignableNode node, AssignableReplica replica,
+      ClusterContext clusterContext);
+
+  /**
+   * Evaluate and give a score for a potential assignment of partition -> instance.
+   * It is the only method exposed to the caller.
+   * @return The score, normalized to be within minScore and maxScore
+   */
+  double getAssignmentNormalizedScore(AssignableNode node, AssignableReplica replica,
+      ClusterContext clusterContext) {
+    return getNormalizeFunction().scale(getAssignmentScore(node, replica, clusterContext));
+  }
+
+  /**
+   * The default scaler function squashes any score within (minScore, maxScore) to (0, 1).
+   * A child class can override this method and customize the normalization on its own.
+   * @return A min-max scaler by default
+   */
+  protected NormalizeFunction getNormalizeFunction() {
+    return (score) -> (score - getMinScore()) / (getMaxScore() - getMinScore());
+  }
+}
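
The default normalization above is a plain min-max scaler. A tiny sketch showing the arithmetic in isolation:

public class MinMaxScaleSketch {
  static double scale(double score, double min, double max) {
    return (score - min) / (max - min);
  }

  public static void main(String[] args) {
    // With min = 0 and max = 10, a raw score of 7.5 normalizes to 0.75.
    System.out.println(scale(7.5, 0, 10)); // 0.75
  }
}
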
diff --git a/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/UsageSoftConstraint.java b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/UsageSoftConstraint.java
new file mode 100644
index 0000000..c8bc521
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/UsageSoftConstraint.java
@@ -0,0 +1,85 @@
+package org.apache.helix.controller.rebalancer.waged.constraints;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import org.apache.commons.math3.analysis.function.Sigmoid;
+
+/**
+ * The soft constraint that evaluates the assignment proposal based on usage.
+ */
+abstract class UsageSoftConstraint extends SoftConstraint {
+  private static final double MAX_SCORE = 1f;
+  private static final double MIN_SCORE = 0f;
+  /**
+   * Alpha is used to adjust the curve of the sigmoid function.
+   * Intuitively, this is for tolerating the inaccuracy of the estimation.
+   * Ideally, if we had a perfect estimation, we could use a segmented function here, which
+   * scores the assignment with 1.0 if the projected usage is below the estimation, and 0.0
+   * if the projected usage exceeds the estimation. However, in reality, it is hard to get a
+   * perfect estimation. With the curve of the sigmoid, the algorithm reacts differently and
+   * reasonably even when the usage is a little more or less than the estimation, to a certain
+   * extent.
+   * As tested, for input numbers around 1, the default alpha value ensures a curve with
+   * sigmoid(0.95) = 0.90 and sigmoid(1.05) = 0.10, meaning the constraint can handle an
+   * estimation inaccuracy of +-5%.
+   * To adjust the curve:
+   * 1. A smaller alpha will widen the curve's scope, so the function will handle a wider
+   * range of inaccuracy. However, the downside is more random movement, since the evenness
+   * score would be more changeable and less definitive.
+   * 2. A larger alpha will narrow the curve's scope. In that case, we might want to switch
+   * to a segmented function so as to speed up the algorithm.
+   **/
+  private static final int DEFAULT_ALPHA = 44;
+  private static final Sigmoid SIGMOID = new Sigmoid();
+
+  UsageSoftConstraint() {
+    super(MAX_SCORE, MIN_SCORE);
+  }
+
+  /**
+   * Compute the utilization score based on the estimated and current usage numbers.
+   * The score = currentUsage / estimatedUsage.
+   * In short, a smaller score means a better assignment proposal.
+   *
+   * @param estimatedUsage The estimated usage
+   * @param currentUsage   The current usage
+   * @return The raw utilization score (currentUsage / estimatedUsage); it may exceed 1.0 when the current usage is above the estimation.
+   */
+  protected double computeUtilizationScore(double estimatedUsage, double currentUsage) {
+    if (estimatedUsage == 0) {
+      return 0;
+    }
+    return currentUsage / estimatedUsage;
+  }
+
+  /**
+   * Compute evaluation score based on the utilization data.
+   * The normalized score is evaluated using a sigmoid function.
+   * When the usage is smaller than 1.0, the constraint returns a value that is very close to the
+   * max score.
+   * When the usage is close to or larger than 1.0, the constraint returns a score that is very
+   * close to the min score. Note that even in this case, higher usage will still be assigned a
+   * smaller score.
+   */
+  @Override
+  protected NormalizeFunction getNormalizeFunction() {
+    return (score) -> SIGMOID.value(-(score - 1) * DEFAULT_ALPHA) * (MAX_SCORE - MIN_SCORE);
+  }
+}
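
To sanity-check the tolerance described in the alpha comment, the sketch below reproduces the normalization with plain math (sigmoid(x) = 1 / (1 + e^-x)) instead of commons-math: with alpha = 44, a utilization ratio of 0.95 scores about 0.90, 1.0 scores 0.50, and 1.05 scores about 0.10.

public class SigmoidCurveSketch {
  static final int ALPHA = 44;

  static double normalizedScore(double utilizationRatio) {
    double x = -(utilizationRatio - 1) * ALPHA;
    return 1.0 / (1.0 + Math.exp(-x)); // MAX_SCORE - MIN_SCORE = 1
  }

  public static void main(String[] args) {
    System.out.printf("0.95 -> %.2f%n", normalizedScore(0.95)); // ~0.90
    System.out.printf("1.00 -> %.2f%n", normalizedScore(1.00)); // 0.50
    System.out.printf("1.05 -> %.2f%n", normalizedScore(1.05)); // ~0.10
  }
}
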
diff --git a/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/MonitorDomainNames.java b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/ValidGroupTagConstraint.java
similarity index 52%
copy from helix-core/src/main/java/org/apache/helix/monitoring/mbeans/MonitorDomainNames.java
copy to helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/ValidGroupTagConstraint.java
index 73bf057..e31864f 100644
--- a/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/MonitorDomainNames.java
+++ b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/ValidGroupTagConstraint.java
@@ -1,4 +1,4 @@
-package org.apache.helix.monitoring.mbeans;
+package org.apache.helix.controller.rebalancer.waged.constraints;
 
 /*
  * Licensed to the Apache Software Foundation (ASF) under one
@@ -19,14 +19,23 @@ package org.apache.helix.monitoring.mbeans;
  * under the License.
  */
 
-/**
- * This enum defines all of domain names used with various Helix monitor mbeans.
- */
-public enum MonitorDomainNames {
-  ClusterStatus,
-  HelixZkClient,
-  HelixThreadPoolExecutor,
-  HelixCallback,
-  RoutingTableProvider,
-  CLMParticipantReport
+import org.apache.helix.controller.rebalancer.waged.model.AssignableNode;
+import org.apache.helix.controller.rebalancer.waged.model.AssignableReplica;
+import org.apache.helix.controller.rebalancer.waged.model.ClusterContext;
+
+class ValidGroupTagConstraint extends HardConstraint {
+  @Override
+  boolean isAssignmentValid(AssignableNode node, AssignableReplica replica,
+      ClusterContext clusterContext) {
+    if (!replica.hasResourceInstanceGroupTag()) {
+      return true;
+    }
+
+    return node.getInstanceTags().contains(replica.getResourceInstanceGroupTag());
+  }
+
+  @Override
+  String getDescription() {
+    return "Instance doesn't have the tag of the replica";
+  }
 }
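A minimal sketch of the same tag check in isolation (the helper and sample tags are hypothetical, not Helix APIs):

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class GroupTagCheckDemo {
  // Mirrors ValidGroupTagConstraint: untagged resources may go anywhere;
  // tagged resources require a node carrying the same tag.
  static boolean isAssignmentValid(Set<String> nodeTags, String resourceGroupTag) {
    if (resourceGroupTag == null || resourceGroupTag.isEmpty()) {
      return true;
    }
    return nodeTags.contains(resourceGroupTag);
  }

  public static void main(String[] args) {
    Set<String> tags = new HashSet<>(Arrays.asList("tagA", "tagB"));
    System.out.println(isAssignmentValid(tags, "tagA")); // true
    System.out.println(isAssignmentValid(tags, "tagC")); // false
    System.out.println(isAssignmentValid(tags, null));   // true
  }
}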
diff --git a/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/model/AssignableNode.java b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/model/AssignableNode.java
new file mode 100644
index 0000000..06d4976
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/model/AssignableNode.java
@@ -0,0 +1,374 @@
+package org.apache.helix.controller.rebalancer.waged.model;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.ImmutableSet;
+import org.apache.helix.HelixException;
+import org.apache.helix.controller.rebalancer.util.WagedValidationUtil;
+import org.apache.helix.model.ClusterConfig;
+import org.apache.helix.model.InstanceConfig;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+/**
+ * This class represents a node (instance) to which replicas can be allocated.
+ * Note that any usage updates to the AssignableNode are not thread safe.
+ */
+public class AssignableNode implements Comparable<AssignableNode> {
+  private static final Logger LOG = LoggerFactory.getLogger(AssignableNode.class.getName());
+
+  // Immutable Instance Properties
+  private final String _instanceName;
+  private final String _faultZone;
+  // The maximum number of partitions that can be assigned to the instance.
+  private final int _maxPartition;
+  private final ImmutableSet<String> _instanceTags;
+  private final ImmutableMap<String, List<String>> _disabledPartitionsMap;
+  private final ImmutableMap<String, Integer> _maxAllowedCapacity;
+
+  // Mutable (Dynamic) Instance Properties
+  // A map of <resource name, <partition name, replica>> that tracks the replicas assigned to the
+  // node.
+  private Map<String, Map<String, AssignableReplica>> _currentAssignedReplicaMap;
+  // A map of <capacity key, capacity value> that tracks the current available node capacity
+  private Map<String, Integer> _remainingCapacity;
+
+  /**
+   * Update the node with a ClusterDataCache. This resets the current assignment and recalculates
+   * the current capacity.
+   * NOTE: While this is required to be used in the constructor, it can also be used when the
+   * clusterCache needs to be refreshed. This is under the assumption that the capacity mappings
+   * of InstanceConfig and ResourceConfig could be subject to change. If the assumption is no
+   * longer true, this function should become private.
+   */
+  AssignableNode(ClusterConfig clusterConfig, InstanceConfig instanceConfig, String instanceName) {
+    _instanceName = instanceName;
+    Map<String, Integer> instanceCapacity = fetchInstanceCapacity(clusterConfig, instanceConfig);
+    _faultZone = computeFaultZone(clusterConfig, instanceConfig);
+    _instanceTags = ImmutableSet.copyOf(instanceConfig.getTags());
+    _disabledPartitionsMap = ImmutableMap.copyOf(instanceConfig.getDisabledPartitionsMap());
+    // make a copy of max capacity
+    _maxAllowedCapacity = ImmutableMap.copyOf(instanceCapacity);
+    _remainingCapacity = new HashMap<>(instanceCapacity);
+    _maxPartition = clusterConfig.getMaxPartitionsPerInstance();
+    _currentAssignedReplicaMap = new HashMap<>();
+  }
+
+  /**
+   * This function should only be used to assign a set of new partitions that are not yet
+   * allocated on this node, because an exception could occur in the middle of the batch
+   * assignment and the previously finished assignments cannot be reverted.
+   * Using this function avoids the overhead of updating capacity repeatedly.
+   */
+  void assignInitBatch(Collection<AssignableReplica> replicas) {
+    Map<String, Integer> totalPartitionCapacity = new HashMap<>();
+    for (AssignableReplica replica : replicas) {
+      // TODO: an exception could occur in the middle of the for loop, and the previously added records cannot be reverted
+      addToAssignmentRecord(replica);
+      // increment the capacity requirement according to partition's capacity configuration.
+      for (Map.Entry<String, Integer> capacity : replica.getCapacity().entrySet()) {
+        totalPartitionCapacity.compute(capacity.getKey(),
+            (key, totalValue) -> (totalValue == null) ? capacity.getValue()
+                : totalValue + capacity.getValue());
+      }
+    }
+
+    // Update the global state after the calculation for all individual replicas is done.
+    for (String capacityKey : totalPartitionCapacity.keySet()) {
+      updateRemainingCapacity(capacityKey, totalPartitionCapacity.get(capacityKey));
+    }
+  }
+
+  /**
+   * Assign a replica to the node.
+   * @param assignableReplica - the replica to be assigned
+   */
+  void assign(AssignableReplica assignableReplica) {
+    addToAssignmentRecord(assignableReplica);
+    assignableReplica.getCapacity().entrySet().stream()
+            .forEach(capacity -> updateRemainingCapacity(capacity.getKey(), capacity.getValue()));
+  }
+
+  /**
+   * Release a replica from the node.
+   * If the replica is not on this node, the assignable node is not updated.
+   * @param replica - the replica to be released
+   */
+  void release(AssignableReplica replica)
+      throws IllegalArgumentException {
+    String resourceName = replica.getResourceName();
+    String partitionName = replica.getPartitionName();
+
+    // Check if the release is necessary
+    if (!_currentAssignedReplicaMap.containsKey(resourceName)) {
+      LOG.warn("Resource {} is not on node {}. Ignore the release call.", resourceName,
+          getInstanceName());
+      return;
+    }
+
+    Map<String, AssignableReplica> partitionMap = _currentAssignedReplicaMap.get(resourceName);
+    if (!partitionMap.containsKey(partitionName) || !partitionMap.get(partitionName)
+        .equals(replica)) {
+      LOG.warn("Replica {} is not assigned to node {}. Ignore the release call.",
+          replica.toString(), getInstanceName());
+      return;
+    }
+
+    AssignableReplica removedReplica = partitionMap.remove(partitionName);
+    removedReplica.getCapacity().entrySet().stream()
+        .forEach(entry -> updateRemainingCapacity(entry.getKey(), -1 * entry.getValue()));
+  }
+
+  /**
+   * @return A set of all assigned replicas on the node.
+   */
+  Set<AssignableReplica> getAssignedReplicas() {
+    return _currentAssignedReplicaMap.values().stream()
+        .flatMap(replicaMap -> replicaMap.values().stream()).collect(Collectors.toSet());
+  }
+
+  /**
+   * @return The current assignment in a map of <resource name, set of partition names>
+   */
+  Map<String, Set<String>> getAssignedPartitionsMap() {
+    Map<String, Set<String>> assignmentMap = new HashMap<>();
+    for (String resourceName : _currentAssignedReplicaMap.keySet()) {
+      assignmentMap.put(resourceName, _currentAssignedReplicaMap.get(resourceName).keySet());
+    }
+    return assignmentMap;
+  }
+
+  /**
+   * @param resource Resource name
+   * @return A set of the current assigned replicas' partition names in the specified resource.
+   */
+  public Set<String> getAssignedPartitionsByResource(String resource) {
+    return _currentAssignedReplicaMap.getOrDefault(resource, Collections.emptyMap()).keySet();
+  }
+
+  /**
+   * @param resource Resource name
+   * @return A set of the current assigned replicas' partition names with the top state in the
+   *         specified resource.
+   */
+  Set<String> getAssignedTopStatePartitionsByResource(String resource) {
+    return _currentAssignedReplicaMap.getOrDefault(resource, Collections.emptyMap()).entrySet()
+        .stream().filter(partitionEntry -> partitionEntry.getValue().isReplicaTopState())
+        .map(partitionEntry -> partitionEntry.getKey()).collect(Collectors.toSet());
+  }
+
+  /**
+   * @return The total count of assigned top state partitions.
+   */
+  public int getAssignedTopStatePartitionsCount() {
+    return (int) _currentAssignedReplicaMap.values().stream()
+        .flatMap(replicaMap -> replicaMap.values().stream())
+        .filter(AssignableReplica::isReplicaTopState).count();
+  }
+
+  /**
+   * @return The total count of assigned replicas.
+   */
+  public int getAssignedReplicaCount() {
+    return _currentAssignedReplicaMap.values().stream().mapToInt(Map::size).sum();
+  }
+
+  /**
+   * @return The current available capacity.
+   */
+  public Map<String, Integer> getRemainingCapacity() {
+    return _remainingCapacity;
+  }
+
+  /**
+   * @return A map of <capacity category, capacity number> that describes the max capacity of the
+   *         node.
+   */
+  public Map<String, Integer> getMaxCapacity() {
+    return _maxAllowedCapacity;
+  }
+
+  /**
+   * Return the most critical capacity utilization number for even partition assignment.
+   * The method dynamically calculates the projected highest utilization number among all the
+   * capacity categories, assuming the new capacity usage is added to the node.
+   * For example, if the projected node usage is {CPU: 0.9, MEM: 0.4, DISK: 0.6}, then this call
+   * returns 0.9.
+   * @param newUsage the proposed additional capacity usage.
+   * @return The highest utilization number of the node among all the capacity categories.
+   */
+  public float getProjectedHighestUtilization(Map<String, Integer> newUsage) {
+    float highestCapacityUtilization = 0;
+    for (String capacityKey : _maxAllowedCapacity.keySet()) {
+      float capacityValue = _maxAllowedCapacity.get(capacityKey);
+      float utilization = (capacityValue - _remainingCapacity.get(capacityKey) + newUsage
+          .getOrDefault(capacityKey, 0)) / capacityValue;
+      highestCapacityUtilization = Math.max(highestCapacityUtilization, utilization);
+    }
+    return highestCapacityUtilization;
+  }
+
+  public String getInstanceName() {
+    return _instanceName;
+  }
+
+  public Set<String> getInstanceTags() {
+    return _instanceTags;
+  }
+
+  public String getFaultZone() {
+    return _faultZone;
+  }
+
+  public boolean hasFaultZone() {
+    return _faultZone != null;
+  }
+
+  /**
+   * @return A map of <resource name, list of partition names> that contains all the partitions
+   *         that are disabled on the node.
+   */
+  public Map<String, List<String>> getDisabledPartitionsMap() {
+    return _disabledPartitionsMap;
+  }
+
+  /**
+   * @return The max number of partitions that are allowed to be allocated on the node.
+   */
+  public int getMaxPartition() {
+    return _maxPartition;
+  }
+
+  /**
+   * Computes the fault zone id based on the domain and the fault zone type when topology is
+   * enabled. For example, when the domain is "zone=2, instance=testInstance" and the fault zone
+   * type is "zone", this function returns "2".
+   * If the fault zone type cannot be found, this function uses the instance name as the fault
+   * zone id.
+   * Note that the WAGED rebalancer does not require the full topology tree to be created, so
+   * this logic is simpler than that of the CRUSH-based rebalancer.
+   */
+  private String computeFaultZone(ClusterConfig clusterConfig, InstanceConfig instanceConfig) {
+    if (!clusterConfig.isTopologyAwareEnabled()) {
+      // Instance name is the default fault zone if topology awareness is false.
+      return instanceConfig.getInstanceName();
+    }
+    String topologyStr = clusterConfig.getTopology();
+    String faultZoneType = clusterConfig.getFaultZoneType();
+    if (topologyStr == null || faultZoneType == null) {
+      LOG.debug("Topology configuration is not complete. Topology define: {}, Fault Zone Type: {}",
+          topologyStr, faultZoneType);
+      // Use the instance name, or the deprecated ZoneId field (if exists) as the default fault
+      // zone.
+      String zoneId = instanceConfig.getZoneId();
+      return zoneId == null ? instanceConfig.getInstanceName() : zoneId;
+    } else {
+      // Get the fault zone information from the complete topology definition.
+      String[] topologyKeys = topologyStr.trim().split("/");
+      if (topologyKeys.length == 0 || Arrays.stream(topologyKeys)
+          .noneMatch(type -> type.equals(faultZoneType))) {
+        throw new HelixException(
+            "The configured topology definition is empty or does not contain the fault zone type.");
+      }
+
+      Map<String, String> domainAsMap = instanceConfig.getDomainAsMap();
+      StringBuilder faultZoneStringBuilder = new StringBuilder();
+      for (String key : topologyKeys) {
+        if (!key.isEmpty()) {
+          // if a key does not exist in the instance domain config, apply the default domain value.
+          faultZoneStringBuilder.append(domainAsMap.getOrDefault(key, "Default_" + key));
+          if (key.equals(faultZoneType)) {
+            break;
+          } else {
+            faultZoneStringBuilder.append('/');
+          }
+        }
+      }
+      return faultZoneStringBuilder.toString();
+    }
+  }
+
+  /**
+   * @throws HelixException if the replica has already been assigned to the node.
+   */
+  private void addToAssignmentRecord(AssignableReplica replica) {
+    String resourceName = replica.getResourceName();
+    String partitionName = replica.getPartitionName();
+    if (_currentAssignedReplicaMap.containsKey(resourceName) && _currentAssignedReplicaMap
+        .get(resourceName).containsKey(partitionName)) {
+      throw new HelixException(String
+          .format("Resource %s already has a replica with state %s from partition %s on node %s",
+              replica.getResourceName(), replica.getReplicaState(), replica.getPartitionName(),
+              getInstanceName()));
+    } else {
+      _currentAssignedReplicaMap.computeIfAbsent(resourceName, key -> new HashMap<>())
+          .put(partitionName, replica);
+    }
+  }
+
+  private void updateRemainingCapacity(String capacityKey, int usage) {
+    if (!_remainingCapacity.containsKey(capacityKey)) {
+      // If the capacityKey of the replica does not exist in the instance's capacity,
+      // the instance is treated as if it has unlimited capacity for that capacityKey.
+      return;
+    }
+    _remainingCapacity.put(capacityKey, _remainingCapacity.get(capacityKey) - usage);
+  }
+
+  /**
+   * Get and validate the instance capacity from instance config.
+   * @throws HelixException if any required capacity key is not configured in the instance config.
+   */
+  private Map<String, Integer> fetchInstanceCapacity(ClusterConfig clusterConfig,
+      InstanceConfig instanceConfig) {
+    Map<String, Integer> instanceCapacity =
+        WagedValidationUtil.validateAndGetInstanceCapacity(clusterConfig, instanceConfig);
+    // Remove all the non-required capacity items from the map.
+    instanceCapacity.keySet().retainAll(clusterConfig.getInstanceCapacityKeys());
+    return instanceCapacity;
+  }
+
+  @Override
+  public int hashCode() {
+    return _instanceName.hashCode();
+  }
+
+  @Override
+  public int compareTo(AssignableNode o) {
+    return _instanceName.compareTo(o.getInstanceName());
+  }
+
+  @Override
+  public String toString() {
+    return _instanceName;
+  }
+}
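To make getProjectedHighestUtilization concrete, here is a self-contained sketch of the same arithmetic under assumed capacities (all numbers are made up for illustration):

import java.util.HashMap;
import java.util.Map;

public class ProjectedUtilizationDemo {
  public static void main(String[] args) {
    Map<String, Integer> maxCapacity = new HashMap<>();
    maxCapacity.put("CPU", 100);
    maxCapacity.put("MEM", 100);
    Map<String, Integer> remaining = new HashMap<>();
    remaining.put("CPU", 20); // 80 units already used
    remaining.put("MEM", 70); // 30 units already used
    Map<String, Integer> newUsage = new HashMap<>();
    newUsage.put("CPU", 5);   // the proposed additional usage

    // Same loop as getProjectedHighestUtilization above.
    float highest = 0;
    for (String key : maxCapacity.keySet()) {
      float cap = maxCapacity.get(key);
      float utilization = (cap - remaining.get(key) + newUsage.getOrDefault(key, 0)) / cap;
      highest = Math.max(highest, utilization);
    }
    // CPU: (100 - 20 + 5) / 100 = 0.85; MEM: (100 - 70 + 0) / 100 = 0.30
    System.out.println(highest); // prints 0.85
  }
}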
diff --git a/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/model/AssignableReplica.java b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/model/AssignableReplica.java
new file mode 100644
index 0000000..fdcc03a
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/model/AssignableReplica.java
@@ -0,0 +1,161 @@
+package org.apache.helix.controller.rebalancer.waged.model;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.helix.HelixException;
+import org.apache.helix.controller.rebalancer.util.WagedValidationUtil;
+import org.apache.helix.model.ClusterConfig;
+import org.apache.helix.model.ResourceConfig;
+import org.apache.helix.model.StateModelDefinition;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * This class represents a partition replica that needs to be allocated.
+ */
+public class AssignableReplica implements Comparable<AssignableReplica> {
+  private static final Logger LOG = LoggerFactory.getLogger(AssignableReplica.class);
+
+  private final String _replicaKey;
+  private final String _partitionName;
+  private final String _resourceName;
+  private final String _resourceInstanceGroupTag;
+  private final int _resourceMaxPartitionsPerInstance;
+  private final Map<String, Integer> _capacityUsage;
+  // The priority of the replica's state
+  private final int _statePriority;
+  // The state of the replica
+  private final String _replicaState;
+
+  /**
+   * @param clusterConfig  The cluster config.
+   * @param resourceConfig The resource config for the resource that contains the replica.
+   * @param partitionName  The replica's partition name.
+   * @param replicaState   The state of the replica.
+   * @param statePriority  The priority of the replica's state.
+   */
+  AssignableReplica(ClusterConfig clusterConfig, ResourceConfig resourceConfig,
+      String partitionName, String replicaState, int statePriority) {
+    _partitionName = partitionName;
+    _replicaState = replicaState;
+    _statePriority = statePriority;
+    _resourceName = resourceConfig.getResourceName();
+    _capacityUsage = fetchCapacityUsage(partitionName, resourceConfig, clusterConfig);
+    _resourceInstanceGroupTag = resourceConfig.getInstanceGroupTag();
+    _resourceMaxPartitionsPerInstance = resourceConfig.getMaxPartitionsPerInstance();
+    _replicaKey = generateReplicaKey(_resourceName, _partitionName, _replicaState);
+  }
+
+  public Map<String, Integer> getCapacity() {
+    return _capacityUsage;
+  }
+
+  public String getPartitionName() {
+    return _partitionName;
+  }
+
+  public String getReplicaState() {
+    return _replicaState;
+  }
+
+  public boolean isReplicaTopState() {
+    return _statePriority == StateModelDefinition.TOP_STATE_PRIORITY;
+  }
+
+  public int getStatePriority() {
+    return _statePriority;
+  }
+
+  public String getResourceName() {
+    return _resourceName;
+  }
+
+  public String getResourceInstanceGroupTag() {
+    return _resourceInstanceGroupTag;
+  }
+
+  public boolean hasResourceInstanceGroupTag() {
+    return _resourceInstanceGroupTag != null && !_resourceInstanceGroupTag.isEmpty();
+  }
+
+  public int getResourceMaxPartitionsPerInstance() {
+    return _resourceMaxPartitionsPerInstance;
+  }
+
+  @Override
+  public String toString() {
+    return _replicaKey;
+  }
+
+  @Override
+  public int compareTo(AssignableReplica replica) {
+    if (!_resourceName.equals(replica._resourceName)) {
+      return _resourceName.compareTo(replica._resourceName);
+    }
+    if (!_partitionName.equals(replica._partitionName)) {
+      return _partitionName.compareTo(replica._partitionName);
+    }
+    if (!_replicaState.equals(replica._replicaState)) {
+      return _replicaState.compareTo(replica._replicaState);
+    }
+    return 0;
+  }
+
+  @Override
+  public boolean equals(Object obj) {
+    if (obj == null) {
+      return false;
+    }
+    if (obj instanceof AssignableReplica) {
+      return compareTo((AssignableReplica) obj) == 0;
+    } else {
+      return false;
+    }
+  }
+
+  public static String generateReplicaKey(String resourceName, String partitionName, String state) {
+    return String.format("%s-%s-%s", resourceName, partitionName, state);
+  }
+
+  /**
+   * Parse the resource config for the partition weight.
+   */
+  private Map<String, Integer> fetchCapacityUsage(String partitionName,
+      ResourceConfig resourceConfig, ClusterConfig clusterConfig) {
+    Map<String, Map<String, Integer>> capacityMap;
+    try {
+      capacityMap = resourceConfig.getPartitionCapacityMap();
+    } catch (IOException ex) {
+      throw new IllegalArgumentException(
+          "Invalid partition capacity configuration of resource: " + resourceConfig
+              .getResourceName(), ex);
+    }
+    Map<String, Integer> partitionCapacity = WagedValidationUtil
+        .validateAndGetPartitionCapacity(partitionName, resourceConfig, capacityMap, clusterConfig);
+    // Remove the non-required capacity items.
+    partitionCapacity.keySet().retainAll(clusterConfig.getInstanceCapacityKeys());
+    return partitionCapacity;
+  }
+}
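For reference, a small sketch of the replica key format and comparison order defined above (the sample resource, partition, and state names are hypothetical):

public class ReplicaKeyDemo {
  public static void main(String[] args) {
    // Same format as AssignableReplica.generateReplicaKey: resource-partition-state.
    String key = String.format("%s-%s-%s", "TestDB", "TestDB_0", "MASTER");
    System.out.println(key); // TestDB-TestDB_0-MASTER
    // compareTo orders replicas by resource name, then partition name, then state,
    // so two AssignableReplica objects with the same three fields are equal.
  }
}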
diff --git a/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/model/ClusterContext.java b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/model/ClusterContext.java
new file mode 100644
index 0000000..4705be5
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/model/ClusterContext.java
@@ -0,0 +1,172 @@
+package org.apache.helix.controller.rebalancer.waged.model;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+import org.apache.helix.HelixException;
+import org.apache.helix.model.ResourceAssignment;
+
+
+/**
+ * This class tracks the rebalance-related global cluster status.
+ */
+public class ClusterContext {
+  // This estimation helps to ensure global partition count evenness
+  private final int _estimatedMaxPartitionCount;
+  // This estimation helps to ensure global top state replica count evenness
+  private final int _estimatedMaxTopStateCount;
+  // This estimation helps to ensure per-resource partition count evenness
+  private final Map<String, Integer> _estimatedMaxPartitionByResource = new HashMap<>();
+  // This estimation helps to ensure global resource usage evenness.
+  private final float _estimatedMaxUtilization;
+
+  // map{zoneName : map{resourceName : set(partitionNames)}}
+  private Map<String, Map<String, Set<String>>> _assignmentForFaultZoneMap = new HashMap<>();
+  // Records about the previous assignment
+  // <ResourceName, ResourceAssignment contains the baseline assignment>
+  private final Map<String, ResourceAssignment> _baselineAssignment;
+  // <ResourceName, ResourceAssignment contains the best possible assignment>
+  private final Map<String, ResourceAssignment> _bestPossibleAssignment;
+
+  /**
+   * Construct the cluster context based on the current instance status.
+   * @param replicaSet All the partition replicas that are managed by the rebalancer
+   * @param nodeSet All the active nodes that are managed by the rebalancer
+   */
+  ClusterContext(Set<AssignableReplica> replicaSet, Set<AssignableNode> nodeSet,
+      Map<String, ResourceAssignment> baselineAssignment, Map<String, ResourceAssignment> bestPossibleAssignment) {
+    int instanceCount = nodeSet.size();
+    int totalReplicas = 0;
+    int totalTopStateReplicas = 0;
+    Map<String, Integer> totalUsage = new HashMap<>();
+    Map<String, Integer> totalCapacity = new HashMap<>();
+
+    for (Map.Entry<String, List<AssignableReplica>> entry : replicaSet.stream()
+        .collect(Collectors.groupingBy(AssignableReplica::getResourceName))
+        .entrySet()) {
+      int replicas = entry.getValue().size();
+      totalReplicas += replicas;
+
+      int replicaCnt = Math.max(1, estimateAvgReplicaCount(replicas, instanceCount));
+      _estimatedMaxPartitionByResource.put(entry.getKey(), replicaCnt);
+
+      for (AssignableReplica replica : entry.getValue()) {
+        if (replica.isReplicaTopState()) {
+          totalTopStateReplicas += 1;
+        }
+        replica.getCapacity().entrySet().stream().forEach(capacityEntry -> totalUsage
+            .compute(capacityEntry.getKey(),
+                (k, v) -> (v == null) ? capacityEntry.getValue() : (v + capacityEntry.getValue())));
+      }
+    }
+    nodeSet.stream().forEach(node -> node.getMaxCapacity().entrySet().stream().forEach(
+        capacityEntry -> totalCapacity.compute(capacityEntry.getKey(),
+            (k, v) -> (v == null) ? capacityEntry.getValue() : (v + capacityEntry.getValue()))));
+
+    if (totalCapacity.isEmpty()) {
+      // If no capacity is configured, we treat the cluster as fully utilized.
+      _estimatedMaxUtilization = 1f;
+    } else {
+      float estimatedMaxUsage = 0;
+      for (String capacityKey : totalCapacity.keySet()) {
+        int maxCapacity = totalCapacity.get(capacityKey);
+        int usage = totalUsage.getOrDefault(capacityKey, 0);
+        float utilization = (maxCapacity == 0) ? 1 : (float) usage / maxCapacity;
+        estimatedMaxUsage = Math.max(estimatedMaxUsage, utilization);
+      }
+      _estimatedMaxUtilization = estimatedMaxUsage;
+    }
+    _estimatedMaxPartitionCount = estimateAvgReplicaCount(totalReplicas, instanceCount);
+    _estimatedMaxTopStateCount = estimateAvgReplicaCount(totalTopStateReplicas, instanceCount);
+    _baselineAssignment = baselineAssignment;
+    _bestPossibleAssignment = bestPossibleAssignment;
+  }
+
+  public Map<String, ResourceAssignment> getBaselineAssignment() {
+    return _baselineAssignment == null || _baselineAssignment.isEmpty() ? Collections.emptyMap() : _baselineAssignment;
+  }
+
+  public Map<String, ResourceAssignment> getBestPossibleAssignment() {
+    return _bestPossibleAssignment == null || _bestPossibleAssignment.isEmpty() ? Collections.emptyMap()
+        : _bestPossibleAssignment;
+  }
+
+  public Map<String, Map<String, Set<String>>> getAssignmentForFaultZoneMap() {
+    return _assignmentForFaultZoneMap;
+  }
+
+  public int getEstimatedMaxPartitionCount() {
+    return _estimatedMaxPartitionCount;
+  }
+
+  public int getEstimatedMaxPartitionByResource(String resourceName) {
+    return _estimatedMaxPartitionByResource.get(resourceName);
+  }
+
+  public int getEstimatedMaxTopStateCount() {
+    return _estimatedMaxTopStateCount;
+  }
+
+  public float getEstimatedMaxUtilization() {
+    return _estimatedMaxUtilization;
+  }
+
+  public Set<String> getPartitionsForResourceAndFaultZone(String resourceName, String faultZoneId) {
+    return _assignmentForFaultZoneMap.getOrDefault(faultZoneId, Collections.emptyMap())
+        .getOrDefault(resourceName, Collections.emptySet());
+  }
+
+  void addPartitionToFaultZone(String faultZoneId, String resourceName, String partition) {
+    if (!_assignmentForFaultZoneMap.computeIfAbsent(faultZoneId, k -> new HashMap<>())
+        .computeIfAbsent(resourceName, k -> new HashSet<>())
+        .add(partition)) {
+      throw new HelixException(
+          String.format("Resource %s already has a replica from partition %s in fault zone %s", resourceName, partition,
+              faultZoneId));
+    }
+  }
+
+  boolean removePartitionFromFaultZone(String faultZoneId, String resourceName, String partition) {
+    return _assignmentForFaultZoneMap.getOrDefault(faultZoneId, Collections.emptyMap())
+        .getOrDefault(resourceName, Collections.emptySet())
+        .remove(partition);
+  }
+
+  void setAssignmentForFaultZoneMap(Map<String, Map<String, Set<String>>> assignmentForFaultZoneMap) {
+    _assignmentForFaultZoneMap = assignmentForFaultZoneMap;
+  }
+
+  private int estimateAvgReplicaCount(int replicaCount, int instanceCount) {
+    // Use the floor to ensure evenness.
+    // Note that if we calculated the estimation with ceil, we might end up with some low-usage
+    // participants. For example, suppose the true average is between 1 and 2. If we use 2, many
+    // participants will be allocated 2 partitions while the others get 0. If we use 1, most
+    // participants will have 1 partition assigned and several will have 2. The latter scenario
+    // is what we want to achieve.
+    return (int) Math.floor((float) replicaCount / instanceCount);
+  }
+}
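A worked sketch of the floor-based estimation described in estimateAvgReplicaCount (the demo class is illustrative only):

public class ReplicaEstimateDemo {
  static int estimateAvgReplicaCount(int replicaCount, int instanceCount) {
    // Floor, as in ClusterContext, to favor many instances at 1 replica plus a few
    // at 2, rather than many at 2 plus a few at 0.
    return (int) Math.floor((float) replicaCount / instanceCount);
  }

  public static void main(String[] args) {
    System.out.println(estimateAvgReplicaCount(10, 6)); // true average ~1.67 -> 1
    System.out.println(estimateAvgReplicaCount(12, 6)); // exact average 2 -> 2
  }
}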
diff --git a/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/model/ClusterModel.java b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/model/ClusterModel.java
new file mode 100644
index 0000000..57ffa42
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/model/ClusterModel.java
@@ -0,0 +1,132 @@
+package org.apache.helix.controller.rebalancer.waged.model;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.util.Collections;
+import java.util.Map;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+import org.apache.helix.HelixException;
+
+/**
+ * This class wraps the required input for the rebalance algorithm.
+ */
+public class ClusterModel {
+  private final ClusterContext _clusterContext;
+  // Map to track all the assignable replicas. <Resource Name, Set<Replicas>>
+  private final Map<String, Set<AssignableReplica>> _assignableReplicaMap;
+  // The index to find the replica with a certain state. <Resource, <Key(resource_partition_state), Replica>>
+  // Note that the identical replicas are deduped in the index.
+  private final Map<String, Map<String, AssignableReplica>> _assignableReplicaIndex;
+  private final Map<String, AssignableNode> _assignableNodeMap;
+
+  /**
+   * @param clusterContext         The initialized cluster context.
+   * @param assignableReplicas     The replicas to be assigned.
+   *                               Note that the replicas in this list shall not be included while initializing the context and assignable nodes.
+   * @param assignableNodes        The active instances.
+   */
+  ClusterModel(ClusterContext clusterContext, Set<AssignableReplica> assignableReplicas,
+      Set<AssignableNode> assignableNodes) {
+    _clusterContext = clusterContext;
+
+    // Save all the replicas to be assigned
+    _assignableReplicaMap = assignableReplicas.stream()
+        .collect(Collectors.groupingBy(AssignableReplica::getResourceName, Collectors.toSet()));
+
+    // Index all the replicas to be assigned. Dedup the replicas if two have the same resource/partition/state
+    _assignableReplicaIndex = assignableReplicas.stream().collect(Collectors
+        .groupingBy(AssignableReplica::getResourceName, Collectors
+            .toMap(AssignableReplica::toString, replica -> replica,
+                (oldValue, newValue) -> oldValue)));
+
+    _assignableNodeMap = assignableNodes.parallelStream()
+        .collect(Collectors.toMap(AssignableNode::getInstanceName, node -> node));
+  }
+
+  public ClusterContext getContext() {
+    return _clusterContext;
+  }
+
+  public Map<String, AssignableNode> getAssignableNodes() {
+    return _assignableNodeMap;
+  }
+
+  public Map<String, Set<AssignableReplica>> getAssignableReplicaMap() {
+    return _assignableReplicaMap;
+  }
+
+  /**
+   * Assign the given replica to the specified instance and record the assignment in the cluster model.
+   * The cluster usage information will be updated accordingly.
+   *
+   * @param resourceName
+   * @param partitionName
+   * @param state
+   * @param instanceName
+   */
+  public void assign(String resourceName, String partitionName, String state, String instanceName) {
+    AssignableNode node = locateAssignableNode(instanceName);
+    AssignableReplica replica = locateAssignableReplica(resourceName, partitionName, state);
+
+    node.assign(replica);
+    _clusterContext.addPartitionToFaultZone(node.getFaultZone(), resourceName, partitionName);
+  }
+
+  /**
+   * Revert the proposed assignment from the cluster model.
+   * The cluster usage information will be updated accordingly.
+   *
+   * @param resourceName
+   * @param partitionName
+   * @param state
+   * @param instanceName
+   */
+  public void release(String resourceName, String partitionName, String state,
+      String instanceName) {
+    AssignableNode node = locateAssignableNode(instanceName);
+    AssignableReplica replica = locateAssignableReplica(resourceName, partitionName, state);
+
+    node.release(replica);
+    _clusterContext.removePartitionFromFaultZone(node.getFaultZone(), resourceName, partitionName);
+  }
+
+  private AssignableNode locateAssignableNode(String instanceName) {
+    AssignableNode node = _assignableNodeMap.get(instanceName);
+    if (node == null) {
+      throw new HelixException("Cannot find the instance: " + instanceName);
+    }
+    return node;
+  }
+
+  private AssignableReplica locateAssignableReplica(String resourceName, String partitionName,
+      String state) {
+    AssignableReplica sampleReplica =
+        _assignableReplicaIndex.getOrDefault(resourceName, Collections.emptyMap())
+            .get(AssignableReplica.generateReplicaKey(resourceName, partitionName, state));
+    if (sampleReplica == null) {
+      throw new HelixException(String
+          .format("Cannot find the replication with resource name %s, partition name %s, state %s.",
+              resourceName, partitionName, state));
+    }
+    return sampleReplica;
+  }
+}
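The assign/release pair above supports a try-and-revert pattern in the rebalance algorithm. Below is a hedged sketch of that pattern against a hypothetical minimal interface (not the actual Helix ClusterModel class or its algorithm):

import java.util.function.BooleanSupplier;

public class TryAssignDemo {
  // A stand-in for the subset of ClusterModel used here; illustrative only.
  interface Model {
    void assign(String resource, String partition, String state, String instance);
    void release(String resource, String partition, String state, String instance);
  }

  static boolean tryAssign(Model model, String resource, String partition, String state,
      String instance, BooleanSupplier constraintsOk) {
    // Propose the assignment, which updates the model's usage bookkeeping.
    model.assign(resource, partition, state, instance);
    if (!constraintsOk.getAsBoolean()) {
      // Revert the proposal; release() rolls the bookkeeping back.
      model.release(resource, partition, state, instance);
      return false;
    }
    return true;
  }
}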
diff --git a/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/model/ClusterModelProvider.java b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/model/ClusterModelProvider.java
new file mode 100644
index 0000000..41c43d6
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/model/ClusterModelProvider.java
@@ -0,0 +1,532 @@
+package org.apache.helix.controller.rebalancer.waged.model;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+import org.apache.helix.HelixConstants;
+import org.apache.helix.HelixException;
+import org.apache.helix.controller.dataproviders.ResourceControllerDataProvider;
+import org.apache.helix.model.ClusterConfig;
+import org.apache.helix.model.IdealState;
+import org.apache.helix.model.InstanceConfig;
+import org.apache.helix.model.Resource;
+import org.apache.helix.model.ResourceAssignment;
+import org.apache.helix.model.ResourceConfig;
+import org.apache.helix.model.StateModelDefinition;
+
+/**
+ * This util class generates Cluster Model object based on the controller's data cache.
+ */
+public class ClusterModelProvider {
+
+  private enum RebalanceScopeType {
+    // Set the rebalance scope to cover the difference between the current assignment and the
+    // Baseline assignment only.
+    PARTIAL,
+    // Set the rebalance scope to cover all replicas that need relocation based on the cluster
+    // changes.
+    GLOBAL_BASELINE
+  }
+
+  /**
+   * Generate a new Cluster Model object according to the current cluster status for partial
+   * rebalance. The rebalance scope is configured for recovering only the missing replicas that
+   * are in the Baseline assignment but not in the current Best Possible assignment.
+   * @param dataProvider           The controller's data cache.
+   * @param resourceMap            The full list of the resources to be rebalanced. Note that any
+   *                               resources that are not in this list will be removed from the
+   *                               final assignment.
+   * @param activeInstances        The active instances that will be used in the calculation.
+   *                               Note this list can be different from the real active node list
+   *                               according to the rebalancer logic.
+   * @param baselineAssignment     The persisted Baseline assignment.
+   * @param bestPossibleAssignment The persisted Best Possible assignment that was generated in the
+   *                               previous rebalance.
+   * @return the new cluster model
+   */
+  public static ClusterModel generateClusterModelForPartialRebalance(
+      ResourceControllerDataProvider dataProvider, Map<String, Resource> resourceMap,
+      Set<String> activeInstances, Map<String, ResourceAssignment> baselineAssignment,
+      Map<String, ResourceAssignment> bestPossibleAssignment) {
+    return generateClusterModel(dataProvider, resourceMap, activeInstances, Collections.emptyMap(),
+        baselineAssignment, bestPossibleAssignment, RebalanceScopeType.PARTIAL);
+  }
+
+  /**
+   * Generate a new Cluster Model object according to the current cluster status for the Baseline
+   * calculation. The rebalance scope is determined according to the cluster changes.
+   * @param dataProvider           The controller's data cache.
+   * @param resourceMap            The full list of the resources to be rebalanced. Note that any
+   *                               resources that are not in this list will be removed from the
+   *                               final assignment.
+   * @param allInstances           All the instances that will be used in the calculation.
+   * @param clusterChanges         All the cluster changes that happened after the previous rebalance.
+   * @param baselineAssignment     The previous Baseline assignment.
+   * @return the new cluster model
+   */
+  public static ClusterModel generateClusterModelForBaseline(
+      ResourceControllerDataProvider dataProvider, Map<String, Resource> resourceMap,
+      Set<String> allInstances, Map<HelixConstants.ChangeType, Set<String>> clusterChanges,
+      Map<String, ResourceAssignment> baselineAssignment) {
+    return generateClusterModel(dataProvider, resourceMap, allInstances, clusterChanges,
+        Collections.emptyMap(), baselineAssignment, RebalanceScopeType.GLOBAL_BASELINE);
+  }
+
+  /**
+   * Generate a cluster model based on the current state output and data cache. The rebalance scope
+   * is configured for recovering the missing replicas only.
+   * @param dataProvider           The controller's data cache.
+   * @param resourceMap            The full list of the resources to be rebalanced. Note that any
+   *                               resources that are not in this list will be removed from the
+   *                               final assignment.
+   * @param currentStateAssignment The resource assignment built from current state output.
+   * @return the new cluster model
+   */
+  public static ClusterModel generateClusterModelFromExistingAssignment(
+      ResourceControllerDataProvider dataProvider, Map<String, Resource> resourceMap,
+      Map<String, ResourceAssignment> currentStateAssignment) {
+    return generateClusterModel(dataProvider, resourceMap, dataProvider.getEnabledLiveInstances(),
+        Collections.emptyMap(), Collections.emptyMap(), currentStateAssignment,
+        RebalanceScopeType.GLOBAL_BASELINE);
+  }
+
+  /**
+   * Generate a new Cluster Model object according to the current cluster status.
+   * @param dataProvider           The controller's data cache.
+   * @param resourceMap            The full list of the resources to be rebalanced. Note that any
+   *                               resources that are not in this list will be removed from the
+   *                               final assignment.
+   * @param activeInstances        The active instances that will be used in the calculation.
+   *                               Note this list can be different from the real active node list
+   *                               according to the rebalancer logic.
+   * @param clusterChanges         All the cluster changes that happened after the previous rebalance.
+   * @param idealAssignment        The ideal assignment.
+   * @param currentAssignment      The current assignment that was generated in the previous rebalance.
+   * @param scopeType              Specify how to determine the rebalance scope.
+   * @return the new cluster model
+   */
+  private static ClusterModel generateClusterModel(ResourceControllerDataProvider dataProvider,
+      Map<String, Resource> resourceMap, Set<String> activeInstances,
+      Map<HelixConstants.ChangeType, Set<String>> clusterChanges,
+      Map<String, ResourceAssignment> idealAssignment,
+      Map<String, ResourceAssignment> currentAssignment, RebalanceScopeType scopeType) {
+    // Construct all the assignable nodes and initialize with the allocated replicas.
+    Set<AssignableNode> assignableNodes =
+        getAllAssignableNodes(dataProvider.getClusterConfig(), dataProvider.getInstanceConfigMap(),
+            activeInstances);
+
+    // Generate replica objects for all the resource partitions.
+    // <resource, replica set>
+    Map<String, Set<AssignableReplica>> replicaMap =
+        getAllAssignableReplicas(dataProvider, resourceMap, assignableNodes);
+
+    // Check if the replicas need to be reassigned.
+    Map<String, Set<AssignableReplica>> allocatedReplicas =
+        new HashMap<>(); // <instanceName, replica set>
+    Set<AssignableReplica> toBeAssignedReplicas;
+    switch (scopeType) {
+      case GLOBAL_BASELINE:
+        toBeAssignedReplicas = findToBeAssignedReplicasByClusterChanges(replicaMap, activeInstances,
+            dataProvider.getLiveInstances().keySet(), clusterChanges, currentAssignment,
+            allocatedReplicas);
+        break;
+      case PARTIAL:
+        // Filter to remove the replicas that do not exist in the ideal assignment given but exist
+        // in the replicaMap. This is because such replicas are new additions that do not need to be
+        // rebalanced right away.
+        retainExistingReplicas(replicaMap, idealAssignment);
+        toBeAssignedReplicas =
+            findToBeAssignedReplicasByComparingWithIdealAssignment(replicaMap, activeInstances,
+                idealAssignment, currentAssignment, allocatedReplicas);
+        break;
+      default:
+        throw new HelixException("Unknown rebalance scope type: " + scopeType);
+    }
+
+    // Update the allocated replicas to the assignable nodes.
+    assignableNodes.parallelStream().forEach(node -> node.assignInitBatch(
+        allocatedReplicas.getOrDefault(node.getInstanceName(), Collections.emptySet())));
+
+    // Construct and initialize cluster context.
+    ClusterContext context = new ClusterContext(
+        replicaMap.values().stream().flatMap(Set::stream).collect(Collectors.toSet()),
+        assignableNodes, idealAssignment, currentAssignment);
+    // Initialize the cluster context with the allocated assignments.
+    context.setAssignmentForFaultZoneMap(mapAssignmentToFaultZone(assignableNodes));
+
+    return new ClusterModel(context, toBeAssignedReplicas, assignableNodes);
+  }
+
+  // Filter the replicas map so only the replicas that have been allocated in the existing
+  // assignmentMap remain in the map.
+  private static void retainExistingReplicas(Map<String, Set<AssignableReplica>> replicaMap,
+      Map<String, ResourceAssignment> assignmentMap) {
+    replicaMap.entrySet().parallelStream().forEach(replicaSetEntry -> {
+      // <partition, <state, instances set>>
+      Map<String, Map<String, Set<String>>> stateInstanceMap =
+          getStateInstanceMap(assignmentMap.get(replicaSetEntry.getKey()));
+      // Iterate the replicas of the resource to find the ones that require reallocating.
+      Iterator<AssignableReplica> replicaIter = replicaSetEntry.getValue().iterator();
+      while (replicaIter.hasNext()) {
+        AssignableReplica replica = replicaIter.next();
+        Set<String> validInstances =
+            stateInstanceMap.getOrDefault(replica.getPartitionName(), Collections.emptyMap())
+                .getOrDefault(replica.getReplicaState(), Collections.emptySet());
+        if (validInstances.isEmpty()) {
+          // Remove the replica if it is not known in the assignment map.
+          replicaIter.remove();
+        } else {
+          // Remove the instance from the state map record after processing so it won't be
+          // double-processed as we loop through all replicas.
+          validInstances.remove(validInstances.iterator().next());
+        }
+      }
+    });
+  }
+
+  /**
+   * Find the minimum set of replicas that need to be reassigned by comparing the current assignment
+   * with the ideal assignment.
+   * A replica needs to be reassigned or newly assigned if either of the following conditions is true:
+   * 1. The partition allocation (the instance the replica is placed on) in the ideal assignment and
+   * the current assignment are different. And the allocation in the ideal assignment is valid.
+   * So it is worthwhile to move it.
+   * 2. The partition allocation is in neither the ideal assignment nor the current assignment, or
+   * those allocations are not valid due to offline or disabled instances.
+   * Otherwise, the rebalancer just keeps the current assignment allocation.
+   *
+   * @param replicaMap             A map contains all the replicas grouped by resource name.
+   * @param activeInstances        All the instances that are live and enabled according to the delay rebalance configuration.
+   * @param idealAssignment        The ideal assignment.
+   * @param currentAssignment      The current assignment that was generated in the previous rebalance.
+   * @param allocatedReplicas      A map of <Instance -> replicas> to return the allocated replicas grouped by the target instance name.
+   * @return The replicas that need to be reassigned.
+   */
+  private static Set<AssignableReplica> findToBeAssignedReplicasByComparingWithIdealAssignment(
+      Map<String, Set<AssignableReplica>> replicaMap, Set<String> activeInstances,
+      Map<String, ResourceAssignment> idealAssignment,
+      Map<String, ResourceAssignment> currentAssignment,
+      Map<String, Set<AssignableReplica>> allocatedReplicas) {
+    Set<AssignableReplica> toBeAssignedReplicas = new HashSet<>();
+    // check each resource to identify the allocated replicas and to-be-assigned replicas.
+    for (String resourceName : replicaMap.keySet()) {
+      // <partition, <state, instances set>>
+      Map<String, Map<String, Set<String>>> idealPartitionStateMap =
+          getValidStateInstanceMap(idealAssignment.get(resourceName), activeInstances);
+      Map<String, Map<String, Set<String>>> currentPartitionStateMap =
+          getValidStateInstanceMap(currentAssignment.get(resourceName), activeInstances);
+      // Iterate the replicas of the resource to find the ones that require reallocating.
+      for (AssignableReplica replica : replicaMap.get(resourceName)) {
+        String partitionName = replica.getPartitionName();
+        String replicaState = replica.getReplicaState();
+        Set<String> idealAllocations =
+            idealPartitionStateMap.getOrDefault(partitionName, Collections.emptyMap())
+                .getOrDefault(replicaState, Collections.emptySet());
+        Set<String> currentAllocations =
+            currentPartitionStateMap.getOrDefault(partitionName, Collections.emptyMap())
+                .getOrDefault(replicaState, Collections.emptySet());
+
+        // Compare the current assignments with the ideal assignment for the common part.
+        List<String> commonAllocations = new ArrayList<>(currentAllocations);
+        commonAllocations.retainAll(idealAllocations);
+        if (!commonAllocations.isEmpty()) {
+          // 1. If the partition is allocated at the same location in both ideal and current
+          // assignments, there is no need to reassign it.
+          String allocatedInstance = commonAllocations.get(0);
+          allocatedReplicas.computeIfAbsent(allocatedInstance, key -> new HashSet<>()).add(replica);
+          // Remove the instance from the record to prevent this instance from being processed twice.
+          idealAllocations.remove(allocatedInstance);
+          currentAllocations.remove(allocatedInstance);
+        } else if (!idealAllocations.isEmpty()) {
+          // 2. If the partition is allocated at an active instance in the ideal assignment but the
+          // same allocation does not exist in the current assignment, try to rebalance the replica
+          // or assign it if the replica has not been assigned.
+          // There are two possible conditions,
+          // * This replica has been newly added and has not been assigned yet, so it appears in
+          // the ideal assignment and does not appear in the current assignment.
+          // * The allocation of this replica in the ideal assignment has been updated due to a
+          // cluster change. For example, a new instance was added, so the old allocation in the
+          // current assignment might be sub-optimal.
+          // In either condition, we add it to toBeAssignedReplicas so that it will get assigned.
+          toBeAssignedReplicas.add(replica);
+          // Remove the pending allocation from the idealAllocations after processing so that the
+          // instance won't be double-processed as we loop through all replicas
+          String pendingAllocation = idealAllocations.iterator().next();
+          idealAllocations.remove(pendingAllocation);
+        } else if (!currentAllocations.isEmpty()) {
+          // 3. This replica exists in the current assignment but does not appear in, or does not
+          // have a valid allocation in, the ideal assignment.
+          // This means either 1) that the ideal assignment actually has this replica allocated on
+          // this instance, but it does not show up because the instance is temporarily offline or
+          // disabled (note that all such instances have been filtered out in an earlier part of
+          // the logic), or 2) that the most recent version of the ideal assignment was not
+          // fetched correctly from the assignment metadata store.
+          // In either case, the solution is to keep the current assignment. So put this replica
+          // with the allocated instance into the allocatedReplicas map.
+          String allocatedInstance = currentAllocations.iterator().next();
+          allocatedReplicas.computeIfAbsent(allocatedInstance, key -> new HashSet<>()).add(replica);
+          // Remove the instance from the record to prevent the same location being processed again.
+          currentAllocations.remove(allocatedInstance);
+        } else {
+          // 4. This replica is not found in either the ideal assignment or the current assignment
+          // with a valid allocation. This implies that the replica was newly added but was never
+          // assigned in reality or was added so recently that it hasn't shown up in the ideal
+          // assignment (because its calculation takes longer and is performed asynchronously).
+          // In that case, we add it to toBeAssignedReplicas so that it will get assigned as a
+          // result of partialRebalance.
+          toBeAssignedReplicas.add(replica);
+        }
+      }
+    }
+    return toBeAssignedReplicas;
+  }
+
+  /**
+   * Find the minimum set of replicas that need to be reassigned according to the cluster change.
+   * A replica needs to be reassigned if one of the following conditions is true:
+   * 1. Cluster topology (the cluster config / any instance config) has been updated.
+   * 2. The resource config has been updated.
+   * 3. The current assignment does not contain a valid assignment for the partition.
+   *
+   * @param replicaMap             A map contains all the replicas grouped by resource name.
+   * @param activeInstances        All the instances that are live and enabled according to the delay rebalance configuration.
+   * @param liveInstances          All the instances that are live.
+   * @param clusterChanges         A map that contains all the important metadata updates that happened after the previous rebalance.
+   * @param currentAssignment      The current replica assignment.
+   * @param allocatedReplicas      Return the allocated replicas grouped by the target instance name.
+   * @return The replicas that need to be reassigned.
+   */
+  private static Set<AssignableReplica> findToBeAssignedReplicasByClusterChanges(
+      Map<String, Set<AssignableReplica>> replicaMap, Set<String> activeInstances,
+      Set<String> liveInstances, Map<HelixConstants.ChangeType, Set<String>> clusterChanges,
+      Map<String, ResourceAssignment> currentAssignment,
+      Map<String, Set<AssignableReplica>> allocatedReplicas) {
+    Set<AssignableReplica> toBeAssignedReplicas = new HashSet<>();
+
+    // A newly connected node = a new LiveInstance znode (or an updated session Id) whose
+    // corresponding instance is live.
+    // TODO: The assumption here is that if the LiveInstance znode is created or its session Id is
+    // TODO: updated, we need to call the algorithm to move some partitions to this new node.
+    // TODO: However, if the LiveInstance znode changed for some other reason, it will still be
+    // TODO: treated as a newly connected node. We need to find a better way to identify which
+    // TODO: nodes are really newly connected.
+    Set<String> newlyConnectedNodes = clusterChanges
+        .getOrDefault(HelixConstants.ChangeType.LIVE_INSTANCE, Collections.emptySet());
+    newlyConnectedNodes.retainAll(liveInstances);
+    if (clusterChanges.containsKey(HelixConstants.ChangeType.CLUSTER_CONFIG) || clusterChanges
+        .containsKey(HelixConstants.ChangeType.INSTANCE_CONFIG) || !newlyConnectedNodes.isEmpty()) {
+      // 1. If the cluster topology has been modified, we need to reassign all replicas.
+      // 2. If any node was newly connected, we need to rebalance all replicas so the distribution
+      // stays even.
+      toBeAssignedReplicas
+          .addAll(replicaMap.values().stream().flatMap(Set::stream).collect(Collectors.toSet()));
+    } else {
+      // check each resource to identify the allocated replicas and to-be-assigned replicas.
+      for (Map.Entry<String, Set<AssignableReplica>> replicaMapEntry : replicaMap.entrySet()) {
+        String resourceName = replicaMapEntry.getKey();
+        Set<AssignableReplica> replicas = replicaMapEntry.getValue();
+        // 1. If the resource config/idealstate has changed, we need to reassign.
+        // 2. If the resource does not appear in the current assignment, we need to reassign.
+        if (clusterChanges
+            .getOrDefault(HelixConstants.ChangeType.RESOURCE_CONFIG, Collections.emptySet())
+            .contains(resourceName) || clusterChanges
+            .getOrDefault(HelixConstants.ChangeType.IDEAL_STATE, Collections.emptySet())
+            .contains(resourceName) || !currentAssignment.containsKey(resourceName)) {
+          toBeAssignedReplicas.addAll(replicas);
+          continue; // go to check next resource
+        } else {
+          // Check every replica assignment to identify whether the related replicas need to be reassigned.
+          // <partition, <state, instances list>>
+          Map<String, Map<String, Set<String>>> stateMap =
+              getValidStateInstanceMap(currentAssignment.get(resourceName), activeInstances);
+          for (AssignableReplica replica : replicas) {
+            // Find any ACTIVE instance allocation that has the same state as the replica
+            Set<String> validInstances =
+                stateMap.getOrDefault(replica.getPartitionName(), Collections.emptyMap())
+                    .getOrDefault(replica.getReplicaState(), Collections.emptySet());
+            if (validInstances.isEmpty()) {
+              // 3. If no such instance exists in the current assignment, we need to reassign the replica
+              toBeAssignedReplicas.add(replica);
+              continue; // go to check the next replica
+            } else {
+              Iterator<String> iter = validInstances.iterator();
+              // Remove the instance from the current allocation record after processing so that it
+              // won't be double-processed as we loop through all replicas
+              String instanceName = iter.next();
+              iter.remove();
+              // The current assignment for this replica is valid;
+              // add it to the allocated replica list.
+              allocatedReplicas.computeIfAbsent(instanceName, key -> new HashSet<>()).add(replica);
+            }
+          }
+        }
+      }
+    }
+    return toBeAssignedReplicas;
+  }
+
+  /**
+   * Filter out all the invalid allocations that are not on the active instances.
+   * @param assignment      The resource assignment to be filtered.
+   * @param activeInstances The set of instances that are live and enabled.
+   * @return A map of <partition, <state, instance set>> that contains only the valid allocations.
+   */
+  private static Map<String, Map<String, Set<String>>> getValidStateInstanceMap(
+      ResourceAssignment assignment, Set<String> activeInstances) {
+    Map<String, Map<String, Set<String>>> stateInstanceMap = getStateInstanceMap(assignment);
+    stateInstanceMap.values().stream().forEach(stateMap -> stateMap.values().stream()
+        .forEach(instanceSet -> instanceSet.retainAll(activeInstances)));
+    return stateInstanceMap;
+  }
+
+  // <partition, <state, instances set>>
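+  // For example, if partition p is currently assigned as {host1=MASTER, host2=SLAVE}, the
+  // returned map contains {p={MASTER=[host1], SLAVE=[host2]}}. (Host names illustrative only.)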
+  private static Map<String, Map<String, Set<String>>> getStateInstanceMap(
+      ResourceAssignment assignment) {
+    if (assignment == null) {
+      return Collections.emptyMap();
+    }
+    return assignment.getMappedPartitions().stream()
+        .collect(Collectors.toMap(partition -> partition.getPartitionName(), partition -> {
+          Map<String, Set<String>> stateInstanceMap = new HashMap<>();
+          assignment.getReplicaMap(partition).entrySet().stream().forEach(
+              stateMapEntry -> stateInstanceMap
+                  .computeIfAbsent(stateMapEntry.getValue(), key -> new HashSet<>())
+                  .add(stateMapEntry.getKey()));
+          return stateInstanceMap;
+        }));
+  }
+
+  /**
+   * Get all the nodes that can be assigned replicas based on the configurations.
+   *
+   * @param clusterConfig     The cluster configuration.
+   * @param instanceConfigMap A map of all the instance configurations.
+   *                          If any active instance has no configuration, it will be ignored.
+   * @param activeInstances   All the instances that are online and enabled.
+   * @return The set of assignable nodes.
+   */
+  private static Set<AssignableNode> getAllAssignableNodes(ClusterConfig clusterConfig,
+      Map<String, InstanceConfig> instanceConfigMap, Set<String> activeInstances) {
+    return activeInstances.parallelStream()
+        .filter(instance -> instanceConfigMap.containsKey(instance)).map(
+            instanceName -> new AssignableNode(clusterConfig, instanceConfigMap.get(instanceName),
+                instanceName)).collect(Collectors.toSet());
+  }
+
+  /**
+   * Get all the replicas that need to be reallocated from the cluster data cache.
+   *
+   * @param dataProvider The cluster status cache that contains the current cluster status.
+   * @param resourceMap  All the valid resources that are managed by the rebalancer.
+   * @param assignableNodes All the active assignable nodes.
+   * @return A map of assignable replica set, <ResourceName, replica set>.
+   */
+  private static Map<String, Set<AssignableReplica>> getAllAssignableReplicas(
+      ResourceControllerDataProvider dataProvider, Map<String, Resource> resourceMap,
+      Set<AssignableNode> assignableNodes) {
+    ClusterConfig clusterConfig = dataProvider.getClusterConfig();
+    int activeFaultZoneCount = assignableNodes.stream().map(node -> node.getFaultZone())
+        .collect(Collectors.toSet()).size();
+    return resourceMap.keySet().parallelStream().map(resourceName -> {
+      ResourceConfig resourceConfig = dataProvider.getResourceConfig(resourceName);
+      if (resourceConfig == null) {
+        resourceConfig = new ResourceConfig(resourceName);
+      }
+      IdealState is = dataProvider.getIdealState(resourceName);
+      if (is == null) {
+        throw new HelixException(
+            "Cannot find the ideal state for resource: " + resourceName);
+      }
+      String defName = is.getStateModelDefRef();
+      StateModelDefinition def = dataProvider.getStateModelDef(defName);
+      if (def == null) {
+        throw new IllegalArgumentException(String
+            .format("Cannot find state model definition %s for resource %s.", defName,
+                resourceName));
+      }
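+      // For example, a MasterSlave resource with 3 replicas yields a stateCountMap of
+      // {MASTER=1, SLAVE=2}; one AssignableReplica is created below per required state instance.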
+      Map<String, Integer> stateCountMap =
+          def.getStateCountMap(activeFaultZoneCount, is.getReplicaCount(assignableNodes.size()));
+      mergeIdealStateWithResourceConfig(resourceConfig, is);
+      Set<AssignableReplica> replicas = new HashSet<>();
+      for (String partition : is.getPartitionSet()) {
+        for (Map.Entry<String, Integer> entry : stateCountMap.entrySet()) {
+          String state = entry.getKey();
+          for (int i = 0; i < entry.getValue(); i++) {
+            replicas.add(new AssignableReplica(clusterConfig, resourceConfig, partition, state,
+                def.getStatePriorityMap().get(state)));
+          }
+        }
+      }
+      return new HashMap.SimpleEntry<>(resourceName, replicas);
+    }).collect(Collectors.toMap(entry -> entry.getKey(), entry -> entry.getValue()));
+  }
+
+  /**
+   * For backward compatibility, propagate the critical simple fields from the IdealState to
+   * the Resource Config.
+   * Eventually, Resource Config should be the only metadata node that contains the required information.
+   */
+  private static void mergeIdealStateWithResourceConfig(ResourceConfig resourceConfig,
+      final IdealState idealState) {
+    // Note that the config fields updated in this method shall be fully compatible with the
+    // corresponding fields in the IdealState:
+    // 1. The fields shall have exactly the same meaning.
+    // 2. The values shall be exactly compatible, with no additional calculation involved.
+    // 3. Resource Config items take priority.
+    // This is to ensure the resource config is not polluted after the merge.
+    if (null == resourceConfig.getRecord()
+        .getSimpleField(ResourceConfig.ResourceConfigProperty.INSTANCE_GROUP_TAG.name())) {
+      resourceConfig.getRecord()
+          .setSimpleField(ResourceConfig.ResourceConfigProperty.INSTANCE_GROUP_TAG.name(),
+              idealState.getInstanceGroupTag());
+    }
+    if (null == resourceConfig.getRecord()
+        .getSimpleField(ResourceConfig.ResourceConfigProperty.MAX_PARTITIONS_PER_INSTANCE.name())) {
+      resourceConfig.getRecord()
+          .setIntField(ResourceConfig.ResourceConfigProperty.MAX_PARTITIONS_PER_INSTANCE.name(),
+              idealState.getMaxPartitionsPerInstance());
+    }
+  }
+
+  /**
+   * @param assignableNodes The set of assignable nodes whose assignments are to be grouped.
+   * @return A map containing the assignments for each fault zone. <fault zone, <resource, set of partitions>>
+   */
+  private static Map<String, Map<String, Set<String>>> mapAssignmentToFaultZone(
+      Set<AssignableNode> assignableNodes) {
+    Map<String, Map<String, Set<String>>> faultZoneAssignmentMap = new HashMap<>();
+    assignableNodes.stream().forEach(node -> {
+      for (Map.Entry<String, Set<String>> resourceMap : node.getAssignedPartitionsMap()
+          .entrySet()) {
+        faultZoneAssignmentMap.computeIfAbsent(node.getFaultZone(), k -> new HashMap<>())
+            .computeIfAbsent(resourceMap.getKey(), k -> new HashSet<>())
+            .addAll(resourceMap.getValue());
+      }
+    });
+    return faultZoneAssignmentMap;
+  }
+}
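To make the change-based scoping in findToBeAssignedReplicasByClusterChanges concrete, here is a standalone sketch of the same decision using plain collections. It is deliberately simplified: it works at resource rather than replica granularity, it ignores the liveness filter applied to LIVE_INSTANCE changes, and all names are illustrative.

    import java.util.Arrays;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    public class RebalanceScopeSketch {
      enum ChangeType { CLUSTER_CONFIG, INSTANCE_CONFIG, LIVE_INSTANCE, RESOURCE_CONFIG, IDEAL_STATE }

      // Topology-level changes trigger a global recalculation; otherwise only the
      // resources touched since the previous rebalance are recalculated.
      static Set<String> resourcesToRecalculate(Map<ChangeType, Set<String>> changes,
          Set<String> allResources) {
        if (changes.containsKey(ChangeType.CLUSTER_CONFIG)
            || changes.containsKey(ChangeType.INSTANCE_CONFIG)
            || changes.containsKey(ChangeType.LIVE_INSTANCE)) {
          return allResources;
        }
        Set<String> scoped = new HashSet<>();
        scoped.addAll(changes.getOrDefault(ChangeType.RESOURCE_CONFIG, Collections.emptySet()));
        scoped.addAll(changes.getOrDefault(ChangeType.IDEAL_STATE, Collections.emptySet()));
        scoped.retainAll(allResources);
        return scoped;
      }

      public static void main(String[] args) {
        Map<ChangeType, Set<String>> changes = new HashMap<>();
        changes.put(ChangeType.IDEAL_STATE, new HashSet<>(Arrays.asList("ResourceA")));
        Set<String> allResources = new HashSet<>(Arrays.asList("ResourceA", "ResourceB"));
        System.out.println(resourcesToRecalculate(changes, allResources)); // prints [ResourceA]
      }
    }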
diff --git a/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/model/OptimalAssignment.java b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/model/OptimalAssignment.java
new file mode 100644
index 0000000..1ff00c9
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/model/OptimalAssignment.java
@@ -0,0 +1,93 @@
+package org.apache.helix.controller.rebalancer.waged.model;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.helix.HelixException;
+import org.apache.helix.model.Partition;
+import org.apache.helix.model.ResourceAssignment;
+
+/**
+ * This data model represents the optimal assignment of N replicas to M instances.
+ * It is mostly used as the return value of an assignment calculation algorithm. If the
+ * algorithm fails to find an optimal assignment, the user can check the recorded failure
+ * reasons.
+ * Note that this class is not thread safe.
+ */
+public class OptimalAssignment {
+  private Map<String, ResourceAssignment> _optimalAssignment = Collections.emptyMap();
+  private Map<AssignableReplica, Map<AssignableNode, List<String>>> _failedAssignments =
+      new HashMap<>();
+
+  /**
+   * Update the OptimalAssignment instance with the existing assignment recorded in the input cluster model.
+   *
+   * @param clusterModel The cluster model that records the current assignment state.
+   */
+  public void updateAssignments(ClusterModel clusterModel) {
+    Map<String, ResourceAssignment> assignmentMap = new HashMap<>();
+    for (AssignableNode node : clusterModel.getAssignableNodes().values()) {
+      for (AssignableReplica replica : node.getAssignedReplicas()) {
+        String resourceName = replica.getResourceName();
+        Partition partition = new Partition(replica.getPartitionName());
+        ResourceAssignment resourceAssignment = assignmentMap
+            .computeIfAbsent(resourceName, key -> new ResourceAssignment(resourceName));
+        Map<String, String> partitionStateMap = resourceAssignment.getReplicaMap(partition);
+        if (partitionStateMap.isEmpty()) {
+          // ResourceAssignment returns an immutable empty map when no assignment has been
+          // recorded yet. So if the returned map is empty, create a new map.
+          partitionStateMap = new HashMap<>();
+        }
+        partitionStateMap.put(node.getInstanceName(), replica.getReplicaState());
+        resourceAssignment.addReplicaMap(partition, partitionStateMap);
+      }
+    }
+    _optimalAssignment = assignmentMap;
+  }
+
+  /**
+   * @return The optimal assignment in the form of a <Resource Name, ResourceAssignment> map.
+   */
+  public Map<String, ResourceAssignment> getOptimalResourceAssignment() {
+    if (hasAnyFailure()) {
+      throw new HelixException(
+          "Cannot get the optimal resource assignment since a calculation failure is recorded. "
+              + getFailures());
+    }
+    return _optimalAssignment;
+  }
+
+  public void recordAssignmentFailure(AssignableReplica replica,
+      Map<AssignableNode, List<String>> failedReasons) {
+    _failedAssignments.put(replica, failedReasons);
+  }
+
+  public boolean hasAnyFailure() {
+    return !_failedAssignments.isEmpty();
+  }
+
+  public String getFailures() {
+    // TODO: format the error string
+    return _failedAssignments.toString();
+  }
+}
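For context, a minimal usage sketch of the failure-handling contract above. The caller code is hypothetical; inside the rebalancer, updateAssignments and recordAssignmentFailure are invoked during the constraint-based calculation.

    import java.util.Map;
    import org.apache.helix.controller.rebalancer.waged.model.OptimalAssignment;
    import org.apache.helix.model.ResourceAssignment;

    public class OptimalAssignmentUsageSketch {
      public static void main(String[] args) {
        OptimalAssignment result = new OptimalAssignment();
        // The algorithm would call result.updateAssignments(clusterModel) and, on a
        // constraint violation, result.recordAssignmentFailure(replica, reasons).
        if (result.hasAnyFailure()) {
          // getOptimalResourceAssignment() would throw a HelixException here instead.
          System.err.println("Rebalance failed: " + result.getFailures());
        } else {
          Map<String, ResourceAssignment> assignment = result.getOptimalResourceAssignment();
          System.out.println("Calculated assignment for " + assignment.size() + " resources.");
        }
      }
    }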
diff --git a/helix-core/src/main/java/org/apache/helix/controller/stages/AttributeName.java b/helix-core/src/main/java/org/apache/helix/controller/stages/AttributeName.java
index a2b63f8..b570568 100644
--- a/helix-core/src/main/java/org/apache/helix/controller/stages/AttributeName.java
+++ b/helix-core/src/main/java/org/apache/helix/controller/stages/AttributeName.java
@@ -38,5 +38,6 @@ public enum AttributeName {
   AsyncFIFOWorkerPool,
   PipelineType,
   LastRebalanceFinishTimeStamp,
-  ControllerDataProvider
+  ControllerDataProvider,
+  STATEFUL_REBALANCER
 }
diff --git a/helix-core/src/main/java/org/apache/helix/controller/stages/BestPossibleStateCalcStage.java b/helix-core/src/main/java/org/apache/helix/controller/stages/BestPossibleStateCalcStage.java
index 49a72e0..ffaac8f 100644
--- a/helix-core/src/main/java/org/apache/helix/controller/stages/BestPossibleStateCalcStage.java
+++ b/helix-core/src/main/java/org/apache/helix/controller/stages/BestPossibleStateCalcStage.java
@@ -20,13 +20,17 @@ package org.apache.helix.controller.stages;
  */
 
 import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
 import java.util.Iterator;
 import java.util.List;
 import java.util.Map;
 import java.util.concurrent.Callable;
+import java.util.stream.Collectors;
 
 import org.apache.helix.HelixException;
 import org.apache.helix.HelixManager;
+import org.apache.helix.HelixRebalanceException;
 import org.apache.helix.controller.LogUtil;
 import org.apache.helix.controller.dataproviders.ResourceControllerDataProvider;
 import org.apache.helix.controller.pipeline.AbstractBaseStage;
@@ -37,6 +41,8 @@ import org.apache.helix.controller.rebalancer.MaintenanceRebalancer;
 import org.apache.helix.controller.rebalancer.Rebalancer;
 import org.apache.helix.controller.rebalancer.SemiAutoRebalancer;
 import org.apache.helix.controller.rebalancer.internal.MappingCalculator;
+import org.apache.helix.controller.rebalancer.waged.WagedRebalancer;
+import org.apache.helix.model.ClusterConfig;
 import org.apache.helix.model.IdealState;
 import org.apache.helix.model.InstanceConfig;
 import org.apache.helix.model.MaintenanceSignal;
@@ -56,18 +62,19 @@ import org.slf4j.LoggerFactory;
  * IdealState,StateModel,LiveInstance
  */
 public class BestPossibleStateCalcStage extends AbstractBaseStage {
-  private static final Logger logger = LoggerFactory.getLogger(BestPossibleStateCalcStage.class.getName());
+  private static final Logger logger =
+      LoggerFactory.getLogger(BestPossibleStateCalcStage.class.getName());
 
   @Override
   public void process(ClusterEvent event) throws Exception {
     _eventId = event.getEventId();
-    CurrentStateOutput currentStateOutput =
-        event.getAttribute(AttributeName.CURRENT_STATE.name());
+    CurrentStateOutput currentStateOutput = event.getAttribute(AttributeName.CURRENT_STATE.name());
     final Map<String, Resource> resourceMap =
         event.getAttribute(AttributeName.RESOURCES_TO_REBALANCE.name());
     final ClusterStatusMonitor clusterStatusMonitor =
         event.getAttribute(AttributeName.clusterStatusMonitor.name());
-    ResourceControllerDataProvider cache = event.getAttribute(AttributeName.ControllerDataProvider.name());
+    ResourceControllerDataProvider cache =
+        event.getAttribute(AttributeName.ControllerDataProvider.name());
 
     if (currentStateOutput == null || resourceMap == null || cache == null) {
       throw new StageException(
@@ -90,8 +97,7 @@ public class BestPossibleStateCalcStage extends AbstractBaseStage {
                     resourceMap, stateModelDefMap);
           }
         } catch (Exception e) {
-          LogUtil
-              .logError(logger, _eventId, "Could not update cluster status metrics!", e);
+          LogUtil.logError(logger, _eventId, "Could not update cluster status metrics!", e);
         }
         return null;
       }
@@ -100,43 +106,57 @@ public class BestPossibleStateCalcStage extends AbstractBaseStage {
 
   private BestPossibleStateOutput compute(ClusterEvent event, Map<String, Resource> resourceMap,
       CurrentStateOutput currentStateOutput) {
-    ResourceControllerDataProvider cache = event.getAttribute(AttributeName.ControllerDataProvider.name());
+    ResourceControllerDataProvider cache =
+        event.getAttribute(AttributeName.ControllerDataProvider.name());
     BestPossibleStateOutput output = new BestPossibleStateOutput();
 
     HelixManager helixManager = event.getAttribute(AttributeName.helixmanager.name());
     ClusterStatusMonitor clusterStatusMonitor =
         event.getAttribute(AttributeName.clusterStatusMonitor.name());
+    WagedRebalancer wagedRebalancer = event.getAttribute(AttributeName.STATEFUL_REBALANCER.name());
 
     // Check whether the offline/disabled instance count in the cluster reaches the set limit,
     // if yes, pause the rebalancer.
-    boolean isValid = validateOfflineInstancesLimit(cache,
-        (HelixManager) event.getAttribute(AttributeName.helixmanager.name()));
+    boolean isValid =
+        validateOfflineInstancesLimit(cache, event.getAttribute(AttributeName.helixmanager.name()));
 
     final List<String> failureResources = new ArrayList<>();
-    Iterator<Resource> itr = resourceMap.values().iterator();
+
+    Map<String, Resource> calculatedResourceMap =
+        computeResourceBestPossibleStateWithWagedRebalancer(wagedRebalancer, cache,
+            currentStateOutput, resourceMap, output, failureResources);
+
+    Map<String, Resource> remainingResourceMap = new HashMap<>(resourceMap);
+    remainingResourceMap.keySet().removeAll(calculatedResourceMap.keySet());
+
+    // Fallback to the original single resource rebalancer calculation.
+    // This is required because we support mixed cluster that uses both WAGED rebalancer and the
+    // older rebalancers.
+    Iterator<Resource> itr = remainingResourceMap.values().iterator();
     while (itr.hasNext()) {
       Resource resource = itr.next();
       boolean result = false;
       try {
-        result =
-            computeResourceBestPossibleState(event, cache, currentStateOutput, resource, output);
+        result = computeSingleResourceBestPossibleState(event, cache, currentStateOutput, resource,
+            output);
       } catch (HelixException ex) {
-        LogUtil.logError(logger, _eventId,
-            "Exception when calculating best possible states for " + resource.getResourceName(),
-            ex);
+        LogUtil.logError(logger, _eventId, String
+            .format("Exception when calculating best possible states for %s",
+                resource.getResourceName()), ex);
 
       }
       if (!result) {
         failureResources.add(resource.getResourceName());
-        LogUtil.logWarn(logger, _eventId,
-            "Failed to calculate best possible states for " + resource.getResourceName());
+        LogUtil.logWarn(logger, _eventId, String
+            .format("Failed to calculate best possible states for %s", resource.getResourceName()));
       }
     }
 
     // Check and report if resource rebalance has failure
     updateRebalanceStatus(!isValid || !failureResources.isEmpty(), failureResources, helixManager,
-        cache, clusterStatusMonitor,
-        "Failed to calculate best possible states for " + failureResources.size() + " resources.");
+        cache, clusterStatusMonitor, String
+            .format("Failed to calculate best possible states for %d resources.",
+                failureResources.size()));
 
     return output;
   }
@@ -185,8 +205,9 @@ public class BestPossibleStateCalcStage extends AbstractBaseStage {
         if (manager != null) {
           if (manager.getHelixDataAccessor()
               .getProperty(manager.getHelixDataAccessor().keyBuilder().maintenance()) == null) {
-            manager.getClusterManagmentTool().autoEnableMaintenanceMode(manager.getClusterName(),
-                true, errMsg, MaintenanceSignal.AutoTriggerReason.MAX_OFFLINE_INSTANCES_EXCEEDED);
+            manager.getClusterManagmentTool()
+                .autoEnableMaintenanceMode(manager.getClusterName(), true, errMsg,
+                    MaintenanceSignal.AutoTriggerReason.MAX_OFFLINE_INSTANCES_EXCEEDED);
             LogUtil.logWarn(logger, _eventId, errMsg);
           }
         } else {
@@ -199,8 +220,98 @@ public class BestPossibleStateCalcStage extends AbstractBaseStage {
     return true;
   }
 
-  private boolean computeResourceBestPossibleState(ClusterEvent event, ResourceControllerDataProvider cache,
-      CurrentStateOutput currentStateOutput, Resource resource, BestPossibleStateOutput output) {
+  private void updateWagedRebalancer(WagedRebalancer wagedRebalancer, ClusterConfig clusterConfig) {
+    if (clusterConfig != null) {
+      // Since the rebalance configuration can be updated at runtime, try to update the rebalancer
+      // before calculating.
+      wagedRebalancer.updateRebalancePreference(clusterConfig.getGlobalRebalancePreference());
+      wagedRebalancer
+          .setGlobalRebalanceAsyncMode(clusterConfig.isGlobalRebalanceAsyncModeEnabled());
+    }
+  }
+
+  /**
+   * Rebalance with the WAGED rebalancer.
+   * The rebalancer only calculates the new ideal assignment for all the resources that are
+   * configured to use the WAGED rebalancer.
+   *
+   * @param wagedRebalancer    The WAGED rebalancer instance.
+   * @param cache              Cluster data cache.
+   * @param currentStateOutput The current state information.
+   * @param resourceMap        The complete resource map. The method will filter the map for the compatible resources.
+   * @param output             The best possible state output.
+   * @param failureResources   The failure records that will be updated if any resource cannot be computed.
+   * @return The map of all the calculated resources.
+   */
+  private Map<String, Resource> computeResourceBestPossibleStateWithWagedRebalancer(
+      WagedRebalancer wagedRebalancer, ResourceControllerDataProvider cache,
+      CurrentStateOutput currentStateOutput, Map<String, Resource> resourceMap,
+      BestPossibleStateOutput output, List<String> failureResources) {
+    if (cache.isMaintenanceModeEnabled()) {
+      // The WAGED rebalancer won't be used while maintenance mode is enabled.
+      return Collections.emptyMap();
+    }
+
+    // Find the compatible resources: 1. FULL_AUTO 2. Configured to use the WAGED rebalancer
+    Map<String, Resource> wagedRebalancedResourceMap =
+        resourceMap.entrySet().stream().filter(resourceEntry -> {
+          IdealState is = cache.getIdealState(resourceEntry.getKey());
+          return is != null && is.getRebalanceMode().equals(IdealState.RebalanceMode.FULL_AUTO)
+              && WagedRebalancer.class.getName().equals(is.getRebalancerClassName());
+        }).collect(Collectors.toMap(resourceEntry -> resourceEntry.getKey(),
+            resourceEntry -> resourceEntry.getValue()));
+
+    Map<String, IdealState> newIdealStates = new HashMap<>();
+
+    if (wagedRebalancer != null) {
+      updateWagedRebalancer(wagedRebalancer, cache.getClusterConfig());
+      try {
+        newIdealStates.putAll(wagedRebalancer
+            .computeNewIdealStates(cache, wagedRebalancedResourceMap, currentStateOutput));
+      } catch (HelixRebalanceException ex) {
+        // Note that unlike the legacy rebalancers, the WAGED rebalancer won't return a partial
+        // result. Since it calculates for all the eligible resources globally, a partial result
+        // is invalid.
+        // TODO propagate the rebalancer failure information to updateRebalanceStatus for monitoring.
+        LogUtil.logError(logger, _eventId, String
+            .format("Failed to calculate the new Ideal States using the rebalancer %s due to %s",
+                wagedRebalancer.getClass().getSimpleName(), ex.getFailureType()), ex);
+      }
+    } else {
+      LogUtil.logError(logger, _eventId,
+          "Skip rebalancing using the WAGED rebalancer since it is not configured in the rebalance pipeline.");
+    }
+
+    Iterator<Resource> itr = wagedRebalancedResourceMap.values().iterator();
+    while (itr.hasNext()) {
+      Resource resource = itr.next();
+      IdealState is = newIdealStates.get(resource.getResourceName());
+      // Check if the WAGED rebalancer has calculated the result for this resource or not.
+      if (is != null && checkBestPossibleStateCalculation(is)) {
+        // The WAGED rebalancer calculated a valid result; record it in the output.
+        updateBestPossibleStateOutput(output, resource, is);
+      } else {
+        failureResources.add(resource.getResourceName());
+        LogUtil.logWarn(logger, _eventId, String
+            .format("Failed to calculate best possible states for %s.",
+                resource.getResourceName()));
+      }
+    }
+    return wagedRebalancedResourceMap;
+  }
+
+  private void updateBestPossibleStateOutput(BestPossibleStateOutput output, Resource resource,
+      IdealState computedIdealState) {
+    output.setPreferenceLists(resource.getResourceName(), computedIdealState.getPreferenceLists());
+    for (Partition partition : resource.getPartitions()) {
+      Map<String, String> newStateMap =
+          computedIdealState.getInstanceStateMap(partition.getPartitionName());
+      output.setState(resource.getResourceName(), partition, newStateMap);
+    }
+  }
+
+  private boolean computeSingleResourceBestPossibleState(ClusterEvent event,
+      ResourceControllerDataProvider cache, CurrentStateOutput currentStateOutput,
+      Resource resource, BestPossibleStateOutput output) {
     // for each ideal state
     // read the state model def
     // for each resource
@@ -229,12 +340,13 @@ public class BestPossibleStateCalcStage extends AbstractBaseStage {
 
     Rebalancer<ResourceControllerDataProvider> rebalancer =
         getRebalancer(idealState, resourceName, cache.isMaintenanceModeEnabled());
-    MappingCalculator<ResourceControllerDataProvider> mappingCalculator = getMappingCalculator(rebalancer, resourceName);
+    MappingCalculator<ResourceControllerDataProvider> mappingCalculator =
+        getMappingCalculator(rebalancer, resourceName);
 
     if (rebalancer == null || mappingCalculator == null) {
-      LogUtil.logError(logger, _eventId,
-          "Error computing assignment for resource " + resourceName + ". no rebalancer found. rebalancer: " + rebalancer
-              + " mappingCalculator: " + mappingCalculator);
+      LogUtil.logError(logger, _eventId, "Error computing assignment for resource " + resourceName
+          + ". no rebalancer found. rebalancer: " + rebalancer + " mappingCalculator: "
+          + mappingCalculator);
     }
 
     if (rebalancer != null && mappingCalculator != null) {
@@ -299,10 +411,9 @@ public class BestPossibleStateCalcStage extends AbstractBaseStage {
     }
   }
 
-  private Rebalancer<ResourceControllerDataProvider> getRebalancer(IdealState idealState, String resourceName,
-      boolean isMaintenanceModeEnabled) {
+  private Rebalancer<ResourceControllerDataProvider> getCustomizedRebalancer(
+      String rebalancerClassName, String resourceName) {
     Rebalancer<ResourceControllerDataProvider> customizedRebalancer = null;
-    String rebalancerClassName = idealState.getRebalancerClassName();
     if (rebalancerClassName != null) {
       if (logger.isDebugEnabled()) {
         LogUtil.logDebug(logger, _eventId,
@@ -316,13 +427,19 @@ public class BestPossibleStateCalcStage extends AbstractBaseStage {
             "Exception while invoking custom rebalancer class:" + rebalancerClassName, e);
       }
     }
+    return customizedRebalancer;
+  }
 
+  private Rebalancer<ResourceControllerDataProvider> getRebalancer(IdealState idealState,
+      String resourceName, boolean isMaintenanceModeEnabled) {
     Rebalancer<ResourceControllerDataProvider> rebalancer = null;
     switch (idealState.getRebalanceMode()) {
     case FULL_AUTO:
       if (isMaintenanceModeEnabled) {
         rebalancer = new MaintenanceRebalancer();
       } else {
+        Rebalancer<ResourceControllerDataProvider> customizedRebalancer =
+            getCustomizedRebalancer(idealState.getRebalancerClassName(), resourceName);
         if (customizedRebalancer != null) {
           rebalancer = customizedRebalancer;
         } else {
@@ -338,14 +455,13 @@ public class BestPossibleStateCalcStage extends AbstractBaseStage {
       break;
     case USER_DEFINED:
     case TASK:
-      rebalancer = customizedRebalancer;
+      rebalancer = getCustomizedRebalancer(idealState.getRebalancerClassName(), resourceName);
       break;
     default:
       LogUtil.logError(logger, _eventId,
           "Fail to find the rebalancer, invalid rebalance mode " + idealState.getRebalanceMode());
       break;
     }
-
     return rebalancer;
   }
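For reference, a resource is routed to the WAGED code path above only when its IdealState is FULL_AUTO and names WagedRebalancer as the rebalancer class. A minimal sketch of that setup (resource name invented; cluster wiring omitted):

    import org.apache.helix.controller.rebalancer.waged.WagedRebalancer;
    import org.apache.helix.model.IdealState;

    public class WagedIdealStateSketch {
      // Mirrors the filter used in computeResourceBestPossibleStateWithWagedRebalancer.
      static IdealState markAsWaged(IdealState idealState) {
        idealState.setRebalanceMode(IdealState.RebalanceMode.FULL_AUTO);
        idealState.setRebalancerClassName(WagedRebalancer.class.getName());
        return idealState;
      }

      public static void main(String[] args) {
        IdealState is = markAsWaged(new IdealState("MyResource"));
        System.out.println(is.getRebalancerClassName());
      }
    }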
 
diff --git a/helix-core/src/main/java/org/apache/helix/controller/stages/CurrentStateComputationStage.java b/helix-core/src/main/java/org/apache/helix/controller/stages/CurrentStateComputationStage.java
index 66da8ba..62fda33 100644
--- a/helix-core/src/main/java/org/apache/helix/controller/stages/CurrentStateComputationStage.java
+++ b/helix-core/src/main/java/org/apache/helix/controller/stages/CurrentStateComputationStage.java
@@ -20,19 +20,31 @@ package org.apache.helix.controller.stages;
  */
 
 import java.util.Collection;
+import java.util.Collections;
 import java.util.List;
 import java.util.Map;
+import java.util.concurrent.ExecutorService;
+import java.util.stream.Collectors;
 
 import org.apache.helix.controller.LogUtil;
 import org.apache.helix.controller.dataproviders.BaseControllerDataProvider;
+import org.apache.helix.controller.dataproviders.ResourceControllerDataProvider;
 import org.apache.helix.controller.pipeline.AbstractBaseStage;
 import org.apache.helix.controller.pipeline.StageException;
+import org.apache.helix.controller.rebalancer.util.ResourceUsageCalculator;
+import org.apache.helix.controller.rebalancer.waged.model.AssignableNode;
+import org.apache.helix.controller.rebalancer.waged.model.ClusterModel;
+import org.apache.helix.controller.rebalancer.waged.model.ClusterModelProvider;
 import org.apache.helix.model.CurrentState;
+import org.apache.helix.model.IdealState;
 import org.apache.helix.model.LiveInstance;
 import org.apache.helix.model.Message;
 import org.apache.helix.model.Message.MessageType;
 import org.apache.helix.model.Partition;
 import org.apache.helix.model.Resource;
+import org.apache.helix.model.ResourceAssignment;
+import org.apache.helix.model.ResourceConfig;
+import org.apache.helix.monitoring.mbeans.ClusterStatusMonitor;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
@@ -50,6 +62,8 @@ public class CurrentStateComputationStage extends AbstractBaseStage {
     _eventId = event.getEventId();
     BaseControllerDataProvider cache = event.getAttribute(AttributeName.ControllerDataProvider.name());
     final Map<String, Resource> resourceMap = event.getAttribute(AttributeName.RESOURCES.name());
+    final Map<String, Resource> resourceToRebalance =
+        event.getAttribute(AttributeName.RESOURCES_TO_REBALANCE.name());
 
     if (cache == null || resourceMap == null) {
       throw new StageException("Missing attributes in event:" + event
@@ -74,6 +88,16 @@ public class CurrentStateComputationStage extends AbstractBaseStage {
       updateCurrentStates(instance, currentStateMap.values(), currentStateOutput, resourceMap);
     }
     event.addAttribute(AttributeName.CURRENT_STATE.name(), currentStateOutput);
+
+    final ClusterStatusMonitor clusterStatusMonitor =
+        event.getAttribute(AttributeName.clusterStatusMonitor.name());
+    if (clusterStatusMonitor != null && cache instanceof ResourceControllerDataProvider) {
+      final ResourceControllerDataProvider dataProvider = (ResourceControllerDataProvider) cache;
+      reportInstanceCapacityMetrics(clusterStatusMonitor, dataProvider, resourceToRebalance,
+          currentStateOutput);
+      reportResourcePartitionCapacityMetrics(dataProvider.getAsyncTasksThreadPool(),
+          clusterStatusMonitor, dataProvider.getResourceConfigMap().values());
+    }
   }
 
   // update all pending messages to CurrentStateOutput.
@@ -220,4 +244,55 @@ public class CurrentStateComputationStage extends AbstractBaseStage {
       currentStateOutput.setCancellationMessage(resourceName, partition, instanceName, message);
     }
   }
+
+  private void reportInstanceCapacityMetrics(ClusterStatusMonitor clusterStatusMonitor,
+      ResourceControllerDataProvider dataProvider, Map<String, Resource> resourceMap,
+      CurrentStateOutput currentStateOutput) {
+    asyncExecute(dataProvider.getAsyncTasksThreadPool(), () -> {
+      try {
+        // ResourceToRebalance map also has resources from current states.
+        // Only use the resources in ideal states to parse all replicas.
+        Map<String, IdealState> idealStateMap = dataProvider.getIdealStates();
+        Map<String, Resource> resourceToMonitorMap = resourceMap.entrySet().stream()
+            .filter(idealStateMap::containsKey)
+            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
+
+        Map<String, ResourceAssignment> currentStateAssignment =
+            currentStateOutput.getAssignment(resourceToMonitorMap.keySet());
+        ClusterModel clusterModel = ClusterModelProvider.generateClusterModelFromExistingAssignment(
+            dataProvider, resourceToMonitorMap, currentStateAssignment);
+
+        for (AssignableNode node : clusterModel.getAssignableNodes().values()) {
+          String instanceName = node.getInstanceName();
+          // There is no new usage being added to this node, so an empty map is passed in.
+          double usage = node.getProjectedHighestUtilization(Collections.emptyMap());
+          clusterStatusMonitor
+              .updateInstanceCapacityStatus(instanceName, usage, node.getMaxCapacity());
+        }
+      } catch (Exception ex) {
+        LOG.error("Failed to report instance capacity metrics. Exception message: {}",
+            ex.getMessage());
+      }
+
+      return null;
+    });
+  }
+
+  private void reportResourcePartitionCapacityMetrics(ExecutorService executorService,
+      ClusterStatusMonitor clusterStatusMonitor, Collection<ResourceConfig> resourceConfigs) {
+    asyncExecute(executorService, () -> {
+      try {
+        for (ResourceConfig config : resourceConfigs) {
+          Map<String, Integer> averageWeight = ResourceUsageCalculator
+              .calculateAveragePartitionWeight(config.getPartitionCapacityMap());
+          clusterStatusMonitor.updatePartitionWeight(config.getResourceName(), averageWeight);
+        }
+      } catch (Exception ex) {
+        LOG.error("Failed to report resource partition capacity metrics. Exception message: {}",
+            ex.getMessage());
+      }
+
+      return null;
+    });
+  }
 }
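The instance capacity number reported above is, per node, the highest utilization across all configured capacity dimensions. A rough standalone sketch of that reduction follows; the map shapes and semantics here are assumptions for illustration, and the real accounting lives in AssignableNode.

    import java.util.HashMap;
    import java.util.Map;

    public class UtilizationSketch {
      // For each capacity key, compute usage / capacity, then take the maximum.
      static double highestUtilization(Map<String, Integer> usage, Map<String, Integer> capacity) {
        double highest = 0.0;
        for (Map.Entry<String, Integer> entry : capacity.entrySet()) {
          int used = usage.getOrDefault(entry.getKey(), 0);
          highest = Math.max(highest, (double) used / entry.getValue());
        }
        return highest;
      }

      public static void main(String[] args) {
        Map<String, Integer> usage = new HashMap<>();
        usage.put("DISK", 60);
        usage.put("CPU", 30);
        Map<String, Integer> capacity = new HashMap<>();
        capacity.put("DISK", 100);
        capacity.put("CPU", 50);
        System.out.println(highestUtilization(usage, capacity)); // prints 0.6
      }
    }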
diff --git a/helix-core/src/main/java/org/apache/helix/controller/stages/CurrentStateOutput.java b/helix-core/src/main/java/org/apache/helix/controller/stages/CurrentStateOutput.java
index bbbf0fd..752a760 100644
--- a/helix-core/src/main/java/org/apache/helix/controller/stages/CurrentStateOutput.java
+++ b/helix-core/src/main/java/org/apache/helix/controller/stages/CurrentStateOutput.java
@@ -28,6 +28,7 @@ import com.google.common.collect.Sets;
 import org.apache.helix.model.CurrentState;
 import org.apache.helix.model.Message;
 import org.apache.helix.model.Partition;
+import org.apache.helix.model.ResourceAssignment;
 
 /**
  * The current state includes both current state and pending messages
@@ -428,4 +429,26 @@ public class CurrentStateOutput {
     return sb.toString();
   }
 
+  /**
+   * Get current state assignment for a set of resources.
+   * @param resourceSet a set of resources' names
+   * @return a map of current state resource assignment, {resourceName: resourceAssignment}
+   */
+  public Map<String, ResourceAssignment> getAssignment(Set<String> resourceSet) {
+    Map<String, ResourceAssignment> currentStateAssignment = new HashMap<>();
+    for (String resourceName : resourceSet) {
+      Map<Partition, Map<String, String>> currentStateMap =
+          getCurrentStateMap(resourceName);
+      if (!currentStateMap.isEmpty()) {
+        ResourceAssignment newResourceAssignment = new ResourceAssignment(resourceName);
+        currentStateMap.entrySet().stream().forEach(currentStateEntry -> {
+          newResourceAssignment.addReplicaMap(currentStateEntry.getKey(),
+              currentStateEntry.getValue());
+        });
+        currentStateAssignment.put(resourceName, newResourceAssignment);
+      }
+    }
+
+    return currentStateAssignment;
+  }
 }
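A small sketch of how a caller might use the new helper; the resource name is illustrative, and the CurrentStateOutput instance would come from the pipeline in practice.

    import java.util.Collections;
    import java.util.Map;
    import org.apache.helix.controller.stages.CurrentStateOutput;
    import org.apache.helix.model.ResourceAssignment;

    public class CurrentStateAssignmentSketch {
      // Converts the current states of one resource into the ResourceAssignment form
      // consumed by the WAGED rebalancer's cluster model.
      static ResourceAssignment currentAssignmentOf(CurrentStateOutput output, String resource) {
        Map<String, ResourceAssignment> assignments =
            output.getAssignment(Collections.singleton(resource));
        return assignments.get(resource); // null if the resource has no current state
      }
    }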
diff --git a/helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixAdmin.java b/helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixAdmin.java
index 0a978e5..61e75b3 100644
--- a/helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixAdmin.java
+++ b/helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixAdmin.java
@@ -57,6 +57,10 @@ import org.apache.helix.ZNRecord;
 import org.apache.helix.controller.rebalancer.DelayedAutoRebalancer;
 import org.apache.helix.controller.rebalancer.strategy.CrushEdRebalanceStrategy;
 import org.apache.helix.controller.rebalancer.strategy.RebalanceStrategy;
+import org.apache.helix.controller.rebalancer.util.WagedValidationUtil;
+import org.apache.helix.controller.rebalancer.waged.WagedRebalancer;
+import org.apache.helix.controller.rebalancer.waged.model.AssignableNode;
+import org.apache.helix.controller.rebalancer.waged.model.AssignableReplica;
 import org.apache.helix.manager.zk.client.HelixZkClient;
 import org.apache.helix.manager.zk.client.SharedZkClientFactory;
 import org.apache.helix.model.ClusterConfig;
@@ -76,6 +80,7 @@ import org.apache.helix.model.Message;
 import org.apache.helix.model.Message.MessageState;
 import org.apache.helix.model.Message.MessageType;
 import org.apache.helix.model.PauseSignal;
+import org.apache.helix.model.ResourceConfig;
 import org.apache.helix.model.StateModelDefinition;
 import org.apache.helix.tools.DefaultIdealStateCalculator;
 import org.apache.helix.util.HelixUtil;
@@ -84,6 +89,7 @@ import org.apache.zookeeper.KeeperException;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
+
 public class ZKHelixAdmin implements HelixAdmin {
   public static final String CONNECTION_TIMEOUT = "helixAdmin.timeOutInSec";
   private static final String MAINTENANCE_ZNODE_ID = "maintenance";
@@ -180,7 +186,7 @@ public class ZKHelixAdmin implements HelixAdmin {
           // does not repeatedly write instance history)
           logger.warn("Retrying dropping instance {} with exception {}",
               instanceConfig.getInstanceName(), e.getCause().getMessage());
-          retryCnt ++;
+          retryCnt++;
         } else {
           logger.error("Failed to drop instance {} (not retryable).",
               instanceConfig.getInstanceName(), e.getCause());
@@ -403,7 +409,8 @@ public class ZKHelixAdmin implements HelixAdmin {
     HelixDataAccessor accessor =
         new ZKHelixDataAccessor(clusterName, new ZkBaseDataAccessor<ZNRecord>(_zkClient));
     Builder keyBuilder = accessor.keyBuilder();
-    return accessor.getBaseDataAccessor().exists(keyBuilder.maintenance().getPath(), AccessOption.PERSISTENT);
+    return accessor.getBaseDataAccessor()
+        .exists(keyBuilder.maintenance().getPath(), AccessOption.PERSISTENT);
   }
 
   @Override
@@ -436,16 +443,16 @@ public class ZKHelixAdmin implements HelixAdmin {
    * @param customFields
    * @param triggeringEntity
    */
-  private void processMaintenanceMode(String clusterName, final boolean enabled, final String reason,
-      final MaintenanceSignal.AutoTriggerReason internalReason, final Map<String, String> customFields,
+  private void processMaintenanceMode(String clusterName, final boolean enabled,
+      final String reason, final MaintenanceSignal.AutoTriggerReason internalReason,
+      final Map<String, String> customFields,
       final MaintenanceSignal.TriggeringEntity triggeringEntity) {
     HelixDataAccessor accessor =
         new ZKHelixDataAccessor(clusterName, new ZkBaseDataAccessor<ZNRecord>(_zkClient));
     Builder keyBuilder = accessor.keyBuilder();
     logger.info("Cluster {} {} {} maintenance mode for reason {}.", clusterName,
         triggeringEntity == MaintenanceSignal.TriggeringEntity.CONTROLLER ? "automatically"
-            : "manually",
-        enabled ? "enters" : "exits", reason == null ? "NULL" : reason);
+            : "manually", enabled ? "enters" : "exits", reason == null ? "NULL" : reason);
     final long currentTime = System.currentTimeMillis();
     if (!enabled) {
       // Exit maintenance mode
@@ -459,23 +466,23 @@ public class ZKHelixAdmin implements HelixAdmin {
       maintenanceSignal.setTimestamp(currentTime);
       maintenanceSignal.setTriggeringEntity(triggeringEntity);
       switch (triggeringEntity) {
-      case CONTROLLER:
-        // autoEnable
-        maintenanceSignal.setAutoTriggerReason(internalReason);
-        break;
-      case USER:
-      case UNKNOWN:
-        // manuallyEnable
-        if (customFields != null && !customFields.isEmpty()) {
-          // Enter all custom fields provided by the user
-          Map<String, String> simpleFields = maintenanceSignal.getRecord().getSimpleFields();
-          for (Map.Entry<String, String> entry : customFields.entrySet()) {
-            if (!simpleFields.containsKey(entry.getKey())) {
-              simpleFields.put(entry.getKey(), entry.getValue());
+        case CONTROLLER:
+          // autoEnable
+          maintenanceSignal.setAutoTriggerReason(internalReason);
+          break;
+        case USER:
+        case UNKNOWN:
+          // manuallyEnable
+          if (customFields != null && !customFields.isEmpty()) {
+            // Enter all custom fields provided by the user
+            Map<String, String> simpleFields = maintenanceSignal.getRecord().getSimpleFields();
+            for (Map.Entry<String, String> entry : customFields.entrySet()) {
+              if (!simpleFields.containsKey(entry.getKey())) {
+                simpleFields.put(entry.getKey(), entry.getValue());
+              }
             }
           }
-        }
-        break;
+          break;
       }
       if (!accessor.createMaintenance(maintenanceSignal)) {
         throw new HelixException("Failed to create maintenance signal!");
@@ -483,16 +490,17 @@ public class ZKHelixAdmin implements HelixAdmin {
     }
 
     // Record a MaintenanceSignal history
-    if (!accessor.getBaseDataAccessor().update(keyBuilder.controllerLeaderHistory().getPath(),
-        new DataUpdater<ZNRecord>() {
+    if (!accessor.getBaseDataAccessor()
+        .update(keyBuilder.controllerLeaderHistory().getPath(), new DataUpdater<ZNRecord>() {
           @Override
           public ZNRecord update(ZNRecord oldRecord) {
             try {
               if (oldRecord == null) {
                 oldRecord = new ZNRecord(PropertyType.HISTORY.toString());
               }
-              return new ControllerHistory(oldRecord).updateMaintenanceHistory(enabled, reason,
-                  currentTime, internalReason, customFields, triggeringEntity);
+              return new ControllerHistory(oldRecord)
+                  .updateMaintenanceHistory(enabled, reason, currentTime, internalReason,
+                      customFields, triggeringEntity);
             } catch (IOException e) {
               logger.error("Failed to update maintenance history! Exception: {}", e);
               return oldRecord;
@@ -1241,7 +1249,8 @@ public class ZKHelixAdmin implements HelixAdmin {
     setResourceIdealState(clusterName, resourceName, new IdealState(idealStateRecord));
   }
 
-  private static byte[] readFile(String filePath) throws IOException {
+  private static byte[] readFile(String filePath)
+      throws IOException {
     File file = new File(filePath);
 
     int size = (int) file.length();
@@ -1264,7 +1273,8 @@ public class ZKHelixAdmin implements HelixAdmin {
 
   @Override
   public void addStateModelDef(String clusterName, String stateModelDefName,
-      String stateModelDefFile) throws IOException {
+      String stateModelDefFile)
+      throws IOException {
     ZNRecord record =
         (ZNRecord) (new ZNRecordSerializer().deserialize(readFile(stateModelDefFile)));
     if (record == null || record.getId() == null || !record.getId().equals(stateModelDefName)) {
@@ -1287,9 +1297,9 @@ public class ZKHelixAdmin implements HelixAdmin {
     baseAccessor.update(path, new DataUpdater<ZNRecord>() {
       @Override
       public ZNRecord update(ZNRecord currentData) {
-        ClusterConstraints constraints = currentData == null ?
-            new ClusterConstraints(constraintType) :
-            new ClusterConstraints(currentData);
+        ClusterConstraints constraints =
+            currentData == null ? new ClusterConstraints(constraintType)
+                : new ClusterConstraints(currentData);
 
         constraints.addConstraintItem(constraintId, constraintItem);
         return constraints.getRecord();
@@ -1495,9 +1505,7 @@ public class ZKHelixAdmin implements HelixAdmin {
           + ", instance config does not exist");
     }
 
-    baseAccessor.update(path, new DataUpdater<ZNRecord>()
-
-    {
+    baseAccessor.update(path, new DataUpdater<ZNRecord>() {
       @Override
       public ZNRecord update(ZNRecord currentData) {
         if (currentData == null) {
@@ -1587,4 +1595,212 @@ public class ZKHelixAdmin implements HelixAdmin {
       _zkClient.close();
     }
   }
+
+  @Override
+  public boolean addResourceWithWeight(String clusterName, IdealState idealState,
+      ResourceConfig resourceConfig) {
+    // Null checks
+    if (clusterName == null || clusterName.isEmpty()) {
+      throw new HelixException("Cluster name is null or empty!");
+    }
+    if (idealState == null || !idealState.isValid()) {
+      throw new HelixException("IdealState is null or invalid!");
+    }
+    if (resourceConfig == null || !resourceConfig.isValid()) {
+      // TODO This might be okay because of default weight?
+      throw new HelixException("ResourceConfig is null or invalid!");
+    }
+
+    // Make sure IdealState and ResourceConfig are for the same resource
+    if (!idealState.getResourceName().equals(resourceConfig.getResourceName())) {
+      throw new HelixException("Resource names in IdealState and ResourceConfig are different!");
+    }
+
+    // Order in which a resource should be added:
+    // 1. Validate the weights in ResourceConfig against ClusterConfig; check that every partition
+    //    in ResourceConfig sets up all the capacity keys defined in ClusterConfig.
+    if (!validateWeightForResourceConfig(_configAccessor.getClusterConfig(clusterName),
+        resourceConfig, idealState)) {
+      throw new HelixException(String
+          .format("Could not add resource %s with weight! Failed to validate the ResourceConfig!",
+              idealState.getResourceName()));
+    }
+
+    // 2. Add the resourceConfig to ZK
+    _configAccessor
+        .setResourceConfig(clusterName, resourceConfig.getResourceName(), resourceConfig);
+
+    // 3. Add the idealState to ZK
+    setResourceIdealState(clusterName, idealState.getResourceName(), idealState);
+
+    // 4. rebalance the resource
+    rebalance(clusterName, idealState.getResourceName(), Integer.parseInt(idealState.getReplicas()),
+        idealState.getResourceName(), idealState.getInstanceGroupTag());
+
+    return true;
+  }
+
+  @Override
+  public boolean enableWagedRebalance(String clusterName, List<String> resourceNames) {
+    // Null checks
+    if (clusterName == null || clusterName.isEmpty()) {
+      throw new HelixException("Cluster name is invalid!");
+    }
+    if (resourceNames == null || resourceNames.isEmpty()) {
+      throw new HelixException("Resource name list is invalid!");
+    }
+
+    HelixDataAccessor accessor =
+        new ZKHelixDataAccessor(clusterName, new ZkBaseDataAccessor<>(_zkClient));
+    Builder keyBuilder = accessor.keyBuilder();
+    // Fetch the IdealState of each requested resource individually so that the indices of
+    // idealStates and resourceNames stay aligned, and so that missing resources show up as nulls.
+    List<IdealState> idealStates = new ArrayList<>();
+    for (String resourceName : resourceNames) {
+      idealStates.add(getResourceIdealState(clusterName, resourceName));
+    }
+    List<String> nullIdealStates = new ArrayList<>();
+    for (int i = 0; i < idealStates.size(); i++) {
+      if (idealStates.get(i) == null) {
+        nullIdealStates.add(resourceNames.get(i));
+      } else {
+        idealStates.get(i).setRebalancerClassName(WagedRebalancer.class.getName());
+        idealStates.get(i).setRebalanceMode(RebalanceMode.FULL_AUTO);
+      }
+    }
+    if (!nullIdealStates.isEmpty()) {
+      throw new HelixException(
+          String.format("Not all IdealStates exist in the cluster: %s", nullIdealStates));
+    }
+    List<PropertyKey> idealStateKeys = new ArrayList<>();
+    idealStates.forEach(
+        idealState -> idealStateKeys.add(keyBuilder.idealStates(idealState.getResourceName())));
+    boolean[] success = accessor.setChildren(idealStateKeys, idealStates);
+    for (boolean s : success) {
+      if (!s) {
+        return false;
+      }
+    }
+    return true;
+  }
+
+  @Override
+  public Map<String, Boolean> validateResourcesForWagedRebalance(String clusterName,
+      List<String> resourceNames) {
+    // Null checks
+    if (clusterName == null || clusterName.isEmpty()) {
+      throw new HelixException("Cluster name is invalid!");
+    }
+    if (resourceNames == null || resourceNames.isEmpty()) {
+      throw new HelixException("Resource name list is invalid!");
+    }
+
+    // Ensure that all instances are valid
+    HelixDataAccessor accessor =
+        new ZKHelixDataAccessor(clusterName, new ZkBaseDataAccessor<>(_zkClient));
+    Builder keyBuilder = accessor.keyBuilder();
+    List<String> instances = accessor.getChildNames(keyBuilder.instanceConfigs());
+    if (validateInstancesForWagedRebalance(clusterName, instances).containsValue(false)) {
+      throw new HelixException(String
+          .format("Instance capacities haven't been configured properly for cluster %s",
+              clusterName));
+    }
+
+    Map<String, Boolean> result = new HashMap<>();
+    ClusterConfig clusterConfig = _configAccessor.getClusterConfig(clusterName);
+    for (String resourceName : resourceNames) {
+      IdealState idealState = getResourceIdealState(clusterName, resourceName);
+      if (idealState == null || !idealState.isValid()) {
+        result.put(resourceName, false);
+        continue;
+      }
+      ResourceConfig resourceConfig = _configAccessor.getResourceConfig(clusterName, resourceName);
+      result.put(resourceName,
+          validateWeightForResourceConfig(clusterConfig, resourceConfig, idealState));
+    }
+    return result;
+  }
+
+  @Override
+  public Map<String, Boolean> validateInstancesForWagedRebalance(String clusterName,
+      List<String> instanceNames) {
+    // Null checks
+    if (clusterName == null || clusterName.isEmpty()) {
+      throw new HelixException("Cluster name is invalid!");
+    }
+    if (instanceNames == null || instanceNames.isEmpty()) {
+      throw new HelixException("Instance name list is invalid!");
+    }
+
+    Map<String, Boolean> result = new HashMap<>();
+    ClusterConfig clusterConfig = _configAccessor.getClusterConfig(clusterName);
+    for (String instanceName : instanceNames) {
+      InstanceConfig instanceConfig = _configAccessor.getInstanceConfig(clusterName, instanceName);
+      if (instanceConfig == null || !instanceConfig.isValid()) {
+        result.put(instanceName, false);
+        continue;
+      }
+      WagedValidationUtil.validateAndGetInstanceCapacity(clusterConfig, instanceConfig);
+      result.put(instanceName, true);
+    }
+
+    return result;
+  }
+
+  /**
+   * Validates ResourceConfig's weight field against the given ClusterConfig.
+   * @param clusterConfig
+   * @param resourceConfig
+   * @param idealState
+   * @return true if ResourceConfig has all the required fields. False otherwise.
+   */
+  private boolean validateWeightForResourceConfig(ClusterConfig clusterConfig,
+      ResourceConfig resourceConfig, IdealState idealState) {
+    if (resourceConfig == null) {
+      if (clusterConfig.getDefaultPartitionWeightMap().isEmpty()) {
+        logger.error(
+            "ResourceConfig for {} is null, and there are no default weights set in ClusterConfig!",
+            idealState.getResourceName());
+        return false;
+      }
+      // If ResourceConfig is null AND the default partition weight map is defined with all the
+      // required keys, we consider the resource valid since the default weights will be used.
+      if (clusterConfig.getDefaultPartitionWeightMap().keySet()
+          .containsAll(clusterConfig.getInstanceCapacityKeys())) {
+        // Contains all the required keys, so consider it valid since it will use the default weights
+        return true;
+      }
+      logger.error(
+          "ResourceConfig for {} is null, and ClusterConfig's default partition weight map doesn't have all the required keys!",
+          idealState.getResourceName());
+      return false;
+    }
+
+    // Parse the entire capacityMap from ResourceConfig
+    Map<String, Map<String, Integer>> capacityMap;
+    try {
+      capacityMap = resourceConfig.getPartitionCapacityMap();
+    } catch (IOException ex) {
+      logger.error("Invalid partition capacity configuration of resource: {}",
+          idealState.getResourceName(), ex);
+      return false;
+    }
+
+    Set<String> capacityMapSet = new HashSet<>(capacityMap.keySet());
+    boolean hasDefaultCapacity = capacityMapSet.contains(ResourceConfig.DEFAULT_PARTITION_KEY);
+    // Remove DEFAULT key
+    capacityMapSet.remove(ResourceConfig.DEFAULT_PARTITION_KEY);
+
+    // Make sure capacityMap covers all partitions defined in IdealState.
+    // Note that the IdealState may not have been rebalanced yet, in which case its list fields
+    // could be null and getPartitionSet() would return an empty set, so compare against
+    // numPartitions instead. This check lets us fail early instead of looping through all partitions.
+    if (capacityMapSet.size() != idealState.getNumPartitions() && !hasDefaultCapacity) {
+      logger.error(
+          "ResourceConfig for {} does not have all partitions defined in PartitionCapacityMap!",
+          idealState.getResourceName());
+      return false;
+    }
+
+    // Loop through all partitions and validate. validateAndGetPartitionCapacity throws a
+    // HelixException on an invalid entry, so map that failure to a false result.
+    try {
+      capacityMap.keySet().forEach(partitionName -> WagedValidationUtil
+          .validateAndGetPartitionCapacity(partitionName, resourceConfig, capacityMap,
+              clusterConfig));
+    } catch (HelixException e) {
+      logger.error("Invalid partition capacity configuration of resource: {}",
+          idealState.getResourceName(), e);
+      return false;
+    }
+    return true;
+  }
 }
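
For illustration, a minimal sketch of how a client might call the validation APIs above,
assuming they are exposed through HelixAdmin (e.g. ZKHelixAdmin); the ZK address, cluster
name, and resource names below are placeholders:

    HelixAdmin admin = new ZKHelixAdmin("localhost:2181"); // hypothetical ZK address
    Map<String, Boolean> results =
        admin.validateResourcesForWagedRebalance("MyCluster", Arrays.asList("db0", "db1"));
    results.forEach((resource, valid) -> {
      if (!valid) {
        System.out.println("Resource " + resource + " is missing WAGED weight configuration");
      }
    });
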
diff --git a/helix-core/src/main/java/org/apache/helix/manager/zk/ZNRecordJacksonSerializer.java b/helix-core/src/main/java/org/apache/helix/manager/zk/ZNRecordJacksonSerializer.java
new file mode 100644
index 0000000..b375e80
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/manager/zk/ZNRecordJacksonSerializer.java
@@ -0,0 +1,67 @@
+package org.apache.helix.manager.zk;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.io.IOException;
+import org.I0Itec.zkclient.exception.ZkMarshallingError;
+import org.I0Itec.zkclient.serialize.ZkSerializer;
+import org.apache.helix.HelixException;
+import org.apache.helix.ZNRecord;
+import org.codehaus.jackson.map.ObjectMapper;
+
+/**
+ * ZNRecordJacksonSerializer serializes ZNRecord objects into a byte array using Jackson. Note that
+ * this serializer doesn't check for the size of the resulting binary.
+ */
+public class ZNRecordJacksonSerializer implements ZkSerializer {
+  private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();
+
+  @Override
+  public byte[] serialize(Object record) throws ZkMarshallingError {
+    if (!(record instanceof ZNRecord)) {
+      // null is NOT an instance of any class
+      throw new HelixException("Input object is not of type ZNRecord (was " + record + ")");
+    }
+    ZNRecord znRecord = (ZNRecord) record;
+
+    try {
+      return OBJECT_MAPPER.writeValueAsBytes(znRecord);
+    } catch (IOException e) {
+      throw new HelixException(
+          String.format("Exception during serialization. ZNRecord id: %s", znRecord.getId()), e);
+    }
+  }
+
+  @Override
+  public Object deserialize(byte[] bytes) throws ZkMarshallingError {
+    if (bytes == null || bytes.length == 0) {
+      // reading a parent/null node
+      return null;
+    }
+
+    ZNRecord record;
+    try {
+      record = OBJECT_MAPPER.readValue(bytes, ZNRecord.class);
+    } catch (IOException e) {
+      throw new HelixException("Exception during deserialization!", e);
+    }
+    return record;
+  }
+}
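
A quick round-trip sketch of the behavior described in the class Javadoc (note that no size
check is enforced in either direction):

    ZkSerializer serializer = new ZNRecordJacksonSerializer();
    ZNRecord record = new ZNRecord("testId");
    record.setSimpleField("key", "value");
    byte[] bytes = serializer.serialize(record);                 // plain Jackson JSON bytes
    ZNRecord decoded = (ZNRecord) serializer.deserialize(bytes);
    assert "value".equals(decoded.getSimpleField("key"));
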
diff --git a/helix-core/src/main/java/org/apache/helix/manager/zk/ZkBucketDataAccessor.java b/helix-core/src/main/java/org/apache/helix/manager/zk/ZkBucketDataAccessor.java
new file mode 100644
index 0000000..bc13471
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/manager/zk/ZkBucketDataAccessor.java
@@ -0,0 +1,380 @@
+package org.apache.helix.manager.zk;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import com.google.common.collect.ImmutableMap;
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Map;
+import java.util.TimerTask;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import org.I0Itec.zkclient.DataUpdater;
+import org.I0Itec.zkclient.exception.ZkMarshallingError;
+import org.I0Itec.zkclient.exception.ZkNoNodeException;
+import org.I0Itec.zkclient.serialize.ZkSerializer;
+import org.apache.helix.AccessOption;
+import org.apache.helix.BucketDataAccessor;
+import org.apache.helix.HelixException;
+import org.apache.helix.HelixProperty;
+import org.apache.helix.ZNRecord;
+import org.apache.helix.manager.zk.client.DedicatedZkClientFactory;
+import org.apache.helix.manager.zk.client.HelixZkClient;
+import org.apache.helix.util.GZipCompressionUtil;
+import org.codehaus.jackson.map.ObjectMapper;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class ZkBucketDataAccessor implements BucketDataAccessor, AutoCloseable {
+  private static final Logger LOG = LoggerFactory.getLogger(ZkBucketDataAccessor.class);
+
+  private static final int DEFAULT_BUCKET_SIZE = 50 * 1024; // 50KB
+  private static final long DEFAULT_VERSION_TTL = TimeUnit.MINUTES.toMillis(1L); // 1 min
+  private static final String BUCKET_SIZE_KEY = "BUCKET_SIZE";
+  private static final String DATA_SIZE_KEY = "DATA_SIZE";
+  private static final String METADATA_KEY = "METADATA";
+  private static final String LAST_SUCCESSFUL_WRITE_KEY = "LAST_SUCCESSFUL_WRITE";
+  private static final String LAST_WRITE_KEY = "LAST_WRITE";
+  private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();
+  // Thread pool for deleting stale versions
+  private static final ScheduledExecutorService GC_THREAD = Executors.newScheduledThreadPool(1);
+
+  private final int _bucketSize;
+  private final long _versionTTL;
+  private ZkSerializer _zkSerializer;
+  private HelixZkClient _zkClient;
+  private ZkBaseDataAccessor<byte[]> _zkBaseDataAccessor;
+
+  /**
+   * Constructor that allows a custom bucket size and version TTL.
+   * @param zkAddr ZooKeeper connection string
+   * @param bucketSize size of each data bucket in bytes
+   * @param versionTTL how long stale versions are kept before garbage collection, in ms
+   */
+  public ZkBucketDataAccessor(String zkAddr, int bucketSize, long versionTTL) {
+    _zkClient = DedicatedZkClientFactory.getInstance()
+        .buildZkClient(new HelixZkClient.ZkConnectionConfig(zkAddr));
+    _zkClient.setZkSerializer(new ZkSerializer() {
+      @Override
+      public byte[] serialize(Object data) throws ZkMarshallingError {
+        if (data instanceof byte[]) {
+          return (byte[]) data;
+        }
+        throw new HelixException("ZkBucketDataAccesor only supports a byte array as an argument!");
+      }
+
+      @Override
+      public Object deserialize(byte[] data) throws ZkMarshallingError {
+        return data;
+      }
+    });
+    _zkBaseDataAccessor = new ZkBaseDataAccessor<>(_zkClient);
+    _zkSerializer = new ZNRecordJacksonSerializer();
+    _bucketSize = bucketSize;
+    _versionTTL = versionTTL;
+  }
+
+  /**
+   * Constructor that uses the default bucket size and version TTL.
+   * @param zkAddr ZooKeeper connection string
+   */
+  public ZkBucketDataAccessor(String zkAddr) {
+    this(zkAddr, DEFAULT_BUCKET_SIZE, DEFAULT_VERSION_TTL);
+  }
+
+  @Override
+  public <T extends HelixProperty> boolean compressedBucketWrite(String rootPath, T value)
+      throws IOException {
+    DataUpdater<byte[]> lastWriteVersionUpdater = dataInZk -> {
+      if (dataInZk == null || dataInZk.length == 0) {
+        // No last write version exists, so start with 0
+        return "0".getBytes();
+      }
+      // Last write exists, so increment and write it back
+      // **String conversion is necessary to make it display in ZK (zooinspector)**
+      String lastWriteVersionStr = new String(dataInZk);
+      long lastWriteVersion = Long.parseLong(lastWriteVersionStr);
+      lastWriteVersion++;
+      return String.valueOf(lastWriteVersion).getBytes();
+    };
+
+    // 1. Increment lastWriteVersion using DataUpdater
+    ZkBaseDataAccessor.AccessResult result = _zkBaseDataAccessor.doUpdate(
+        rootPath + "/" + LAST_WRITE_KEY, lastWriteVersionUpdater, AccessOption.PERSISTENT);
+    if (result._retCode != ZkBaseDataAccessor.RetCode.OK) {
+      throw new HelixException(
+          String.format("Failed to write the write version at path: %s!", rootPath));
+    }
+
+    // Successfully reserved a version number
+    byte[] binaryVersion = (byte[]) result._updatedValue;
+    String versionStr = new String(binaryVersion);
+    final long version = Long.parseLong(versionStr);
+
+    // 2. Write to the incremented last write version
+    String versionedDataPath = rootPath + "/" + versionStr;
+
+    // Take the ZNRecord and serialize it (get byte[])
+    byte[] serializedRecord = _zkSerializer.serialize(value.getRecord());
+    // Compress the byte[]
+    byte[] compressedRecord = GZipCompressionUtil.compress(serializedRecord);
+    // Compute N - number of buckets
+    int numBuckets = (compressedRecord.length + _bucketSize - 1) / _bucketSize;
+
+    List<String> paths = new ArrayList<>();
+    List<byte[]> buckets = new ArrayList<>();
+
+    int ptr = 0;
+    int counter = 0;
+    while (counter < numBuckets) {
+      paths.add(versionedDataPath + "/" + counter);
+      // The last bucket may be smaller than _bucketSize. Clamping the end index also
+      // avoids an empty final bucket when the data size is an exact multiple of _bucketSize.
+      buckets.add(Arrays.copyOfRange(compressedRecord, ptr,
+          Math.min(ptr + _bucketSize, compressedRecord.length)));
+      ptr += _bucketSize;
+      counter++;
+    }
+
+    // 3. Include the metadata in the batch write
+    Map<String, String> metadata = ImmutableMap.of(BUCKET_SIZE_KEY, Integer.toString(_bucketSize),
+        DATA_SIZE_KEY, Integer.toString(compressedRecord.length));
+    byte[] binaryMetadata = OBJECT_MAPPER.writeValueAsBytes(metadata);
+    paths.add(versionedDataPath + "/" + METADATA_KEY);
+    buckets.add(binaryMetadata);
+
+    // Do an async set to ZK
+    boolean[] success = _zkBaseDataAccessor.setChildren(paths, buckets, AccessOption.PERSISTENT);
+    // Exception and fail the write if any failed
+    for (boolean s : success) {
+      if (!s) {
+        throw new HelixException(
+            String.format("Failed to write the data buckets for path: %s", rootPath));
+      }
+    }
+
+    // 4. Update lastSuccessfulWriteVersion using Updater
+    DataUpdater<byte[]> lastSuccessfulWriteVersionUpdater = dataInZk -> {
+      if (dataInZk == null || dataInZk.length == 0) {
+        // No last write version exists, so write version from this write
+        return versionStr.getBytes();
+      }
+      // Last successful write exists so check if it's smaller than my number
+      String lastWriteVersionStr = new String(dataInZk);
+      long lastWriteVersion = Long.parseLong(lastWriteVersionStr);
+      if (lastWriteVersion < version) {
+        // Smaller, so I can overwrite
+        return versionStr.getBytes();
+      } else {
+        // Greater, I have lagged behind. Return null and do not write
+        return null;
+      }
+    };
+    if (!_zkBaseDataAccessor.update(rootPath + "/" + LAST_SUCCESSFUL_WRITE_KEY,
+        lastSuccessfulWriteVersionUpdater, AccessOption.PERSISTENT)) {
+      throw new HelixException(String
+          .format("Failed to write the last successful write metadata at path: %s!", rootPath));
+    }
+
+    // 5. Update the timer for GC
+    updateGCTimer(rootPath, versionStr);
+    return true;
+  }
+
+  @Override
+  public <T extends HelixProperty> HelixProperty compressedBucketRead(String path,
+      Class<T> helixPropertySubType) {
+    return helixPropertySubType.cast(compressedBucketRead(path));
+  }
+
+  @Override
+  public void compressedBucketDelete(String path) {
+    if (!_zkBaseDataAccessor.remove(path, AccessOption.PERSISTENT)) {
+      throw new HelixException(String.format("Failed to delete the bucket data! Path: %s", path));
+    }
+  }
+
+  @Override
+  public void disconnect() {
+    if (!_zkClient.isClosed()) {
+      _zkClient.close();
+    }
+  }
+
+  private HelixProperty compressedBucketRead(String path) {
+    // 1. Get the version to read
+    byte[] binaryVersionToRead = _zkBaseDataAccessor.get(path + "/" + LAST_SUCCESSFUL_WRITE_KEY,
+        null, AccessOption.PERSISTENT);
+    if (binaryVersionToRead == null) {
+      throw new ZkNoNodeException(
+          String.format("Last successful write ZNode does not exist for path: %s", path));
+    }
+    String versionToRead = new String(binaryVersionToRead);
+
+    // 2. Get the metadata map
+    byte[] binaryMetadata = _zkBaseDataAccessor.get(path + "/" + versionToRead + "/" + METADATA_KEY,
+        null, AccessOption.PERSISTENT);
+    if (binaryMetadata == null) {
+      throw new ZkNoNodeException(
+          String.format("Metadata ZNode does not exist for path: %s", path));
+    }
+    Map metadata;
+    try {
+      metadata = OBJECT_MAPPER.readValue(binaryMetadata, Map.class);
+    } catch (IOException e) {
+      throw new HelixException(String.format("Failed to deserialize path metadata: %s!", path), e);
+    }
+
+    // 3. Read the data
+    Object bucketSizeObj = metadata.get(BUCKET_SIZE_KEY);
+    Object dataSizeObj = metadata.get(DATA_SIZE_KEY);
+    if (bucketSizeObj == null) {
+      throw new HelixException(
+          String.format("Metadata ZNRecord does not have %s! Path: %s", BUCKET_SIZE_KEY, path));
+    }
+    if (dataSizeObj == null) {
+      throw new HelixException(
+          String.format("Metadata ZNRecord does not have %s! Path: %s", DATA_SIZE_KEY, path));
+    }
+    int bucketSize = Integer.parseInt((String) bucketSizeObj);
+    int dataSize = Integer.parseInt((String) dataSizeObj);
+
+    // Compute N - number of buckets, using the bucket size recorded in the metadata,
+    // which may differ from this accessor's configured _bucketSize
+    int numBuckets = (dataSize + bucketSize - 1) / bucketSize;
+    byte[] compressedRecord = new byte[dataSize];
+    String dataPath = path + "/" + versionToRead;
+
+    List<String> paths = new ArrayList<>();
+    for (int i = 0; i < numBuckets; i++) {
+      paths.add(dataPath + "/" + i);
+    }
+
+    // Async get
+    List<byte[]> buckets = _zkBaseDataAccessor.get(paths, null, AccessOption.PERSISTENT, true);
+
+    // Combine buckets into one byte array
+    int copyPtr = 0;
+    for (int i = 0; i < numBuckets; i++) {
+      // The last bucket may hold fewer than bucketSize bytes; copying (dataSize - copyPtr)
+      // also handles data whose size is an exact multiple of bucketSize.
+      int length = Math.min(bucketSize, dataSize - copyPtr);
+      System.arraycopy(buckets.get(i), 0, compressedRecord, copyPtr, length);
+      copyPtr += length;
+    }
+
+    // Decompress the byte array
+    ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(compressedRecord);
+    byte[] serializedRecord;
+    try {
+      serializedRecord = GZipCompressionUtil.uncompress(byteArrayInputStream);
+    } catch (IOException e) {
+      throw new HelixException(String.format("Failed to decompress path: %s!", path), e);
+    }
+
+    // Deserialize the record to retrieve the original
+    ZNRecord originalRecord = (ZNRecord) _zkSerializer.deserialize(serializedRecord);
+    return new HelixProperty(originalRecord);
+  }
+
+  @Override
+  public void close() {
+    disconnect();
+  }
+
+  private void updateGCTimer(String rootPath, String currentVersion) {
+    TimerTask gcTask = new TimerTask() {
+      @Override
+      public void run() {
+        deleteStaleVersions(rootPath, currentVersion);
+      }
+    };
+
+    // Schedule the gc task with TTL
+    GC_THREAD.schedule(gcTask, _versionTTL, TimeUnit.MILLISECONDS);
+  }
+
+  /**
+   * Deletes all stale versions under the given root path.
+   * @param rootPath root path of the bucketed data
+   * @param currentVersion the version whose TTL expiry triggered this GC run; it and any newer versions are kept
+   */
+  private void deleteStaleVersions(String rootPath, String currentVersion) {
+    // Get all children names under path
+    List<String> children = _zkBaseDataAccessor.getChildNames(rootPath, AccessOption.PERSISTENT);
+    if (children == null || children.isEmpty()) {
+      // The whole path has been deleted so return immediately
+      return;
+    }
+    filterChildrenNames(children, currentVersion);
+    List<String> pathsToDelete = getPathsToDelete(rootPath, children);
+    for (String pathToDelete : pathsToDelete) {
+      // TODO: Should be batch delete but it doesn't work. It's okay since this runs async
+      _zkBaseDataAccessor.remove(pathToDelete, AccessOption.PERSISTENT);
+    }
+  }
+
+  /**
+   * Filters out metadata node names, the current version, and any newer versions, leaving
+   * only the stale versions that are safe to delete.
+   * @param children list of child node names, filtered in place
+   * @param currentVersion the version whose TTL expiry triggered this GC run
+   */
+  private void filterChildrenNames(List<String> children, String currentVersion) {
+    // Leave out metadata
+    children.remove(LAST_SUCCESSFUL_WRITE_KEY);
+    children.remove(LAST_WRITE_KEY);
+
+    // Leave out currentVersion and above because we want to honor the TTL for newer versions.
+    // Use removeIf rather than removing inside a for-each loop, which would throw a
+    // ConcurrentModificationException.
+    long currentVer = Long.parseLong(currentVersion);
+    children.removeIf(child -> {
+      try {
+        return Long.parseLong(child) >= currentVer;
+      } catch (NumberFormatException e) {
+        // Ignore ZNode names that aren't parseable as a version
+        LOG.debug("Found an invalid ZNode: {}", child);
+        return true;
+      }
+    });
+  }
+
+  /**
+   * Generates the full paths of all stale versions to delete.
+   * @param path root path of the bucketed data
+   * @param staleVersions names of the stale versions to delete
+   * @return the full ZNode paths to delete
+   */
+  private List<String> getPathsToDelete(String path, List<String> staleVersions) {
+    List<String> pathsToDelete = new ArrayList<>();
+    staleVersions.forEach(ver -> pathsToDelete.add(path + "/" + ver));
+    return pathsToDelete;
+  }
+}
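
A hedged usage sketch for the accessor above (the ZK address, path, and field names are made
up; compressedBucketWrite throws IOException, which the caller must handle or declare):

    BucketDataAccessor accessor = new ZkBucketDataAccessor("localhost:2181");
    HelixProperty property = new HelixProperty("baseline");
    property.getRecord().setSimpleField("k", "v");
    // Compresses, buckets, and writes under a newly reserved version,
    // then advances the LAST_SUCCESSFUL_WRITE pointer.
    accessor.compressedBucketWrite("/MyCluster/ASSIGNMENT/baseline", property);
    // Reads resolve LAST_SUCCESSFUL_WRITE first, so a concurrent writer never exposes partial data.
    HelixProperty readBack =
        accessor.compressedBucketRead("/MyCluster/ASSIGNMENT/baseline", HelixProperty.class);
    accessor.disconnect();
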
diff --git a/helix-core/src/main/java/org/apache/helix/model/ClusterConfig.java b/helix-core/src/main/java/org/apache/helix/model/ClusterConfig.java
index bb478c3..f88d2f5 100644
--- a/helix-core/src/main/java/org/apache/helix/model/ClusterConfig.java
+++ b/helix-core/src/main/java/org/apache/helix/model/ClusterConfig.java
@@ -24,7 +24,9 @@ import java.util.Collections;
 import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
+import java.util.stream.Collectors;
 
+import com.google.common.collect.ImmutableMap;
 import com.google.common.collect.Maps;
 import org.apache.helix.HelixException;
 import org.apache.helix.HelixProperty;
@@ -81,7 +83,38 @@ public class ClusterConfig extends HelixProperty {
     DISABLED_INSTANCES,
 
     // Specifies job types and used for quota allocation
-    QUOTA_TYPES
+    QUOTA_TYPES,
+
+    /**
+     * Configurable characteristics of the WAGED rebalancer.
+     * TODO: Split the WAGED rebalancer configuration items to the other config file.
+     */
+    // The required instance capacity keys for resource partition assignment calculation.
+    INSTANCE_CAPACITY_KEYS,
+    // The default instance capacity if no capacity is configured in the Instance Config node.
+    DEFAULT_INSTANCE_CAPACITY_MAP,
+    // The default partition weights if no weight is configured in the Resource Config node.
+    DEFAULT_PARTITION_WEIGHT_MAP,
+    // The preference of the rebalance result.
+    // EVENNESS - Evenness of the resource utilization, partition, and top state distribution.
+    // LESS_MOVEMENT - The tendency to keep the current assignment rather than moving
+    // partitions for a more optimal assignment.
+    REBALANCE_PREFERENCE,
+    // Specify if the WAGED rebalancer should asynchronously perform the global rebalance, which is
+    // in general slower than the partial rebalance.
+    // Note that asynchronous global rebalance calculation will reduce the controller rebalance
+    // delay. But it may cause more partition movements. This is because the partial rebalance will
+    // be performed with a stale baseline. The rebalance result would be an intermediate one and
+    // could be changed again when a new baseline is calculated.
+    // For more details, please refer to
+    // https://github.com/apache/helix/wiki/Weight-aware-Globally-Evenly-distributed-Rebalancer#rebalance-coordinator
+    //
+    // Defaults to true.
+    GLOBAL_REBALANCE_ASYNC_MODE
+  }
+
+  public enum GlobalRebalancePreferenceKey {
+    EVENNESS,
+    LESS_MOVEMENT
   }
 
   private final static int DEFAULT_MAX_CONCURRENT_TASK_PER_INSTANCE = 40;
@@ -95,6 +128,16 @@ public class ClusterConfig extends HelixProperty {
 
   public final static String TASK_QUOTA_RATIO_NOT_SET = "-1";
 
+  // Default preference for all the aspects should be the same to ensure balanced setup.
+  public final static Map<GlobalRebalancePreferenceKey, Integer>
+      DEFAULT_GLOBAL_REBALANCE_PREFERENCE =
+      ImmutableMap.<GlobalRebalancePreferenceKey, Integer>builder()
+          .put(GlobalRebalancePreferenceKey.EVENNESS, 1)
+          .put(GlobalRebalancePreferenceKey.LESS_MOVEMENT, 1).build();
+  private final static int MAX_REBALANCE_PREFERENCE = 10;
+  private final static int MIN_REBALANCE_PREFERENCE = 0;
+  public final static boolean DEFAULT_GLOBAL_REBALANCE_ASYNC_MODE_ENABLED = true;
+
   /**
    * Instantiate for a specific cluster
    * @param cluster the cluster identifier
@@ -113,21 +156,21 @@ public class ClusterConfig extends HelixProperty {
 
   /**
    * Set task quota type with the ratio of this quota.
-   * @param quotaType String
+   * @param quotaType  String
    * @param quotaRatio int
    */
   public void setTaskQuotaRatio(String quotaType, int quotaRatio) {
     if (_record.getMapField(ClusterConfigProperty.QUOTA_TYPES.name()) == null) {
       _record.setMapField(ClusterConfigProperty.QUOTA_TYPES.name(), new HashMap<String, String>());
     }
-    _record.getMapField(ClusterConfigProperty.QUOTA_TYPES.name()).put(quotaType,
-        Integer.toString(quotaRatio));
+    _record.getMapField(ClusterConfigProperty.QUOTA_TYPES.name())
+        .put(quotaType, Integer.toString(quotaRatio));
   }
 
   /**
    * Set task quota type with the ratio of this quota. Quota ratio must be a String that is
    * parse-able into an int.
-   * @param quotaType String
+   * @param quotaType  String
    * @param quotaRatio String
    */
   public void setTaskQuotaRatio(String quotaType, String quotaRatio) {
@@ -210,8 +253,8 @@ public class ClusterConfig extends HelixProperty {
    * @return
    */
   public Boolean isPersistIntermediateAssignment() {
-    return _record.getBooleanField(ClusterConfigProperty.PERSIST_INTERMEDIATE_ASSIGNMENT.toString(),
-        false);
+    return _record
+        .getBooleanField(ClusterConfigProperty.PERSIST_INTERMEDIATE_ASSIGNMENT.toString(), false);
   }
 
   /**
@@ -233,8 +276,8 @@ public class ClusterConfig extends HelixProperty {
   }
 
   public Boolean isPipelineTriggersDisabled() {
-    return _record.getBooleanField(ClusterConfigProperty.HELIX_DISABLE_PIPELINE_TRIGGERS.toString(),
-        false);
+    return _record
+        .getBooleanField(ClusterConfigProperty.HELIX_DISABLE_PIPELINE_TRIGGERS.toString(), false);
   }
 
   /**
@@ -403,8 +446,8 @@ public class ClusterConfig extends HelixProperty {
    * @return
    */
   public int getNumOfflineInstancesForAutoExit() {
-    return _record.getIntField(ClusterConfigProperty.NUM_OFFLINE_INSTANCES_FOR_AUTO_EXIT.name(),
-        -1);
+    return _record
+        .getIntField(ClusterConfigProperty.NUM_OFFLINE_INSTANCES_FOR_AUTO_EXIT.name(), -1);
   }
 
   /**
@@ -444,9 +487,7 @@ public class ClusterConfig extends HelixProperty {
     if (obj instanceof ClusterConfig) {
       ClusterConfig that = (ClusterConfig) obj;
 
-      if (this.getId().equals(that.getId())) {
-        return true;
-      }
+      return this.getId().equals(that.getId());
     }
     return false;
   }
@@ -490,8 +531,8 @@ public class ClusterConfig extends HelixProperty {
     }
 
     if (!configStrs.isEmpty()) {
-      _record.setListField(ClusterConfigProperty.STATE_TRANSITION_THROTTLE_CONFIGS.name(),
-          configStrs);
+      _record
+          .setListField(ClusterConfigProperty.STATE_TRANSITION_THROTTLE_CONFIGS.name(), configStrs);
     }
   }
 
@@ -579,7 +620,7 @@ public class ClusterConfig extends HelixProperty {
   public int getErrorPartitionThresholdForLoadBalance() {
     return _record.getIntField(
         ClusterConfigProperty.ERROR_PARTITION_THRESHOLD_FOR_LOAD_BALANCE.name(),
-        DEFAULT_ERROR_PARTITION_THRESHOLD_FOR_LOAD_BALANCE);
+            DEFAULT_ERROR_PARTITION_THRESHOLD_FOR_LOAD_BALANCE);
   }
 
   /**
@@ -658,6 +699,159 @@ public class ClusterConfig extends HelixProperty {
   }
 
   /**
+   * Set the required Instance Capacity Keys.
+   * @param capacityKeys
+   */
+  public void setInstanceCapacityKeys(List<String> capacityKeys) {
+    if (capacityKeys == null || capacityKeys.isEmpty()) {
+      throw new IllegalArgumentException("The input instance capacity key list is empty.");
+    }
+    _record.setListField(ClusterConfigProperty.INSTANCE_CAPACITY_KEYS.name(), capacityKeys);
+  }
+
+  /**
+   * @return The required Instance Capacity Keys. If not configured, return an empty list.
+   */
+  public List<String> getInstanceCapacityKeys() {
+    List<String> capacityKeys =
+        _record.getListField(ClusterConfigProperty.INSTANCE_CAPACITY_KEYS.name());
+    if (capacityKeys == null) {
+      return Collections.emptyList();
+    }
+    return capacityKeys;
+  }
+
+  /**
+   * Get the default instance capacity information from the map fields.
+   *
+   * @return data map if it exists, or empty map
+   */
+  public Map<String, Integer> getDefaultInstanceCapacityMap() {
+    return getDefaultCapacityMap(ClusterConfigProperty.DEFAULT_INSTANCE_CAPACITY_MAP);
+  }
+
+  /**
+   * Set the default instance capacity information with an Integer mapping.
+   * This information is required by the global rebalancer.
+   * @see <a href="Rebalance Algorithm">
+   * https://github.com/apache/helix/wiki/Design-Proposal---Weight-Aware-Globally-Even-Distribute-Rebalancer#rebalance-algorithm-adapter
+   * </a>
+   * If the instance capacity is not configured in either Instance Config nor Cluster Config, the
+   * cluster topology is considered invalid. So the rebalancer may stop working.
+   * @param capacityDataMap - map of instance capacity data
+   * @throws IllegalArgumentException - when any of the data value is a negative number or when the map is empty
+   */
+  public void setDefaultInstanceCapacityMap(Map<String, Integer> capacityDataMap)
+      throws IllegalArgumentException {
+    setDefaultCapacityMap(ClusterConfigProperty.DEFAULT_INSTANCE_CAPACITY_MAP, capacityDataMap);
+  }
+
+  /**
+   * Get the default partition weight information from the map fields.
+   *
+   * @return data map if it exists, or empty map
+   */
+  public Map<String, Integer> getDefaultPartitionWeightMap() {
+    return getDefaultCapacityMap(ClusterConfigProperty.DEFAULT_PARTITION_WEIGHT_MAP);
+  }
+
+  /**
+   * Set the default partition weight information with an Integer mapping.
+   * This information is required by the global rebalancer.
+   * @see <a href="Rebalance Algorithm">
+   * https://github.com/apache/helix/wiki/Design-Proposal---Weight-Aware-Globally-Even-Distribute-Rebalancer#rebalance-algorithm-adapter
+   * </a>
+   * If the partition weight is not configured in either Resource Config nor Cluster Config, the
+   * cluster topology is considered invalid. So the rebalancer may stop working.
+   * @param weightDataMap - map of partition weight data
+   * @throws IllegalArgumentException - when any of the data value is a negative number or when the map is empty
+   */
+  public void setDefaultPartitionWeightMap(Map<String, Integer> weightDataMap)
+      throws IllegalArgumentException {
+    setDefaultCapacityMap(ClusterConfigProperty.DEFAULT_PARTITION_WEIGHT_MAP, weightDataMap);
+  }
+
+  private Map<String, Integer> getDefaultCapacityMap(ClusterConfigProperty capacityPropertyType) {
+    Map<String, String> capacityData = _record.getMapField(capacityPropertyType.name());
+    if (capacityData != null) {
+      return capacityData.entrySet().stream().collect(
+          Collectors.toMap(entry -> entry.getKey(), entry -> Integer.parseInt(entry.getValue())));
+    }
+    return Collections.emptyMap();
+  }
+
+  private void setDefaultCapacityMap(ClusterConfigProperty capacityPropertyType,
+      Map<String, Integer> capacityDataMap) throws IllegalArgumentException {
+    if (capacityDataMap == null) {
+      throw new IllegalArgumentException("Default capacity data is null");
+    }
+    Map<String, String> data = new HashMap<>();
+    capacityDataMap.entrySet().stream().forEach(entry -> {
+      if (entry.getValue() < 0) {
+        throw new IllegalArgumentException(String
+            .format("Default capacity data contains a negative value: %s = %d", entry.getKey(),
+                entry.getValue()));
+      }
+      data.put(entry.getKey(), Integer.toString(entry.getValue()));
+    });
+    _record.setMapField(capacityPropertyType.name(), data);
+  }
+
+  /**
+   * Set the global rebalancer's assignment preference.
+   * @param preference A map of the GlobalRebalancePreferenceKey and the corresponding weight.
+   *                   The ratio of the configured weights will determine the rebalancer's behavior.
+   */
+  public void setGlobalRebalancePreference(Map<GlobalRebalancePreferenceKey, Integer> preference) {
+    Map<String, String> preferenceMap = new HashMap<>();
+
+    preference.entrySet().stream().forEach(entry -> {
+      if (entry.getValue() > MAX_REBALANCE_PREFERENCE
+          || entry.getValue() < MIN_REBALANCE_PREFERENCE) {
+        throw new IllegalArgumentException(String
+            .format("Invalid global rebalance preference configuration. Key %s, Value %d.",
+                entry.getKey().name(), entry.getValue()));
+      }
+      preferenceMap.put(entry.getKey().name(), Integer.toString(entry.getValue()));
+    });
+
+    _record.setMapField(ClusterConfigProperty.REBALANCE_PREFERENCE.name(), preferenceMap);
+  }
+
+  /**
+   * Get the global rebalancer's assignment preference.
+   */
+  public Map<GlobalRebalancePreferenceKey, Integer> getGlobalRebalancePreference() {
+    Map<String, String> preferenceStrMap =
+        _record.getMapField(ClusterConfigProperty.REBALANCE_PREFERENCE.name());
+    if (preferenceStrMap != null && !preferenceStrMap.isEmpty()) {
+      Map<GlobalRebalancePreferenceKey, Integer> preference = new HashMap<>();
+      for (GlobalRebalancePreferenceKey key : GlobalRebalancePreferenceKey.values()) {
+        if (!preferenceStrMap.containsKey(key.name())) {
+          // If any key is not configured with a value, return the default config.
+          return DEFAULT_GLOBAL_REBALANCE_PREFERENCE;
+        }
+        preference.put(key, Integer.parseInt(preferenceStrMap.get(key.name())));
+      }
+      return preference;
+    }
+    // If configuration is not complete, return the default one.
+    return DEFAULT_GLOBAL_REBALANCE_PREFERENCE;
+  }
+
+  /**
+   * Set the asynchronous global rebalance mode.
+   * @param isAsync true if the global rebalance should be performed asynchronously
+   */
+  public void setGlobalRebalanceAsyncMode(boolean isAsync) {
+    _record.setBooleanField(ClusterConfigProperty.GLOBAL_REBALANCE_ASYNC_MODE.name(), isAsync);
+  }
+
+  public boolean isGlobalRebalanceAsyncModeEnabled() {
+    return _record.getBooleanField(ClusterConfigProperty.GLOBAL_REBALANCE_ASYNC_MODE.name(),
+        DEFAULT_GLOBAL_REBALANCE_ASYNC_MODE_ENABLED);
+  }
+
+  /**
    * Get IdealState rules defined in the cluster config.
    * @return
    */
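
Taken together, the new WAGED knobs above can be wired up roughly as follows (a sketch; the
capacity key names and values are illustrative, and configAccessor is assumed to be an
org.apache.helix.ConfigAccessor):

    ClusterConfig clusterConfig = configAccessor.getClusterConfig("MyCluster");
    clusterConfig.setInstanceCapacityKeys(Arrays.asList("CPU", "DISK"));
    clusterConfig.setDefaultInstanceCapacityMap(ImmutableMap.of("CPU", 100, "DISK", 1000));
    clusterConfig.setDefaultPartitionWeightMap(ImmutableMap.of("CPU", 1, "DISK", 10));
    // Weights must stay within [0, 10]; a 1:2 ratio favors fewer movements over evenness.
    clusterConfig.setGlobalRebalancePreference(ImmutableMap.of(
        ClusterConfig.GlobalRebalancePreferenceKey.EVENNESS, 1,
        ClusterConfig.GlobalRebalancePreferenceKey.LESS_MOVEMENT, 2));
    clusterConfig.setGlobalRebalanceAsyncMode(true);
    configAccessor.setClusterConfig("MyCluster", clusterConfig);
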
diff --git a/helix-core/src/main/java/org/apache/helix/model/InstanceConfig.java b/helix-core/src/main/java/org/apache/helix/model/InstanceConfig.java
index 4d01766..b55ba83 100644
--- a/helix-core/src/main/java/org/apache/helix/model/InstanceConfig.java
+++ b/helix-core/src/main/java/org/apache/helix/model/InstanceConfig.java
@@ -27,6 +27,7 @@ import java.util.HashSet;
 import java.util.List;
 import java.util.Map;
 import java.util.Set;
+import java.util.stream.Collectors;
 
 import com.google.common.base.Splitter;
 import org.apache.helix.HelixException;
@@ -54,7 +55,8 @@ public class InstanceConfig extends HelixProperty {
     INSTANCE_WEIGHT,
     DOMAIN,
     DELAY_REBALANCE_ENABLED,
-    MAX_CONCURRENT_TASK
+    MAX_CONCURRENT_TASK,
+    INSTANCE_CAPACITY_MAP
   }
 
   public static final int WEIGHT_NOT_SET = -1;
@@ -504,6 +506,54 @@ public class InstanceConfig extends HelixProperty {
     _record.setIntField(InstanceConfigProperty.MAX_CONCURRENT_TASK.name(), maxConcurrentTask);
   }
 
+  /**
+   * Get the instance capacity information from the map fields.
+   * @return data map if it exists, or empty map
+   */
+  public Map<String, Integer> getInstanceCapacityMap() {
+    Map<String, String> capacityData =
+        _record.getMapField(InstanceConfigProperty.INSTANCE_CAPACITY_MAP.name());
+
+    if (capacityData != null) {
+      return capacityData.entrySet().stream().collect(
+          Collectors.toMap(entry -> entry.getKey(), entry -> Integer.parseInt(entry.getValue())));
+    }
+    return Collections.emptyMap();
+  }
+
+  /**
+   * Set the instance capacity information with an Integer mapping.
+   * This information is required by the global rebalancer.
+   * @see <a href="https://github.com/apache/helix/wiki/Design-Proposal---Weight-Aware-Globally-Even-Distribute-Rebalancer#rebalance-algorithm-adapter">
+   * Rebalance Algorithm</a>
+   * If the instance capacity is not configured in either the Instance Config or the Cluster
+   * Config, the cluster topology is considered invalid and the rebalancer may stop working.
+   * Note that when a rebalancer requires this capacity information, it will ignore INSTANCE_WEIGHT.
+   * @param capacityDataMap - map of instance capacity data
+   * @throws IllegalArgumentException - when any data value is a negative number or the map is null
+   */
+  public void setInstanceCapacityMap(Map<String, Integer> capacityDataMap)
+      throws IllegalArgumentException {
+    if (capacityDataMap == null) {
+      throw new IllegalArgumentException("Capacity Data is null");
+    }
+
+    Map<String, String> capacityData = new HashMap<>();
+
+    capacityDataMap.entrySet().stream().forEach(entry -> {
+      if (entry.getValue() < 0) {
+        throw new IllegalArgumentException(String
+            .format("Capacity Data contains a negative value: %s = %d", entry.getKey(),
+                entry.getValue()));
+      }
+      capacityData.put(entry.getKey(), Integer.toString(entry.getValue()));
+    });
+
+    _record.setMapField(InstanceConfigProperty.INSTANCE_CAPACITY_MAP.name(), capacityData);
+  }
+
   @Override
   public boolean equals(Object obj) {
     if (obj instanceof InstanceConfig) {
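
A short sketch of declaring per-instance capacity with the new API (key names and values are
examples; the keys should cover the cluster's INSTANCE_CAPACITY_KEYS, and admin is assumed to
be a HelixAdmin):

    InstanceConfig instanceConfig = admin.getInstanceConfig("MyCluster", "localhost_12913");
    instanceConfig.setInstanceCapacityMap(ImmutableMap.of("CPU", 100, "DISK", 1000));
    admin.setInstanceConfig("MyCluster", "localhost_12913", instanceConfig);
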
diff --git a/helix-core/src/main/java/org/apache/helix/model/ResourceConfig.java b/helix-core/src/main/java/org/apache/helix/model/ResourceConfig.java
index c37a594..9cdb673 100644
--- a/helix-core/src/main/java/org/apache/helix/model/ResourceConfig.java
+++ b/helix-core/src/main/java/org/apache/helix/model/ResourceConfig.java
@@ -19,7 +19,9 @@ package org.apache.helix.model;
  * under the License.
  */
 
+import java.io.IOException;
 import java.util.Collections;
+import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
 import java.util.TreeMap;
@@ -29,6 +31,8 @@ import org.apache.helix.ZNRecord;
 import org.apache.helix.api.config.HelixConfigProperty;
 import org.apache.helix.api.config.RebalanceConfig;
 import org.apache.helix.api.config.StateTransitionTimeoutConfig;
+import org.codehaus.jackson.map.ObjectMapper;
+import org.codehaus.jackson.type.TypeReference;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
@@ -53,7 +57,8 @@ public class ResourceConfig extends HelixProperty {
     RESOURCE_TYPE,
     GROUP_ROUTING_ENABLED,
     EXTERNAL_VIEW_DISABLED,
-    DELAY_REBALANCE_ENABLED
+    DELAY_REBALANCE_ENABLED,
+    PARTITION_CAPACITY_MAP
   }
 
   public enum ResourceConfigConstants {
@@ -61,6 +66,10 @@ public class ResourceConfig extends HelixProperty {
   }
 
   private static final Logger _logger = LoggerFactory.getLogger(ResourceConfig.class.getName());
+  private static final ObjectMapper _objectMapper = new ObjectMapper();
+
+  public static final String DEFAULT_PARTITION_KEY = "DEFAULT";
+
   /**
    * Instantiate for a specific instance
    *
@@ -92,10 +101,24 @@ public class ResourceConfig extends HelixProperty {
       String stateModelDefRef, String stateModelFactoryName, String numReplica,
       int minActiveReplica, int maxPartitionsPerInstance, String instanceGroupTag,
       Boolean helixEnabled, String resourceGroupName, String resourceType,
-      Boolean groupRoutingEnabled, Boolean externalViewDisabled,
-      RebalanceConfig rebalanceConfig, StateTransitionTimeoutConfig stateTransitionTimeoutConfig,
+      Boolean groupRoutingEnabled, Boolean externalViewDisabled, RebalanceConfig rebalanceConfig,
+      StateTransitionTimeoutConfig stateTransitionTimeoutConfig,
       Map<String, List<String>> listFields, Map<String, Map<String, String>> mapFields,
       Boolean p2pMessageEnabled) {
+    this(resourceId, monitorDisabled, numPartitions, stateModelDefRef, stateModelFactoryName,
+        numReplica, minActiveReplica, maxPartitionsPerInstance, instanceGroupTag, helixEnabled,
+        resourceGroupName, resourceType, groupRoutingEnabled, externalViewDisabled, rebalanceConfig,
+        stateTransitionTimeoutConfig, listFields, mapFields, p2pMessageEnabled, null);
+  }
+
+  private ResourceConfig(String resourceId, Boolean monitorDisabled, int numPartitions,
+      String stateModelDefRef, String stateModelFactoryName, String numReplica,
+      int minActiveReplica, int maxPartitionsPerInstance, String instanceGroupTag,
+      Boolean helixEnabled, String resourceGroupName, String resourceType,
+      Boolean groupRoutingEnabled, Boolean externalViewDisabled,
+      RebalanceConfig rebalanceConfig, StateTransitionTimeoutConfig stateTransitionTimeoutConfig,
+      Map<String, List<String>> listFields, Map<String, Map<String, String>> mapFields,
+      Boolean p2pMessageEnabled, Map<String, Map<String, Integer>> partitionCapacityMap) {
     super(resourceId);
 
     if (monitorDisabled != null) {
@@ -172,6 +195,15 @@ public class ResourceConfig extends HelixProperty {
     if (mapFields != null) {
       _record.setMapFields(mapFields);
     }
+
+    if (partitionCapacityMap != null) {
+      try {
+        setPartitionCapacityMap(partitionCapacityMap);
+      } catch (IOException e) {
+        throw new IllegalArgumentException(
+            "Failed to set partition capacity. Invalid capacity configuration.", e);
+      }
+    }
   }
 
 
@@ -350,6 +382,64 @@ public class ResourceConfig extends HelixProperty {
   }
 
   /**
+   * Get the partition capacity information from a JSON among the map fields.
+   * <PartitionName or DEFAULT_PARTITION_KEY, <Capacity Key, Capacity Number>>
+   *
+   * @return data map if it exists, or empty map
+   * @throws IOException - when JSON conversion fails
+   */
+  public Map<String, Map<String, Integer>> getPartitionCapacityMap() throws IOException {
+    Map<String, String> partitionCapacityData =
+        _record.getMapField(ResourceConfigProperty.PARTITION_CAPACITY_MAP.name());
+    Map<String, Map<String, Integer>> partitionCapacityMap = new HashMap<>();
+    if (partitionCapacityData != null) {
+      for (String partition : partitionCapacityData.keySet()) {
+        Map<String, Integer> capacities = _objectMapper
+            .readValue(partitionCapacityData.get(partition),
+                new TypeReference<Map<String, Integer>>() {
+                });
+        partitionCapacityMap.put(partition, capacities);
+      }
+    }
+    return partitionCapacityMap;
+  }
+
+  /**
+   * Set the partition capacity information with a map <PartitionName or DEFAULT_PARTITION_KEY, <Capacity Key, Capacity Number>>
+   *
+   * @param partitionCapacityMap - map of partition capacity data
+   * @throws IllegalArgumentException - when the map is null, missing the DEFAULT key, or contains an empty or negative capacity entry
+   * @throws IOException              - when JSON serialization fails
+   */
+  public void setPartitionCapacityMap(Map<String, Map<String, Integer>> partitionCapacityMap)
+      throws IllegalArgumentException, IOException {
+    if (partitionCapacityMap == null) {
+      throw new IllegalArgumentException("Capacity Map is null");
+    }
+    if (!partitionCapacityMap.containsKey(DEFAULT_PARTITION_KEY)) {
+      throw new IllegalArgumentException(String
+          .format("The default partition capacity with the default key %s is required.",
+              DEFAULT_PARTITION_KEY));
+    }
+
+    Map<String, String> newCapacityRecord = new HashMap<>();
+    for (String partition : partitionCapacityMap.keySet()) {
+      Map<String, Integer> capacities = partitionCapacityMap.get(partition);
+      // Verify the input is valid
+      if (capacities.isEmpty()) {
+        throw new IllegalArgumentException("Capacity Data is empty");
+      }
+      if (capacities.entrySet().stream().anyMatch(entry -> entry.getValue() < 0)) {
+        throw new IllegalArgumentException(
+            String.format("Capacity Data contains a negative value:%s", capacities.toString()));
+      }
+      newCapacityRecord.put(partition, _objectMapper.writeValueAsString(capacities));
+    }
+
+    _record.setMapField(ResourceConfigProperty.PARTITION_CAPACITY_MAP.name(), newCapacityRecord);
+  }
+
+  /**
    * Put a set of simple configs.
    *
    * @param configsMap
@@ -476,6 +566,7 @@ public class ResourceConfig extends HelixProperty {
     private StateTransitionTimeoutConfig _stateTransitionTimeoutConfig;
     private Map<String, List<String>> _preferenceLists;
     private Map<String, Map<String, String>> _mapFields;
+    private Map<String, Map<String, Integer>> _partitionCapacityMap;
 
     public Builder(String resourceId) {
       _resourceId = resourceId;
@@ -664,6 +755,23 @@ public class ResourceConfig extends HelixProperty {
       return _preferenceLists;
     }
 
+    public Builder setPartitionCapacity(Map<String, Integer> defaultCapacity) {
+      setPartitionCapacity(DEFAULT_PARTITION_KEY, defaultCapacity);
+      return this;
+    }
+
+    public Builder setPartitionCapacity(String partition, Map<String, Integer> capacity) {
+      if (_partitionCapacityMap == null) {
+        _partitionCapacityMap = new HashMap<>();
+      }
+      _partitionCapacityMap.put(partition, capacity);
+      return this;
+    }
+
+    public Map<String, Integer> getPartitionCapacity(String partition) {
+      // Guard against lookups before any partition capacity has been set
+      return _partitionCapacityMap == null ? null : _partitionCapacityMap.get(partition);
+    }
+
     public Builder setMapField(String key, Map<String, String> fields) {
       if (_mapFields == null) {
         _mapFields = new TreeMap<>();
@@ -708,6 +816,19 @@ public class ResourceConfig extends HelixProperty {
           }
         }
       }
+
+      if (_partitionCapacityMap != null) {
+        if (_partitionCapacityMap.keySet().stream()
+            .noneMatch(partition -> partition.equals(DEFAULT_PARTITION_KEY))) {
+          throw new IllegalArgumentException(
+              "Partition capacity is configured without the DEFAULT capacity!");
+        }
+        if (_partitionCapacityMap.values().stream()
+            .anyMatch(capacity -> capacity.values().stream().anyMatch(value -> value < 0))) {
+          throw new IllegalArgumentException(
+              "Partition capacity is configured with negative capacity value!");
+        }
+      }
     }
 
     public ResourceConfig build() {
@@ -718,7 +839,7 @@ public class ResourceConfig extends HelixProperty {
           _stateModelFactoryName, _numReplica, _minActiveReplica, _maxPartitionsPerInstance,
           _instanceGroupTag, _helixEnabled, _resourceGroupName, _resourceType, _groupRoutingEnabled,
           _externalViewDisabled, _rebalanceConfig, _stateTransitionTimeoutConfig, _preferenceLists,
-          _mapFields, _p2pMessageEnabled);
+          _mapFields, _p2pMessageEnabled, _partitionCapacityMap);
     }
   }
 }
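
A sketch of configuring partition weights through the new Builder methods (resource, partition,
and key names are illustrative; other required Builder fields are omitted for brevity). The
DEFAULT entry is mandatory and applies to any partition without an explicit entry:

    ResourceConfig.Builder builder = new ResourceConfig.Builder("db0");
    builder.setPartitionCapacity(ImmutableMap.of("CPU", 1, "DISK", 10));           // DEFAULT key
    builder.setPartitionCapacity("db0_0", ImmutableMap.of("CPU", 2, "DISK", 20));  // per-partition override
    ResourceConfig resourceConfig = builder.build();
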
diff --git a/helix-core/src/main/java/org/apache/helix/model/StateModelDefinition.java b/helix-core/src/main/java/org/apache/helix/model/StateModelDefinition.java
index ae59522..0a40331 100644
--- a/helix-core/src/main/java/org/apache/helix/model/StateModelDefinition.java
+++ b/helix-core/src/main/java/org/apache/helix/model/StateModelDefinition.java
@@ -46,6 +46,8 @@ public class StateModelDefinition extends HelixProperty {
     STATE_PRIORITY_LIST
   }
 
+  public static final int TOP_STATE_PRIORITY = 1;
+
   /**
    * state model's initial state
    */
@@ -98,7 +100,7 @@ public class StateModelDefinition extends HelixProperty {
     _stateTransitionTable = new HashMap<>();
     _statesCountMap = new HashMap<>();
     if (_statesPriorityList != null) {
-      int priority = 1;
+      int priority = TOP_STATE_PRIORITY;
       for (String state : _statesPriorityList) {
         Map<String, String> metaData = record.getMapField(state + ".meta");
         if (metaData != null) {
diff --git a/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/ClusterStatusMonitor.java b/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/ClusterStatusMonitor.java
index d6c3bb2..fc0b19d 100644
--- a/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/ClusterStatusMonitor.java
+++ b/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/ClusterStatusMonitor.java
@@ -236,27 +236,28 @@ public class ClusterStatusMonitor implements ClusterStatusMonitorMBean {
       // Unregister beans for instances that are no longer configured
       Set<String> toUnregister = Sets.newHashSet(_instanceMonitorMap.keySet());
       toUnregister.removeAll(instanceSet);
-      try {
-        unregisterInstances(toUnregister);
-      } catch (MalformedObjectNameException e) {
-        LOG.error("Could not unregister instances from MBean server: " + toUnregister, e);
-      }
+      unregisterInstances(toUnregister);
 
       // Register beans for instances that are newly configured
       Set<String> toRegister = Sets.newHashSet(instanceSet);
       toRegister.removeAll(_instanceMonitorMap.keySet());
       Set<InstanceMonitor> monitorsToRegister = Sets.newHashSet();
       for (String instanceName : toRegister) {
-        InstanceMonitor bean = new InstanceMonitor(_clusterName, instanceName);
-        bean.updateInstance(tags.get(instanceName), disabledPartitions.get(instanceName),
-            oldDisabledPartitions.get(instanceName), liveInstanceSet.contains(instanceName),
-            !disabledInstanceSet.contains(instanceName));
-        monitorsToRegister.add(bean);
+        try {
+          ObjectName objectName = getObjectName(getInstanceBeanName(instanceName));
+          InstanceMonitor bean = new InstanceMonitor(_clusterName, instanceName, objectName);
+          bean.updateInstance(tags.get(instanceName), disabledPartitions.get(instanceName),
+              oldDisabledPartitions.get(instanceName), liveInstanceSet.contains(instanceName),
+              !disabledInstanceSet.contains(instanceName));
+          monitorsToRegister.add(bean);
+        } catch (MalformedObjectNameException ex) {
+          LOG.error("Failed to create instance monitor for instance: {}.", instanceName);
+        }
       }
       try {
         registerInstances(monitorsToRegister);
-      } catch (MalformedObjectNameException e) {
-        LOG.error("Could not register instances with MBean server: " + toRegister, e);
+      } catch (JMException e) {
+        LOG.error("Could not register instances with MBean server: {}.", toRegister, e);
       }
 
       // Update all the sets
@@ -282,8 +283,8 @@ public class ClusterStatusMonitor implements ClusterStatusMonitorMBean {
             try {
               unregisterInstances(Arrays.asList(instanceName));
               registerInstances(Arrays.asList(bean));
-            } catch (MalformedObjectNameException e) {
-              LOG.error("Could not refresh registration with MBean server: " + instanceName, e);
+            } catch (JMException e) {
+              LOG.error("Could not refresh registration with MBean server: {}", instanceName, e);
             }
           }
         }
@@ -366,6 +367,28 @@ public class ClusterStatusMonitor implements ClusterStatusMonitorMBean {
   }
 
   /**
+   * Updates the capacity status of a single instance, including its max usage and the capacity
+   * of each capacity key. Before calling this API, we assume the instance monitor is already
+   * registered in ReadClusterDataStage; if it is not registered, this update will fail.
+   *
+   * @param instanceName the name of this instance
+   * @param maxUsage max capacity usage of this instance
+   * @param capacityMap a map of this instance's capacity, {capacity key: capacity value}
+   */
+  public void updateInstanceCapacityStatus(String instanceName, double maxUsage,
+      Map<String, Integer> capacityMap) {
+    InstanceMonitor monitor = _instanceMonitorMap.get(instanceName);
+    if (monitor == null) {
+      LOG.warn("Failed to update instance capacity status because instance monitor is not found, "
+          + "instance: {}.", instanceName);
+      return;
+    }
+    monitor.updateMaxCapacityUsage(maxUsage);
+    monitor.updateCapacity(capacityMap);
+  }
+
+  /**
    * Update gauges for resource at instance level
    * @param bestPossibleStates
    * @param resourceMap
@@ -474,6 +497,25 @@ public class ClusterStatusMonitor implements ClusterStatusMonitorMBean {
     }
   }
 
+  /**
+   * Updates the average partition weight metric per capacity key for a resource. If a resource
+   * monitor does not yet exist for this resource, a new one will be created.
+   *
+   * @param resourceName The resource name for which partition weight is updated
+   * @param averageWeightMap A map of average partition weight of each capacity key:
+   *                         capacity key -> average partition weight
+   */
+  public void updatePartitionWeight(String resourceName, Map<String, Integer> averageWeightMap) {
+    ResourceMonitor monitor = getOrCreateResourceMonitor(resourceName);
+    if (monitor == null) {
+      LOG.warn("Failed to update partition weight metric for resource: {} because resource monitor"
+          + " is not created.", resourceName);
+      return;
+    }
+    monitor.updatePartitionWeightStats(averageWeightMap);
+  }
+
   public void updateMissingTopStateDurationStats(String resourceName, long totalDuration,
       long helixLatency, boolean isGraceful, boolean succeeded) {
     ResourceMonitor resourceMonitor = getOrCreateResourceMonitor(resourceName);
@@ -694,31 +736,35 @@ public class ClusterStatusMonitor implements ClusterStatusMonitorMBean {
   }
 
   private void registerInstances(Collection<InstanceMonitor> instances)
-      throws MalformedObjectNameException {
+      throws JMException {
     synchronized (_instanceMonitorMap) {
       for (InstanceMonitor monitor : instances) {
         String instanceName = monitor.getInstanceName();
-        String beanName = getInstanceBeanName(instanceName);
-        register(monitor, getObjectName(beanName));
+        // If this instance MBean is already registered, unregister it.
+        InstanceMonitor removedMonitor = _instanceMonitorMap.remove(instanceName);
+        if (removedMonitor != null) {
+          removedMonitor.unregister();
+        }
+        monitor.register();
         _instanceMonitorMap.put(instanceName, monitor);
       }
     }
   }
 
-  private void unregisterAllInstances() throws MalformedObjectNameException {
+  private void unregisterAllInstances() {
     synchronized (_instanceMonitorMap) {
       unregisterInstances(_instanceMonitorMap.keySet());
     }
   }
 
-  private void unregisterInstances(Collection<String> instances)
-      throws MalformedObjectNameException {
+  private void unregisterInstances(Collection<String> instances) {
     synchronized (_instanceMonitorMap) {
       for (String instanceName : instances) {
-        String beanName = getInstanceBeanName(instanceName);
-        unregister(getObjectName(beanName));
+        InstanceMonitor monitor = _instanceMonitorMap.remove(instanceName);
+        if (monitor != null) {
+          monitor.unregister();
+        }
       }
-      _instanceMonitorMap.keySet().removeAll(instances);
     }
   }
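
A sketch of how the rebalance pipeline might report the new per-instance and per-resource
metrics (the instance name, resource name, and numbers are placeholders; the instance monitor
is assumed to be registered already, per the Javadoc above):

    clusterStatusMonitor.updateInstanceCapacityStatus("localhost_12913", 0.85,
        ImmutableMap.of("CPU", 100, "DISK", 1000));
    clusterStatusMonitor.updatePartitionWeight("db0", ImmutableMap.of("CPU", 1, "DISK", 10));
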
 
diff --git a/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/InstanceMonitor.java b/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/InstanceMonitor.java
index dc43d48..e0c0f89 100644
--- a/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/InstanceMonitor.java
+++ b/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/InstanceMonitor.java
@@ -23,36 +23,105 @@ import java.util.Collections;
 import java.util.List;
 import java.util.Map;
 import java.util.Set;
+import java.util.concurrent.ConcurrentHashMap;
+import javax.management.JMException;
+import javax.management.ObjectName;
 
 import com.google.common.base.Joiner;
 import com.google.common.collect.ImmutableList;
 import com.google.common.collect.Lists;
+import org.apache.helix.monitoring.mbeans.dynamicMBeans.DynamicMBeanProvider;
+import org.apache.helix.monitoring.mbeans.dynamicMBeans.DynamicMetric;
+import org.apache.helix.monitoring.mbeans.dynamicMBeans.SimpleDynamicMetric;
+
 
 /**
  * Implementation of the instance status bean
  */
-public class InstanceMonitor implements InstanceMonitorMBean {
+public class InstanceMonitor extends DynamicMBeanProvider {
+  /**
+   * Metric names for the per-instance monitor, including the new capacity metrics.
+   */
+  public enum InstanceMonitorMetric {
+    // TODO: rename the metrics with Counter/Gauge suffixes and deprecate the old names.
+    TOTAL_MESSAGE_RECEIVED_COUNTER("TotalMessageReceived"),
+    ENABLED_STATUS_GAUGE("Enabled"),
+    ONLINE_STATUS_GAUGE("Online"),
+    DISABLED_PARTITIONS_GAUGE("DisabledPartitions"),
+    MAX_CAPACITY_USAGE_GAUGE("MaxCapacityUsageGauge");
+
+    private final String metricName;
+
+    InstanceMonitorMetric(String name) {
+      metricName = name;
+    }
+
+    public String metricName() {
+      return metricName;
+    }
+  }
+
   private final String _clusterName;
   private final String _participantName;
+  private final ObjectName _initObjectName;
+
   private List<String> _tags;
-  private long _disabledPartitions;
-  private boolean _isUp;
-  private boolean _isEnabled;
-  private long _totalMessageReceived;
+
+  // Counters
+  private SimpleDynamicMetric<Long> _totalMessagedReceivedCounter;
+
+  // Gauges
+  private SimpleDynamicMetric<Long> _enabledStatusGauge;
+  private SimpleDynamicMetric<Long> _disabledPartitionsGauge;
+  private SimpleDynamicMetric<Long> _onlineStatusGauge;
+  private SimpleDynamicMetric<Double> _maxCapacityUsageGauge;
+
+  // A map of dynamic capacity Gauges. The map's keys could change.
+  private final Map<String, SimpleDynamicMetric<Long>> _dynamicCapacityMetricsMap;
 
   /**
    * Initialize the bean
    * @param clusterName the cluster to monitor
    * @param participantName the instance whose statistics this holds
    */
-  public InstanceMonitor(String clusterName, String participantName) {
+  public InstanceMonitor(String clusterName, String participantName, ObjectName objectName) {
     _clusterName = clusterName;
     _participantName = participantName;
     _tags = ImmutableList.of(ClusterStatusMonitor.DEFAULT_TAG);
-    _disabledPartitions = 0L;
-    _isUp = false;
-    _isEnabled = false;
-    _totalMessageReceived = 0;
+    _initObjectName = objectName;
+    _dynamicCapacityMetricsMap = new ConcurrentHashMap<>();
+
+    createMetrics();
+  }
+
+  private void createMetrics() {
+    _totalMessageReceivedCounter = new SimpleDynamicMetric<>(
+        InstanceMonitorMetric.TOTAL_MESSAGE_RECEIVED_COUNTER.metricName(), 0L);
+
+    _disabledPartitionsGauge =
+        new SimpleDynamicMetric<>(InstanceMonitorMetric.DISABLED_PARTITIONS_GAUGE.metricName(),
+            0L);
+    _enabledStatusGauge =
+        new SimpleDynamicMetric<>(InstanceMonitorMetric.ENABLED_STATUS_GAUGE.metricName(), 0L);
+    _onlineStatusGauge =
+        new SimpleDynamicMetric<>(InstanceMonitorMetric.ONLINE_STATUS_GAUGE.metricName(), 0L);
+    _maxCapacityUsageGauge =
+        new SimpleDynamicMetric<>(InstanceMonitorMetric.MAX_CAPACITY_USAGE_GAUGE.metricName(),
+            0.0d);
+  }
+
+  private List<DynamicMetric<?, ?>> buildAttributeList() {
+    List<DynamicMetric<?, ?>> attributeList = Lists.newArrayList(
+        _totalMessageReceivedCounter,
+        _disabledPartitionsGauge,
+        _enabledStatusGauge,
+        _onlineStatusGauge,
+        _maxCapacityUsageGauge
+    );
+
+    attributeList.addAll(_dynamicCapacityMetricsMap.values());
+
+    return attributeList;
   }
 
   @Override
@@ -61,44 +130,32 @@ public class InstanceMonitor implements InstanceMonitorMBean {
         serializedTags(), _participantName);
   }
 
-  @Override
-  public long getOnline() {
-    return _isUp ? 1 : 0;
+  protected long getOnline() {
+    return _onlineStatusGauge.getValue();
   }
 
-  @Override
-  public long getEnabled() {
-    return _isEnabled ? 1 : 0;
+  protected long getEnabled() {
+    return _enabledStatusGauge.getValue();
   }
 
-  @Override
-  public long getTotalMessageReceived() {
-    return _totalMessageReceived;
+  protected long getTotalMessageReceived() {
+    return _totalMessageReceivedCounter.getValue();
   }
 
-  @Override
-  public long getDisabledPartitions() {
-    return _disabledPartitions;
-  }
-
-  /**
-   * Get all the tags currently on this instance
-   * @return list of tags
-   */
-  public List<String> getTags() {
-    return _tags;
+  protected long getDisabledPartitions() {
+    return _disabledPartitionsGauge.getValue();
   }
 
   /**
    * Get the name of the monitored instance
    * @return instance name as a string
    */
-  public String getInstanceName() {
+  protected String getInstanceName() {
     return _participantName;
   }
 
   private String serializedTags() {
-    return Joiner.on('|').skipNulls().join(_tags).toString();
+    return Joiner.on('|').skipNulls().join(_tags);
   }
 
   /**
@@ -117,20 +174,22 @@ public class InstanceMonitor implements InstanceMonitorMBean {
       _tags = Lists.newArrayList(tags);
       Collections.sort(_tags);
     }
-    _disabledPartitions = 0L;
+    long numDisabledPartitions = 0L;
     if (disabledPartitions != null) {
       for (List<String> partitions : disabledPartitions.values()) {
         if (partitions != null) {
-          _disabledPartitions += partitions.size();
+          numDisabledPartitions += partitions.size();
         }
       }
     }
     // TODO : Get rid of this when old API removed.
     if (oldDisabledPartitions != null) {
-      _disabledPartitions += oldDisabledPartitions.size();
+      numDisabledPartitions += oldDisabledPartitions.size();
     }
-    _isUp = isLive;
-    _isEnabled = isEnabled;
+
+    _onlineStatusGauge.updateValue(isLive ? 1L : 0L);
+    _enabledStatusGauge.updateValue(isEnabled ? 1L : 0L);
+    _disabledPartitionsGauge.updateValue(numDisabledPartitions);
   }
 
   /**
@@ -138,7 +197,64 @@ public class InstanceMonitor implements InstanceMonitorMBean {
    * @param messageReceived received message numbers
    */
   public synchronized void increaseMessageCount(long messageReceived) {
-    _totalMessageReceived += messageReceived;
+    _totalMessageReceivedCounter
+        .updateValue(_totalMessageReceivedCounter.getValue() + messageReceived);
+  }
+
+  /**
+   * Updates max capacity usage for this instance.
+   * @param maxUsage max capacity usage of this instance
+   */
+  public synchronized void updateMaxCapacityUsage(double maxUsage) {
+    _maxCapacityUsageGauge.updateValue(maxUsage);
+  }
+
+  /**
+   * Gets max capacity usage of this instance.
+   * @return Max capacity usage of this instance.
+   */
+  protected synchronized double getMaxCapacityUsageGauge() {
+    return _maxCapacityUsageGauge.getValue();
+  }
+
+  /**
+   * Updates instance capacity metrics.
+   * @param capacity a map of capacity key to capacity value for this instance
+   */
+  public void updateCapacity(Map<String, Integer> capacity) {
+    synchronized (_dynamicCapacityMetricsMap) {
+      // If the capacity keys are unchanged, just update the metric values.
+      if (_dynamicCapacityMetricsMap.keySet().equals(capacity.keySet())) {
+        for (Map.Entry<String, Integer> entry : capacity.entrySet()) {
+          _dynamicCapacityMetricsMap.get(entry.getKey()).updateValue((long) entry.getValue());
+        }
+        return;
+      }
+
+      // The capacity keys have changed: drop metrics for removed keys, make sure the
+      // capacity metrics map has exactly the new capacity keys, and update the metric values.
+      _dynamicCapacityMetricsMap.keySet().retainAll(capacity.keySet());
+      for (Map.Entry<String, Integer> entry : capacity.entrySet()) {
+        String capacityName = entry.getKey();
+        if (_dynamicCapacityMetricsMap.containsKey(capacityName)) {
+          _dynamicCapacityMetricsMap.get(capacityName).updateValue((long) entry.getValue());
+        } else {
+          _dynamicCapacityMetricsMap.put(capacityName,
+              new SimpleDynamicMetric<>(capacityName + "Gauge", (long) entry.getValue()));
+        }
+      }
+    }
+
+    // Update all of the MBean's attributes.
+    updateAttributesInfo(buildAttributeList(),
+        "Instance monitor for instance: " + getInstanceName());
   }
 
+  @Override
+  public DynamicMBeanProvider register() throws JMException {
+    doRegister(buildAttributeList(), _initObjectName);
+
+    return this;
+  }
 }
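
As a usage sketch of the dynamic capacity gauges added above (the ObjectName, cluster name, instance name, and capacity keys are hypothetical; ImmutableMap is the Guava class the monitor already imports):

    import com.google.common.collect.ImmutableMap;
    import javax.management.JMException;
    import javax.management.ObjectName;
    import org.apache.helix.monitoring.mbeans.InstanceMonitor;

    public class InstanceCapacityExample {
      public static void main(String[] args) throws JMException {
        // Hypothetical ObjectName; in production it is built by ClusterStatusMonitor.
        ObjectName objectName =
            new ObjectName("ClusterStatus:cluster=TestCluster,instanceName=instance_0");
        InstanceMonitor monitor = new InstanceMonitor("TestCluster", "instance_0", objectName);
        monitor.register();

        // The first call creates DISKGauge and CPUGauge attributes on the MBean.
        monitor.updateCapacity(ImmutableMap.of("DISK", 100, "CPU", 8));
        // A later call with a different key set drops CPUGauge and updates DISKGauge.
        monitor.updateCapacity(ImmutableMap.of("DISK", 80));

        monitor.unregister();
      }
    }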
diff --git a/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/MonitorDomainNames.java b/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/MonitorDomainNames.java
index 73bf057..fee9099 100644
--- a/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/MonitorDomainNames.java
+++ b/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/MonitorDomainNames.java
@@ -28,5 +28,6 @@ public enum MonitorDomainNames {
   HelixThreadPoolExecutor,
   HelixCallback,
   RoutingTableProvider,
-  CLMParticipantReport
+  CLMParticipantReport,
+  Rebalancer
 }
diff --git a/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/ResourceMonitor.java b/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/ResourceMonitor.java
index d7a368e..af9c318 100644
--- a/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/ResourceMonitor.java
+++ b/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/ResourceMonitor.java
@@ -19,18 +19,19 @@ package org.apache.helix.monitoring.mbeans;
  * under the License.
  */
 
-import java.util.ArrayList;
 import java.util.Collections;
 import java.util.HashSet;
 import java.util.List;
 import java.util.Map;
 import java.util.Set;
+import java.util.concurrent.ConcurrentHashMap;
 import java.util.concurrent.TimeUnit;
 import javax.management.JMException;
 import javax.management.ObjectName;
 
 import com.codahale.metrics.Histogram;
 import com.codahale.metrics.SlidingTimeWindowArrayReservoir;
+import com.google.common.collect.Lists;
 import org.apache.helix.HelixDefinedState;
 import org.apache.helix.model.ExternalView;
 import org.apache.helix.model.IdealState;
@@ -49,6 +50,8 @@ public class ResourceMonitor extends DynamicMBeanProvider {
     INTERMEDIATE_STATE_CAL_FAILED
   }
 
+  private static final String GAUGE_METRIC_SUFFIX = "Gauge";
+
   // Gauges
   private SimpleDynamicMetric<Long> _numOfPartitions;
   private SimpleDynamicMetric<Long> _numOfPartitionsInExternalView;
@@ -83,31 +86,13 @@ public class ResourceMonitor extends DynamicMBeanProvider {
   private final String _clusterName;
   private final ObjectName _initObjectName;
 
+  // A map of dynamic capacity Gauges. The map's keys could change.
+  private final Map<String, SimpleDynamicMetric<Long>> _dynamicCapacityMetricsMap;
+
   @Override
-  public ResourceMonitor register() throws JMException {
-    List<DynamicMetric<?, ?>> attributeList = new ArrayList<>();
-    attributeList.add(_numOfPartitions);
-    attributeList.add(_numOfPartitionsInExternalView);
-    attributeList.add(_numOfErrorPartitions);
-    attributeList.add(_numNonTopStatePartitions);
-    attributeList.add(_numLessMinActiveReplicaPartitions);
-    attributeList.add(_numLessReplicaPartitions);
-    attributeList.add(_numPendingRecoveryRebalancePartitions);
-    attributeList.add(_numPendingLoadRebalancePartitions);
-    attributeList.add(_numRecoveryRebalanceThrottledPartitions);
-    attributeList.add(_numLoadRebalanceThrottledPartitions);
-    attributeList.add(_externalViewIdealStateDiff);
-    attributeList.add(_successfulTopStateHandoffDurationCounter);
-    attributeList.add(_successTopStateHandoffCounter);
-    attributeList.add(_failedTopStateHandoffCounter);
-    attributeList.add(_maxSinglePartitionTopStateHandoffDuration);
-    attributeList.add(_partitionTopStateHandoffDurationGauge);
-    attributeList.add(_partitionTopStateHandoffHelixLatencyGauge);
-    attributeList.add(_partitionTopStateNonGracefulHandoffDurationGauge);
-    attributeList.add(_totalMessageReceived);
-    attributeList.add(_numPendingStateTransitions);
-    attributeList.add(_rebalanceState);
-    doRegister(attributeList, _initObjectName);
+  public DynamicMBeanProvider register() throws JMException {
+    doRegister(buildAttributeList(), _initObjectName);
+
     return this;
   }
 
@@ -116,10 +101,12 @@ public class ResourceMonitor extends DynamicMBeanProvider {
   }
 
   @SuppressWarnings("unchecked")
-  public ResourceMonitor(String clusterName, String resourceName, ObjectName objectName) {
+  public ResourceMonitor(String clusterName, String resourceName, ObjectName objectName)
+      throws JMException {
     _clusterName = clusterName;
     _resourceName = resourceName;
     _initObjectName = objectName;
+    _dynamicCapacityMetricsMap = new ConcurrentHashMap<>();
 
     _externalViewIdealStateDiff = new SimpleDynamicMetric("DifferenceWithIdealStateGauge", 0L);
     _numLoadRebalanceThrottledPartitions =
@@ -382,6 +369,36 @@ public class ResourceMonitor extends DynamicMBeanProvider {
     _numLoadRebalanceThrottledPartitions.updateValue(numLoadRebalanceThrottledPartitions);
   }
 
+  /**
+   * Updates partition weight metric. If the partition capacity keys are changed, all MBean
+   * attributes will be updated accordingly: old capacity keys will be replaced with new capacity
+   * keys in MBean server.
+   *
+   * @param partitionWeightMap A map of partition weight: capacity key -> partition weight
+   */
+  void updatePartitionWeightStats(Map<String, Integer> partitionWeightMap) {
+    synchronized (_dynamicCapacityMetricsMap) {
+      if (_dynamicCapacityMetricsMap.keySet().equals(partitionWeightMap.keySet())) {
+        for (Map.Entry<String, Integer> entry : partitionWeightMap.entrySet()) {
+          _dynamicCapacityMetricsMap.get(entry.getKey()).updateValue((long) entry.getValue());
+        }
+        return;
+      }
+
+      // Capacity keys are changed, so capacity attribute map needs to be updated.
+      _dynamicCapacityMetricsMap.clear();
+      for (Map.Entry<String, Integer> entry : partitionWeightMap.entrySet()) {
+        String capacityKey = entry.getKey();
+        _dynamicCapacityMetricsMap.put(capacityKey,
+            new SimpleDynamicMetric<>(capacityKey + GAUGE_METRIC_SUFFIX, (long) entry.getValue()));
+      }
+    }
+
+    // Update all MBean attributes.
+    updateAttributesInfo(buildAttributeList(),
+        "Resource monitor for resource: " + getResourceName());
+  }
+
   public void setRebalanceState(RebalanceStatus state) {
     _rebalanceState.updateValue(state.name());
   }
@@ -428,4 +445,34 @@ public class ResourceMonitor extends DynamicMBeanProvider {
       _lastResetTime = System.currentTimeMillis();
     }
   }
+
+  private List<DynamicMetric<?, ?>> buildAttributeList() {
+    List<DynamicMetric<?, ?>> attributeList = Lists.newArrayList(
+        _numOfPartitions,
+        _numOfPartitionsInExternalView,
+        _numOfErrorPartitions,
+        _numNonTopStatePartitions,
+        _numLessMinActiveReplicaPartitions,
+        _numLessReplicaPartitions,
+        _numPendingRecoveryRebalancePartitions,
+        _numPendingLoadRebalancePartitions,
+        _numRecoveryRebalanceThrottledPartitions,
+        _numLoadRebalanceThrottledPartitions,
+        _externalViewIdealStateDiff,
+        _successfulTopStateHandoffDurationCounter,
+        _successTopStateHandoffCounter,
+        _failedTopStateHandoffCounter,
+        _maxSinglePartitionTopStateHandoffDuration,
+        _partitionTopStateHandoffDurationGauge,
+        _partitionTopStateHandoffHelixLatencyGauge,
+        _partitionTopStateNonGracefulHandoffDurationGauge,
+        _totalMessageReceived,
+        _numPendingStateTransitions,
+        _rebalanceState
+    );
+
+    attributeList.addAll(_dynamicCapacityMetricsMap.values());
+
+    return attributeList;
+  }
 }
diff --git a/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/dynamicMBeans/DynamicMBeanProvider.java b/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/dynamicMBeans/DynamicMBeanProvider.java
index 0ce0b44..407a714 100644
--- a/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/dynamicMBeans/DynamicMBeanProvider.java
+++ b/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/dynamicMBeans/DynamicMBeanProvider.java
@@ -22,23 +22,19 @@ package org.apache.helix.monitoring.mbeans.dynamicMBeans;
 import java.util.ArrayList;
 import java.util.Collection;
 import java.util.HashMap;
-import java.util.Iterator;
 import java.util.List;
 import java.util.Map;
 import javax.management.Attribute;
 import javax.management.AttributeList;
 import javax.management.AttributeNotFoundException;
 import javax.management.DynamicMBean;
-import javax.management.InvalidAttributeValueException;
 import javax.management.JMException;
 import javax.management.MBeanAttributeInfo;
 import javax.management.MBeanConstructorInfo;
-import javax.management.MBeanException;
 import javax.management.MBeanInfo;
 import javax.management.MBeanNotificationInfo;
 import javax.management.MBeanOperationInfo;
 import javax.management.ObjectName;
-import javax.management.ReflectionException;
 
 import org.apache.helix.SystemPropertyKeys;
 import org.apache.helix.monitoring.SensorNameProvider;
@@ -53,12 +49,12 @@ import org.slf4j.LoggerFactory;
 public abstract class DynamicMBeanProvider implements DynamicMBean, SensorNameProvider {
   protected final Logger _logger = LoggerFactory.getLogger(getClass());
   protected static final long DEFAULT_RESET_INTERVAL_MS = 60 * 60 * 1000; // Reset time every hour
-  private static String SENSOR_NAME_TAG = "SensorName";
-  private static String DEFAULT_DESCRIPTION =
+  private static final String SENSOR_NAME_TAG = "SensorName";
+  private static final String DEFAULT_DESCRIPTION =
       "Information on the management interface of the MBean";
 
   // Attribute name to the DynamicMetric object mapping
-  private final Map<String, DynamicMetric> _attributeMap = new HashMap<>();
+  private Map<String, DynamicMetric> _attributeMap = new HashMap<>();
   private ObjectName _objectName = null;
   private MBeanInfo _mBeanInfo;
 
@@ -88,7 +84,7 @@ public abstract class DynamicMBeanProvider implements DynamicMBean, SensorNamePr
           objectName.getCanonicalName());
       return false;
     }
-    updateAttributtInfos(dynamicMetrics, description);
+    updateAttributesInfo(dynamicMetrics, description);
     _objectName = MBeanRegistrar.register(this, objectName);
     return true;
   }
@@ -99,26 +95,30 @@ public abstract class DynamicMBeanProvider implements DynamicMBean, SensorNamePr
   }
 
   /**
-   * Update the Dynamic MBean provider with new metric list.
+   * Updates the Dynamic MBean provider with a new metric list.
+   * If the passed-in metrics collection is empty, the original attributes will be removed.
+   *
    * @param description description of the MBean
-   * @param dynamicMetrics the DynamicMetrics
+   * @param dynamicMetrics the DynamicMetrics; an empty collection removes the existing metric attributes.
    */
-  private void updateAttributtInfos(Collection<DynamicMetric<?, ?>> dynamicMetrics,
+  protected void updateAttributesInfo(Collection<DynamicMetric<?, ?>> dynamicMetrics,
       String description) {
-    _attributeMap.clear();
+    if (dynamicMetrics == null) {
+      _logger.warn("Cannot update attributes info because dynamicMetrics is null.");
+      return;
+    }
 
-    // get all attributes that can be emit by the dynamicMetrics.
     List<MBeanAttributeInfo> attributeInfoList = new ArrayList<>();
-    if (dynamicMetrics != null) {
-      for (DynamicMetric dynamicMetric : dynamicMetrics) {
-        Iterator<MBeanAttributeInfo> iter = dynamicMetric.getAttributeInfos().iterator();
-        while (iter.hasNext()) {
-          MBeanAttributeInfo attributeInfo = iter.next();
-          // Info list to create MBean info
-          attributeInfoList.add(attributeInfo);
-          // Attribute mapping for getting attribute value when getAttribute() is called
-          _attributeMap.put(attributeInfo.getName(), dynamicMetric);
-        }
+    // Use a new attribute map to avoid concurrency issues.
+    Map<String, DynamicMetric> newAttributeMap = new HashMap<>();
+
+    // Get all attributes that can be emitted by the dynamicMetrics.
+    for (DynamicMetric<?, ?> dynamicMetric : dynamicMetrics) {
+      for (MBeanAttributeInfo attributeInfo : dynamicMetric.getAttributeInfos()) {
+        // Info list to create MBean info
+        attributeInfoList.add(attributeInfo);
+        // Attribute mapping for getting attribute value when getAttribute() is called
+        newAttributeMap.put(attributeInfo.getName(), dynamicMetric);
       }
     }
 
@@ -130,17 +130,19 @@ public abstract class DynamicMBeanProvider implements DynamicMBean, SensorNamePr
         String.format("Default %s Constructor", getClass().getSimpleName()),
         getClass().getConstructors()[0]);
 
-    MBeanAttributeInfo[] attributeInfos = new MBeanAttributeInfo[attributeInfoList.size()];
-    attributeInfos = attributeInfoList.toArray(attributeInfos);
+    MBeanAttributeInfo[] attributesInfo = new MBeanAttributeInfo[attributeInfoList.size()];
+    attributesInfo = attributeInfoList.toArray(attributesInfo);
 
     if (description == null) {
       description = DEFAULT_DESCRIPTION;
     }
 
-    _mBeanInfo = new MBeanInfo(getClass().getName(), description, attributeInfos,
-        new MBeanConstructorInfo[] {
-            constructorInfo
-        }, new MBeanOperationInfo[0], new MBeanNotificationInfo[0]);
+    _mBeanInfo = new MBeanInfo(getClass().getName(), description, attributesInfo,
+        new MBeanConstructorInfo[]{constructorInfo}, new MBeanOperationInfo[0],
+        new MBeanNotificationInfo[0]);
+
+    // Update _attributeMap reference.
+    _attributeMap = newAttributeMap;
   }
 
   /**
@@ -158,17 +160,17 @@ public abstract class DynamicMBeanProvider implements DynamicMBean, SensorNamePr
   }
 
   @Override
-  public Object getAttribute(String attribute)
-      throws AttributeNotFoundException, MBeanException, ReflectionException {
+  public Object getAttribute(String attribute) throws AttributeNotFoundException {
     if (SENSOR_NAME_TAG.equals(attribute)) {
       return getSensorName();
     }
 
-    if (!_attributeMap.containsKey(attribute)) {
-      return null;
+    DynamicMetric metric = _attributeMap.get(attribute);
+    if (metric == null) {
+      throw new AttributeNotFoundException("Attribute[" + attribute + "] was not found.");
     }
 
-    return _attributeMap.get(attribute).getAttributeValue(attribute);
+    return metric.getAttributeValue(attribute);
   }
 
   @Override
@@ -178,7 +180,7 @@ public abstract class DynamicMBeanProvider implements DynamicMBean, SensorNamePr
       try {
         Object value = getAttribute(attributeName);
         attributeList.add(new Attribute(attributeName, value));
-      } catch (AttributeNotFoundException | MBeanException | ReflectionException ex) {
+      } catch (AttributeNotFoundException ex) {
         _logger.error("Failed to get attribute: " + attributeName, ex);
       }
     }
@@ -191,8 +193,7 @@ public abstract class DynamicMBeanProvider implements DynamicMBean, SensorNamePr
   }
 
   @Override
-  public void setAttribute(Attribute attribute) throws AttributeNotFoundException,
-      InvalidAttributeValueException, MBeanException, ReflectionException {
+  public void setAttribute(Attribute attribute) {
     // All MBeans are readonly
     return;
   }
@@ -204,8 +205,7 @@ public abstract class DynamicMBeanProvider implements DynamicMBean, SensorNamePr
   }
 
   @Override
-  public Object invoke(String actionName, Object[] params, String[] signature)
-      throws MBeanException, ReflectionException {
+  public Object invoke(String actionName, Object[] params, String[] signature) {
     // No operation supported
     return null;
   }
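
To make the provider contract concrete, here is a minimal sketch of a subclass (the class name, metric name, sensor name, and JMX domain are hypothetical):

    import java.util.List;
    import javax.management.JMException;
    import javax.management.ObjectName;

    import com.google.common.collect.Lists;
    import org.apache.helix.monitoring.mbeans.dynamicMBeans.DynamicMBeanProvider;
    import org.apache.helix.monitoring.mbeans.dynamicMBeans.DynamicMetric;
    import org.apache.helix.monitoring.mbeans.dynamicMBeans.SimpleDynamicMetric;

    public class QueueDepthMonitor extends DynamicMBeanProvider {
      private final SimpleDynamicMetric<Long> _depthGauge =
          new SimpleDynamicMetric<>("QueueDepthGauge", 0L);

      @Override
      public DynamicMBeanProvider register() throws JMException {
        List<DynamicMetric<?, ?>> attributes = Lists.newArrayList(_depthGauge);
        doRegister(attributes, new ObjectName("HypotheticalDomain:name=QueueDepth"));
        return this;
      }

      @Override
      public String getSensorName() {
        return "QueueDepthMonitor.queue";
      }

      public void setDepth(long depth) {
        // After this call, getAttribute("QueueDepthGauge") returns the new value.
        _depthGauge.updateValue(depth);
      }
    }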
diff --git a/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/dynamicMBeans/SimpleDynamicMetric.java b/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/dynamicMBeans/SimpleDynamicMetric.java
index 1be6a21..2b0f1db 100644
--- a/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/dynamicMBeans/SimpleDynamicMetric.java
+++ b/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/dynamicMBeans/SimpleDynamicMetric.java
@@ -25,7 +25,7 @@ package org.apache.helix.monitoring.mbeans.dynamicMBeans;
  * @param <T> the type of the metric value
  */
 public class SimpleDynamicMetric<T> extends DynamicMetric<T, T> {
-  private final String _metricName;
+  protected final String _metricName;
 
   /**
    * Instantiates a new Simple dynamic metric.
diff --git a/helix-core/src/main/java/org/apache/helix/monitoring/metrics/MetricCollector.java b/helix-core/src/main/java/org/apache/helix/monitoring/metrics/MetricCollector.java
new file mode 100644
index 0000000..b08a840
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/monitoring/metrics/MetricCollector.java
@@ -0,0 +1,99 @@
+package org.apache.helix.monitoring.metrics;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Map;
+import javax.management.JMException;
+import javax.management.ObjectName;
+import org.apache.helix.HelixException;
+import org.apache.helix.monitoring.metrics.model.Metric;
+import org.apache.helix.monitoring.mbeans.dynamicMBeans.DynamicMBeanProvider;
+import org.apache.helix.monitoring.mbeans.dynamicMBeans.DynamicMetric;
+
+/**
+ * Collects and manages all metrics that implement the {@link Metric} interface.
+ */
+public abstract class MetricCollector extends DynamicMBeanProvider {
+  private static final String CLUSTER_NAME_KEY = "ClusterName";
+  private static final String ENTITY_NAME_KEY = "EntityName";
+  private final String _monitorDomainName;
+  private final String _clusterName;
+  private final String _entityName;
+  private Map<String, Metric> _metricMap;
+
+  public MetricCollector(String monitorDomainName, String clusterName, String entityName) {
+    _monitorDomainName = monitorDomainName;
+    _clusterName = clusterName;
+    _entityName = entityName;
+    _metricMap = new HashMap<>();
+  }
+
+  @Override
+  public DynamicMBeanProvider register() throws JMException {
+    // First cast all Metric objects to DynamicMetrics
+    Collection<DynamicMetric<?, ?>> dynamicMetrics = new HashSet<>();
+    _metricMap.values().forEach(metric -> dynamicMetrics.add(metric.getDynamicMetric()));
+
+    // Define MBeanName and ObjectName
+    // MBean name has two key-value pairs:
+    // ------ 1) ClusterName KV pair (first %s=%s)
+    // ------ 2) EntityName KV pair (second %s=%s)
+    String mbeanName =
+        String.format("%s=%s, %s=%s", CLUSTER_NAME_KEY, _clusterName, ENTITY_NAME_KEY, _entityName);
+
+    // ObjectName has one key-value pair:
+    // ------ 1) Monitor domain name KV pair where value is the MBean name
+    doRegister(dynamicMetrics,
+        new ObjectName(String.format("%s:%s", _monitorDomainName, mbeanName)));
+    return this;
+  }
+
+  @Override
+  public String getSensorName() {
+    return String.format("%s.%s.%s", _monitorDomainName, _clusterName,
+        _entityName);
+  }
+
+  void addMetric(Metric metric) {
+    if (metric instanceof DynamicMetric) {
+      _metricMap.putIfAbsent(metric.getMetricName(), metric);
+    } else {
+      throw new HelixException("MetricCollector only supports Metrics that are DynamicMetric!");
+    }
+  }
+
+  /**
+   * Returns the metric with the given name, cast to the desired type.
+   * @param metricName the name of the metric to look up
+   * @param metricClass the class to cast the metric to
+   * @param <T> the desired metric type
+   * @return the metric cast to type T, or null if no metric with that name exists
+   */
+  public <T extends DynamicMetric> T getMetric(String metricName, Class<T> metricClass) {
+    return metricClass.cast(_metricMap.get(metricName));
+  }
+
+  public Map<String, Metric> getMetricMap() {
+    return _metricMap;
+  }
+}
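
For reference, the JMX name composed by register() can be previewed with a standalone sketch (the domain, cluster, and entity names are hypothetical):

    import javax.management.ObjectName;

    public class MetricCollectorNameSketch {
      public static void main(String[] args) throws Exception {
        // Mirrors the two String.format() calls in register() above.
        String mbeanName = String.format("%s=%s, %s=%s",
            "ClusterName", "MyCluster", "EntityName", "WagedRebalancer");
        ObjectName objectName = new ObjectName(String.format("%s:%s", "Rebalancer", mbeanName));
        // Yields a name like: Rebalancer:ClusterName=MyCluster, EntityName=WagedRebalancer
        System.out.println(objectName);
      }
    }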
diff --git a/helix-core/src/main/java/org/apache/helix/monitoring/metrics/WagedRebalancerMetricCollector.java b/helix-core/src/main/java/org/apache/helix/monitoring/metrics/WagedRebalancerMetricCollector.java
new file mode 100644
index 0000000..df8b60f
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/monitoring/metrics/WagedRebalancerMetricCollector.java
@@ -0,0 +1,125 @@
+package org.apache.helix.monitoring.metrics;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import javax.management.JMException;
+
+import org.apache.helix.HelixException;
+import org.apache.helix.monitoring.mbeans.MonitorDomainNames;
+import org.apache.helix.monitoring.metrics.implementation.BaselineDivergenceGauge;
+import org.apache.helix.monitoring.metrics.implementation.RebalanceCounter;
+import org.apache.helix.monitoring.metrics.implementation.RebalanceFailureCount;
+import org.apache.helix.monitoring.metrics.implementation.RebalanceLatencyGauge;
+import org.apache.helix.monitoring.metrics.model.CountMetric;
+import org.apache.helix.monitoring.metrics.model.LatencyMetric;
+import org.apache.helix.monitoring.metrics.model.RatioMetric;
+
+
+public class WagedRebalancerMetricCollector extends MetricCollector {
+  private static final String WAGED_REBALANCER_ENTITY_NAME = "WagedRebalancer";
+
+  /**
+   * This enum class contains all metric names defined for the WagedRebalancer. Note that all enum
+   * names are in camel case for readability.
+   */
+  public enum WagedRebalancerMetricNames {
+    // Per-stage latency metrics
+    GlobalBaselineCalcLatencyGauge,
+    PartialRebalanceLatencyGauge,
+
+    // The following latency metrics are related to AssignmentMetadataStore
+    StateReadLatencyGauge,
+    StateWriteLatencyGauge,
+
+    /*
+     * Gauge of the difference (state and partition allocation) between the baseline and the best
+     * possible assignment.
+     */
+    BaselineDivergenceGauge,
+
+    // Count of any rebalance compute failure.
+    // Note that the rebalancer may still return the last known-good assignment on a rebalance
+    // compute failure; that fallback logic does not affect this count.
+    RebalanceFailureCounter,
+
+    // Waged rebalance counters.
+    GlobalBaselineCalcCounter,
+    PartialRebalanceCounter
+  }
+
+  public WagedRebalancerMetricCollector(String clusterName) {
+    super(MonitorDomainNames.Rebalancer.name(), clusterName, WAGED_REBALANCER_ENTITY_NAME);
+    createMetrics();
+    if (clusterName != null) {
+      try {
+        register();
+      } catch (JMException e) {
+        throw new HelixException("Failed to register MBean for the WagedRebalancerMetricCollector.",
+            e);
+      }
+    }
+  }
+
+  /**
+   * This constructor creates the metrics but does not register them. It is used when a
+   * JMException has occurred, so that the rebalancer can proceed without registering or
+   * emitting metrics.
+   */
+  public WagedRebalancerMetricCollector() {
+    this(null);
+  }
+
+  /**
+   * Creates and registers all metrics in MetricCollector for WagedRebalancer.
+   */
+  private void createMetrics() {
+    // Define all metrics
+    LatencyMetric globalBaselineCalcLatencyGauge =
+        new RebalanceLatencyGauge(WagedRebalancerMetricNames.GlobalBaselineCalcLatencyGauge.name(),
+            getResetIntervalInMs());
+    LatencyMetric partialRebalanceLatencyGauge =
+        new RebalanceLatencyGauge(WagedRebalancerMetricNames.PartialRebalanceLatencyGauge.name(),
+            getResetIntervalInMs());
+    LatencyMetric stateReadLatencyGauge =
+        new RebalanceLatencyGauge(WagedRebalancerMetricNames.StateReadLatencyGauge.name(),
+            getResetIntervalInMs());
+    LatencyMetric stateWriteLatencyGauge =
+        new RebalanceLatencyGauge(WagedRebalancerMetricNames.StateWriteLatencyGauge.name(),
+            getResetIntervalInMs());
+    RatioMetric baselineDivergenceGauge =
+        new BaselineDivergenceGauge(WagedRebalancerMetricNames.BaselineDivergenceGauge.name());
+    CountMetric calcFailureCount =
+        new RebalanceFailureCount(WagedRebalancerMetricNames.RebalanceFailureCounter.name());
+    CountMetric globalBaselineCalcCounter =
+        new RebalanceCounter(WagedRebalancerMetricNames.GlobalBaselineCalcCounter.name());
+    CountMetric partialRebalanceCounter =
+        new RebalanceCounter(WagedRebalancerMetricNames.PartialRebalanceCounter.name());
+
+    // Add metrics to WagedRebalancerMetricCollector
+    addMetric(globalBaselineCalcLatencyGauge);
+    addMetric(partialRebalanceLatencyGauge);
+    addMetric(stateReadLatencyGauge);
+    addMetric(stateWriteLatencyGauge);
+    addMetric(baselineDivergenceGauge);
+    addMetric(calcFailureCount);
+    addMetric(globalBaselineCalcCounter);
+    addMetric(partialRebalanceCounter);
+  }
+}
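
A usage sketch of the collector (the cluster name is hypothetical; the metric types are the ones added in this commit):

    import org.apache.helix.monitoring.metrics.WagedRebalancerMetricCollector;
    import org.apache.helix.monitoring.metrics.WagedRebalancerMetricCollector.WagedRebalancerMetricNames;
    import org.apache.helix.monitoring.metrics.implementation.RebalanceFailureCount;
    import org.apache.helix.monitoring.metrics.implementation.RebalanceLatencyGauge;

    public class WagedMetricsExample {
      public static void main(String[] args) {
        WagedRebalancerMetricCollector collector = new WagedRebalancerMetricCollector("MyCluster");

        // Time one global baseline calculation.
        RebalanceLatencyGauge latencyGauge = collector.getMetric(
            WagedRebalancerMetricNames.GlobalBaselineCalcLatencyGauge.name(),
            RebalanceLatencyGauge.class);
        latencyGauge.startMeasuringLatency();
        // ... run the baseline calculation ...
        latencyGauge.endMeasuringLatency();

        // Record one rebalance compute failure.
        collector.getMetric(WagedRebalancerMetricNames.RebalanceFailureCounter.name(),
            RebalanceFailureCount.class).increment(1L);
      }
    }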
diff --git a/helix-core/src/main/java/org/apache/helix/monitoring/metrics/implementation/BaselineDivergenceGauge.java b/helix-core/src/main/java/org/apache/helix/monitoring/metrics/implementation/BaselineDivergenceGauge.java
new file mode 100644
index 0000000..8e6d49b
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/monitoring/metrics/implementation/BaselineDivergenceGauge.java
@@ -0,0 +1,68 @@
+package org.apache.helix.monitoring.metrics.implementation;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.util.Map;
+import java.util.concurrent.ExecutorService;
+
+import org.apache.helix.controller.pipeline.AbstractBaseStage;
+import org.apache.helix.controller.rebalancer.util.ResourceUsageCalculator;
+import org.apache.helix.model.ResourceAssignment;
+import org.apache.helix.monitoring.metrics.model.RatioMetric;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+/**
+ * Gauge of the difference (state and partition allocation) between the baseline and the best
+ * possible assignment. Its value range is [0.0, 1.0].
+ */
+public class BaselineDivergenceGauge extends RatioMetric {
+  private static final Logger LOG = LoggerFactory.getLogger(BaselineDivergenceGauge.class);
+
+  /**
+   * Instantiates a new baseline divergence gauge.
+   * @param metricName   the metric name
+   */
+  public BaselineDivergenceGauge(String metricName) {
+    super(metricName, 0.0d);
+  }
+
+  /**
+   * Asynchronously measure and update metric value.
+   * @param threadPool an executor service to asynchronously run the task
+   * @param baseline baseline assignment
+   * @param bestPossibleAssignment best possible assignment
+   */
+  public void asyncMeasureAndUpdateValue(ExecutorService threadPool,
+      Map<String, ResourceAssignment> baseline,
+      Map<String, ResourceAssignment> bestPossibleAssignment) {
+    AbstractBaseStage.asyncExecute(threadPool, () -> {
+      try {
+        double baselineDivergence =
+            ResourceUsageCalculator.measureBaselineDivergence(baseline, bestPossibleAssignment);
+        updateValue(baselineDivergence);
+      } catch (Exception e) {
+        LOG.error("Failed to report BaselineDivergenceGauge metric.", e);
+      }
+      return null;
+    });
+  }
+}
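
A minimal sketch of driving this gauge (the executor and the empty assignment maps are placeholders):

    import java.util.Collections;
    import java.util.Map;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    import org.apache.helix.model.ResourceAssignment;
    import org.apache.helix.monitoring.metrics.implementation.BaselineDivergenceGauge;

    public class DivergenceGaugeExample {
      public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Map<String, ResourceAssignment> baseline = Collections.emptyMap();      // placeholder
        Map<String, ResourceAssignment> bestPossible = Collections.emptyMap();  // placeholder

        BaselineDivergenceGauge gauge = new BaselineDivergenceGauge("BaselineDivergenceGauge");
        // The divergence is computed on the pool thread; the caller is never blocked.
        gauge.asyncMeasureAndUpdateValue(pool, baseline, bestPossible);

        pool.shutdown();
      }
    }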
diff --git a/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/MonitorDomainNames.java b/helix-core/src/main/java/org/apache/helix/monitoring/metrics/implementation/RebalanceCounter.java
similarity index 60%
copy from helix-core/src/main/java/org/apache/helix/monitoring/mbeans/MonitorDomainNames.java
copy to helix-core/src/main/java/org/apache/helix/monitoring/metrics/implementation/RebalanceCounter.java
index 73bf057..8ecce7c 100644
--- a/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/MonitorDomainNames.java
+++ b/helix-core/src/main/java/org/apache/helix/monitoring/metrics/implementation/RebalanceCounter.java
@@ -1,4 +1,4 @@
-package org.apache.helix.monitoring.mbeans;
+package org.apache.helix.monitoring.metrics.implementation;
 
 /*
  * Licensed to the Apache Software Foundation (ASF) under one
@@ -9,7 +9,7 @@ package org.apache.helix.monitoring.mbeans;
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
- *   http://www.apache.org/licenses/LICENSE-2.0
+ *     http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
@@ -19,14 +19,18 @@ package org.apache.helix.monitoring.mbeans;
  * under the License.
  */
 
+import org.apache.helix.monitoring.metrics.model.CountMetric;
+
+
 /**
- * This enum defines all of domain names used with various Helix monitor mbeans.
+ * Reports counter-type metrics related to rebalance. The counter value increases monotonically.
  */
-public enum MonitorDomainNames {
-  ClusterStatus,
-  HelixZkClient,
-  HelixThreadPoolExecutor,
-  HelixCallback,
-  RoutingTableProvider,
-  CLMParticipantReport
+public class RebalanceCounter extends CountMetric {
+  /**
+   * Instantiates a new rebalance count metric.
+   * @param metricName the metric name
+   */
+  public RebalanceCounter(String metricName) {
+    super(metricName, 0L);
+  }
 }
diff --git a/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/MonitorDomainNames.java b/helix-core/src/main/java/org/apache/helix/monitoring/metrics/implementation/RebalanceFailureCount.java
similarity index 68%
copy from helix-core/src/main/java/org/apache/helix/monitoring/mbeans/MonitorDomainNames.java
copy to helix-core/src/main/java/org/apache/helix/monitoring/metrics/implementation/RebalanceFailureCount.java
index 73bf057..fd335f2 100644
--- a/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/MonitorDomainNames.java
+++ b/helix-core/src/main/java/org/apache/helix/monitoring/metrics/implementation/RebalanceFailureCount.java
@@ -1,4 +1,4 @@
-package org.apache.helix.monitoring.mbeans;
+package org.apache.helix.monitoring.metrics.implementation;
 
 /*
  * Licensed to the Apache Software Foundation (ASF) under one
@@ -19,14 +19,16 @@ package org.apache.helix.monitoring.mbeans;
  * under the License.
  */
 
-/**
- * This enum defines all of domain names used with various Helix monitor mbeans.
- */
-public enum MonitorDomainNames {
-  ClusterStatus,
-  HelixZkClient,
-  HelixThreadPoolExecutor,
-  HelixCallback,
-  RoutingTableProvider,
-  CLMParticipantReport
+import org.apache.helix.monitoring.metrics.model.CountMetric;
+
+
+public class RebalanceFailureCount extends CountMetric {
+  /**
+   * Instantiates a new rebalance failure count metric.
+   *
+   * @param metricName the metric name
+   */
+  public RebalanceFailureCount(String metricName) {
+    super(metricName, 0L);
+  }
 }
diff --git a/helix-core/src/main/java/org/apache/helix/monitoring/metrics/implementation/RebalanceLatencyGauge.java b/helix-core/src/main/java/org/apache/helix/monitoring/metrics/implementation/RebalanceLatencyGauge.java
new file mode 100644
index 0000000..b0c563b
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/monitoring/metrics/implementation/RebalanceLatencyGauge.java
@@ -0,0 +1,89 @@
+package org.apache.helix.monitoring.metrics.implementation;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.util.concurrent.TimeUnit;
+
+import com.codahale.metrics.Histogram;
+import com.codahale.metrics.SlidingTimeWindowArrayReservoir;
+import org.apache.helix.monitoring.metrics.model.LatencyMetric;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class RebalanceLatencyGauge extends LatencyMetric {
+  private static final Logger LOG = LoggerFactory.getLogger(RebalanceLatencyGauge.class);
+  private static final long VALUE_NOT_SET = -1;
+  private long _lastEmittedMetricValue = VALUE_NOT_SET;
+  // Use a ThreadLocal so each thread can update and record its own start time.
+  private final ThreadLocal<Long> _startTime;
+
+  /**
+   * Instantiates a new rebalance latency gauge.
+   * @param metricName the metric name
+   * @param slidingTimeWindow the length of the sliding time window, in milliseconds
+   */
+  public RebalanceLatencyGauge(String metricName, long slidingTimeWindow) {
+    super(metricName, new Histogram(
+        new SlidingTimeWindowArrayReservoir(slidingTimeWindow, TimeUnit.MILLISECONDS)));
+    _metricName = metricName;
+    _startTime = ThreadLocal.withInitial(() -> VALUE_NOT_SET);
+  }
+
+  /**
+   * Calling this method multiple times simply overwrites the previous state. This is because
+   * the rebalancer could fail at any point, and we want it to recover gracefully by resetting the
+   * internal state of this metric.
+   */
+  @Override
+  public void startMeasuringLatency() {
+    reset();
+    _startTime.set(System.currentTimeMillis());
+  }
+
+  @Override
+  public void endMeasuringLatency() {
+    if (_startTime.get() == VALUE_NOT_SET) {
+      LOG.error(
+          "Needs to call startMeasuringLatency first! Ignoring and resetting the metric. Metric name: {}",
+          _metricName);
+      return;
+    }
+    synchronized (this) {
+      _lastEmittedMetricValue = System.currentTimeMillis() - _startTime.get();
+      updateValue(_lastEmittedMetricValue);
+    }
+    reset();
+  }
+
+  /**
+   * Returns the most recently emitted metric value at the time of the call.
+   * @return the last emitted latency in milliseconds, or -1 if no latency has been emitted yet
+   */
+  @Override
+  public Long getLastEmittedMetricValue() {
+    return _lastEmittedMetricValue;
+  }
+
+  /**
+   * Resets the internal state of this metric.
+   */
+  private void reset() {
+    _startTime.set(VALUE_NOT_SET);
+  }
+}
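
Because the start time lives in a ThreadLocal, concurrent measurements are independent. A sketch (the metric name and window length are hypothetical):

    import org.apache.helix.monitoring.metrics.implementation.RebalanceLatencyGauge;

    public class LatencyGaugeThreadingExample {
      public static void main(String[] args) {
        RebalanceLatencyGauge gauge =
            new RebalanceLatencyGauge("StateWriteLatencyGauge", 60_000L /* sliding window, ms */);

        Runnable timedWork = () -> {
          gauge.startMeasuringLatency(); // per-thread start time via the ThreadLocal
          // ... do the work being timed ...
          gauge.endMeasuringLatency();   // records this thread's elapsed ms into the histogram
        };

        // Two threads can measure concurrently without clobbering each other's start times.
        new Thread(timedWork).start();
        new Thread(timedWork).start();
      }
    }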
diff --git a/helix-core/src/main/java/org/apache/helix/monitoring/metrics/model/CountMetric.java b/helix-core/src/main/java/org/apache/helix/monitoring/metrics/model/CountMetric.java
new file mode 100644
index 0000000..c64f761
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/monitoring/metrics/model/CountMetric.java
@@ -0,0 +1,69 @@
+package org.apache.helix.monitoring.metrics.model;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import org.apache.helix.monitoring.mbeans.dynamicMBeans.DynamicMetric;
+import org.apache.helix.monitoring.mbeans.dynamicMBeans.SimpleDynamicMetric;
+
+/**
+ * Represents a count metric and defines methods to help with calculation. A count metric tracks
+ * a cumulative count of a certain property.
+ */
+public abstract class CountMetric extends SimpleDynamicMetric<Long> implements Metric<Long> {
+
+  /**
+   * Instantiates a new count metric.
+   *
+   * @param metricName the metric name
+   * @param initCount the initial count
+   */
+  public CountMetric(String metricName, long initCount) {
+    super(metricName, initCount);
+  }
+
+  /**
+   * Increment the metric by the input count.
+   *
+   * @param count the amount to add to the current value
+   */
+  public void increment(long count) {
+    updateValue(getValue() + count);
+  }
+
+  @Override
+  public String getMetricName() {
+    return _metricName;
+  }
+
+  @Override
+  public String toString() {
+    return String.format("Metric %s's count is %d", getMetricName(), getValue());
+  }
+
+  @Override
+  public Long getLastEmittedMetricValue() {
+    return getValue();
+  }
+
+  @Override
+  public DynamicMetric getDynamicMetric() {
+    return this;
+  }
+}
diff --git a/helix-core/src/main/java/org/apache/helix/monitoring/metrics/model/LatencyMetric.java b/helix-core/src/main/java/org/apache/helix/monitoring/metrics/model/LatencyMetric.java
new file mode 100644
index 0000000..733635e
--- /dev/null
+++ b/helix-core/src/main/java/org/apache/helix/monitoring/metrics/model/LatencyMetric.java
@@ -0,0 +1,67 @@
+package org.apache.helix.monitoring.metrics.model;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import com.codahale.metrics.Histogram;
+import org.apache.helix.monitoring.mbeans.dynamicMBeans.DynamicMetric;
+import org.apache.helix.monitoring.mbeans.dynamicMBeans.HistogramDynamicMetric;
+
+/**
+ * Represents a latency metric and defines methods to help with calculation. A latency metric
+ * reports how long a particular stage in the logic took, in milliseconds.
+ */
+public abstract class LatencyMetric extends HistogramDynamicMetric implements Metric<Long> {
+  protected String _metricName;
+
+  /**
+   * Instantiates a new Histogram dynamic metric.
+   * @param metricName the metric name
+   * @param metricObject the metric object
+   */
+  public LatencyMetric(String metricName, Histogram metricObject) {
+    super(metricName, metricObject);
+    _metricName = metricName;
+  }
+
+  /**
+   * Starts measuring the latency.
+   */
+  public abstract void startMeasuringLatency();
+
+  /**
+   * Ends measuring the latency.
+   */
+  public abstract void endMeasuringLatency();
+
+  @Override
+  public String getMetricName() {
+    return _metricName;
+  }
+
+  @Override
+  public String toString() {
+    return String.format("Metric %s's latency is %d", getMetricName(), getLastEmittedMetricValue());
+  }
+
+  @Override
+  public DynamicMetric getDynamicMetric() {
+    return this;
+  }
+}
diff --git a/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/InstanceMonitorMBean.java b/helix-core/src/main/java/org/apache/helix/monitoring/metrics/model/Metric.java
similarity index 53%
rename from helix-core/src/main/java/org/apache/helix/monitoring/mbeans/InstanceMonitorMBean.java
rename to helix-core/src/main/java/org/apache/helix/monitoring/metrics/model/Metric.java
index a3221d8..be7ea80 100644
--- a/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/InstanceMonitorMBean.java
+++ b/helix-core/src/main/java/org/apache/helix/monitoring/metrics/model/Metric.java
@@ -1,4 +1,4 @@
-package org.apache.helix.monitoring.mbeans;
+package org.apache.helix.monitoring.metrics.model;
 
 /*
  * Licensed to the Apache Software Foundation (ASF) under one
@@ -19,33 +19,32 @@ package org.apache.helix.monitoring.mbeans;
  * under the License.
  */
 
-import org.apache.helix.monitoring.SensorNameProvider;
+import org.apache.helix.monitoring.mbeans.dynamicMBeans.DynamicMetric;
 
 /**
- * A basic bean describing the status of a single instance
+ * Defines a generic metric interface.
+ * @param <T> type of input value for the metric
  */
-public interface InstanceMonitorMBean extends SensorNameProvider {
+public interface Metric<T> {
+
   /**
-   * Check if this instance is live
-   * @return 1 if running, 0 otherwise
+   * Gets the name of the metric.
    */
-  public long getOnline();
+  String getMetricName();
 
   /**
-   * Check if this instance is enabled
-   * @return 1 if enabled, 0 if disabled
+   * Prints the metric along with its name.
    */
-  public long getEnabled();
+  String toString();
 
   /**
-   * Get total message received for this instances
-   * @return The total number of messages sent to this instance
+   * Returns the most recently emitted value for the metric at the time of the call.
+   * @return metric value
    */
-  public long getTotalMessageReceived();
+  T getLastEmittedMetricValue();
 
   /**
-   * Get the total disabled partitions number for this instance
-   * @return The total number of disabled partitions
+   * Returns the underlying DynamicMetric.
    */
-  public long getDisabledPartitions();
+  DynamicMetric getDynamicMetric();
 }
diff --git a/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/dynamicMBeans/SimpleDynamicMetric.java b/helix-core/src/main/java/org/apache/helix/monitoring/metrics/model/RatioMetric.java
similarity index 52%
copy from helix-core/src/main/java/org/apache/helix/monitoring/mbeans/dynamicMBeans/SimpleDynamicMetric.java
copy to helix-core/src/main/java/org/apache/helix/monitoring/metrics/model/RatioMetric.java
index 1be6a21..d321e51 100644
--- a/helix-core/src/main/java/org/apache/helix/monitoring/mbeans/dynamicMBeans/SimpleDynamicMetric.java
+++ b/helix-core/src/main/java/org/apache/helix/monitoring/metrics/model/RatioMetric.java
@@ -1,4 +1,4 @@
-package org.apache.helix.monitoring.mbeans.dynamicMBeans;
+package org.apache.helix.monitoring.metrics.model;
 
 /*
  * Licensed to the Apache Software Foundation (ASF) under one
@@ -9,7 +9,7 @@ package org.apache.helix.monitoring.mbeans.dynamicMBeans;
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
- *   http://www.apache.org/licenses/LICENSE-2.0
+ *     http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
@@ -19,42 +19,40 @@ package org.apache.helix.monitoring.mbeans.dynamicMBeans;
  * under the License.
  */
 
+import org.apache.helix.monitoring.mbeans.dynamicMBeans.DynamicMetric;
+import org.apache.helix.monitoring.mbeans.dynamicMBeans.SimpleDynamicMetric;
+
+
 /**
- * The dynamic metric that accept and emits same type of monitor data
- *
- * @param <T> the type of the metric value
+ * A gauge which defines the ratio of one value to another.
  */
-public class SimpleDynamicMetric<T> extends DynamicMetric<T, T> {
-  private final String _metricName;
-
+public abstract class RatioMetric extends SimpleDynamicMetric<Double> implements Metric<Double> {
   /**
    * Instantiates a new Simple dynamic metric.
-   *
-   * @param metricName   the metric name
+   * @param metricName the metric name
    * @param metricObject the metric object
    */
-  public SimpleDynamicMetric(String metricName, T metricObject) {
+  public RatioMetric(String metricName, double metricObject) {
     super(metricName, metricObject);
-    _metricName = metricName;
   }
 
   @Override
-  public T getAttributeValue(String attributeName) {
-    if (!attributeName.equals(_metricName)) {
-      return null;
-    }
-    return getMetricObject();
+  public DynamicMetric getDynamicMetric() {
+    return this;
   }
 
-  /**
-   * @return current metric value
-   */
-  public T getValue() {
-    return getMetricObject();
+  @Override
+  public String getMetricName() {
+    return _metricName;
+  }
+
+  @Override
+  public Double getLastEmittedMetricValue() {
+    return getValue();
   }
 
   @Override
-  public void updateValue(T metricObject) {
-    setMetricObject(metricObject);
+  public String toString() {
+    return String.format("Metric name: %s, metric value: %f", getMetricName(), getValue());
   }
 }
diff --git a/helix-core/src/main/java/org/apache/helix/tools/ClusterVerifiers/BestPossibleExternalViewVerifier.java b/helix-core/src/main/java/org/apache/helix/tools/ClusterVerifiers/BestPossibleExternalViewVerifier.java
index d190976..66143fe 100644
--- a/helix-core/src/main/java/org/apache/helix/tools/ClusterVerifiers/BestPossibleExternalViewVerifier.java
+++ b/helix-core/src/main/java/org/apache/helix/tools/ClusterVerifiers/BestPossibleExternalViewVerifier.java
@@ -27,27 +27,37 @@ import java.util.HashSet;
 import java.util.Iterator;
 import java.util.List;
 import java.util.Map;
+import java.util.Optional;
 import java.util.Set;
 
 import org.apache.helix.HelixDefinedState;
+import org.apache.helix.HelixRebalanceException;
 import org.apache.helix.PropertyKey;
 import org.apache.helix.controller.common.PartitionStateMap;
 import org.apache.helix.controller.dataproviders.ResourceControllerDataProvider;
 import org.apache.helix.controller.pipeline.Stage;
 import org.apache.helix.controller.pipeline.StageContext;
+import org.apache.helix.controller.rebalancer.waged.AssignmentMetadataStore;
+import org.apache.helix.controller.rebalancer.waged.RebalanceAlgorithm;
+import org.apache.helix.controller.rebalancer.waged.WagedRebalancer;
+import org.apache.helix.controller.rebalancer.waged.constraints.ConstraintBasedAlgorithmFactory;
 import org.apache.helix.controller.stages.AttributeName;
 import org.apache.helix.controller.stages.BestPossibleStateCalcStage;
 import org.apache.helix.controller.stages.BestPossibleStateOutput;
 import org.apache.helix.controller.stages.ClusterEvent;
 import org.apache.helix.controller.stages.ClusterEventType;
 import org.apache.helix.controller.stages.CurrentStateComputationStage;
+import org.apache.helix.controller.stages.CurrentStateOutput;
 import org.apache.helix.controller.stages.ResourceComputationStage;
+import org.apache.helix.manager.zk.ZkBucketDataAccessor;
 import org.apache.helix.manager.zk.ZkClient;
 import org.apache.helix.manager.zk.client.HelixZkClient;
+import org.apache.helix.model.ClusterConfig;
 import org.apache.helix.model.ExternalView;
 import org.apache.helix.model.IdealState;
 import org.apache.helix.model.Partition;
 import org.apache.helix.model.Resource;
+import org.apache.helix.model.ResourceAssignment;
 import org.apache.helix.model.StateModelDefinition;
 import org.apache.helix.task.TaskConstants;
 import org.slf4j.Logger;
@@ -377,8 +387,16 @@ public class BestPossibleExternalViewVerifier extends ZkHelixClusterVerifier {
     }
 
     runStage(event, new CurrentStateComputationStage());
-    // TODO: be caution here, should be handled statelessly.
-    runStage(event, new BestPossibleStateCalcStage());
+    // Note that the dryrunWagedRebalancer is for one-time use only
+    DryrunWagedRebalancer dryrunWagedRebalancer =
+        new DryrunWagedRebalancer(_zkClient.getServers(), cache.getClusterName(),
+            cache.getClusterConfig().getGlobalRebalancePreference());
+    event.addAttribute(AttributeName.STATEFUL_REBALANCER.name(), dryrunWagedRebalancer);
+    try {
+      runStage(event, new BestPossibleStateCalcStage());
+    } finally {
+      dryrunWagedRebalancer.close();
+    }
 
     BestPossibleStateOutput output = event.getAttribute(AttributeName.BEST_POSSIBLE_STATE.name());
     return output;
@@ -398,4 +416,55 @@ public class BestPossibleExternalViewVerifier extends ZkHelixClusterVerifier {
     return verifierName + "(" + _clusterName + "@" + _zkClient + "@resources["
        + (_resources != null ? Arrays.toString(_resources.toArray()) : "") + "])";
   }
+
+  /**
+   * A dry-run WAGED rebalancer that only calculates the assignment based on the cluster status
+   * but never updates the rebalancer assignment metadata.
+   * This rebalancer is intended for verifiers and tests.
+   */
+  private class DryrunWagedRebalancer extends WagedRebalancer {
+    DryrunWagedRebalancer(String metadataStoreAddrs, String clusterName,
+        Map<ClusterConfig.GlobalRebalancePreferenceKey, Integer> preferences) {
+      super(new ReadOnlyAssignmentMetadataStore(metadataStoreAddrs, clusterName),
+          ConstraintBasedAlgorithmFactory.getInstance(preferences), Optional.empty());
+    }
+
+    @Override
+    protected Map<String, ResourceAssignment> computeBestPossibleAssignment(
+        ResourceControllerDataProvider clusterData, Map<String, Resource> resourceMap,
+        Set<String> activeNodes, CurrentStateOutput currentStateOutput, RebalanceAlgorithm algorithm)
+        throws HelixRebalanceException {
+      return getBestPossibleAssignment(getAssignmentMetadataStore(), currentStateOutput,
+          resourceMap.keySet());
+    }
+  }
+
+  private class ReadOnlyAssignmentMetadataStore extends AssignmentMetadataStore {
+    ReadOnlyAssignmentMetadataStore(String metadataStoreAddrs, String clusterName) {
+      super(new ZkBucketDataAccessor(metadataStoreAddrs), clusterName);
+    }
+
+    @Override
+    public boolean persistBaseline(Map<String, ResourceAssignment> globalBaseline) {
+      // If baseline hasn't changed, skip writing to metadata store
+      if (compareAssignments(_globalBaseline, globalBaseline)) {
+        return false;
+      }
+      // Update the in-memory reference only
+      _globalBaseline = globalBaseline;
+      return true;
+    }
+
+    @Override
+    public boolean persistBestPossibleAssignment(
+        Map<String, ResourceAssignment> bestPossibleAssignment) {
+      // If bestPossibleAssignment hasn't changed, skip writing to metadata store
+      if (compareAssignments(_bestPossibleAssignment, bestPossibleAssignment)) {
+        return false;
+      }
+      // Update the in-memory reference only
+      _bestPossibleAssignment = bestPossibleAssignment;
+      return true;
+    }
+  }
 }
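For context, the dry-run rebalancer above lets the verifier run the full BestPossibleStateCalcStage without persisting baselines or best-possible assignments to ZooKeeper; the read-only metadata store only swaps the in-memory references. The following is a minimal usage sketch, not part of this patch; the cluster name and ZK address are placeholders:

    import org.apache.helix.tools.ClusterVerifiers.BestPossibleExternalViewVerifier;

    public class VerifierUsageSketch {
      public static void main(String[] args) {
        // Build the verifier; the DryrunWagedRebalancer is constructed internally for
        // WAGED resources, so verification never writes assignment metadata.
        BestPossibleExternalViewVerifier verifier =
            new BestPossibleExternalViewVerifier.Builder("TestCluster") // placeholder cluster name
                .setZkAddr("localhost:2181") // placeholder ZK address
                .build();
        // Poll until the external view converges to the calculated best possible state,
        // timing out after 30s with a 100ms polling interval.
        boolean converged = verifier.verifyByPolling(30_000L, 100L);
        System.out.println("Converged: " + converged);
      }
    }
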
diff --git a/helix-core/src/main/java/org/apache/helix/tools/ClusterVerifiers/StrictMatchExternalViewVerifier.java b/helix-core/src/main/java/org/apache/helix/tools/ClusterVerifiers/StrictMatchExternalViewVerifier.java
index 13cc260..0b3c97e 100644
--- a/helix-core/src/main/java/org/apache/helix/tools/ClusterVerifiers/StrictMatchExternalViewVerifier.java
+++ b/helix-core/src/main/java/org/apache/helix/tools/ClusterVerifiers/StrictMatchExternalViewVerifier.java
@@ -23,7 +23,6 @@ import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.Collections;
 import java.util.HashMap;
-import java.util.HashSet;
 import java.util.Iterator;
 import java.util.List;
 import java.util.Map;
@@ -56,19 +55,34 @@ public class StrictMatchExternalViewVerifier extends ZkHelixClusterVerifier {
 
   private final Set<String> _resources;
   private final Set<String> _expectLiveInstances;
+  private final boolean _isDeactivatedNodeAware;
 
+  @Deprecated
   public StrictMatchExternalViewVerifier(String zkAddr, String clusterName, Set<String> resources,
       Set<String> expectLiveInstances) {
+    this(zkAddr, clusterName, resources, expectLiveInstances, false);
+  }
+
+  @Deprecated
+  public StrictMatchExternalViewVerifier(HelixZkClient zkClient, String clusterName,
+      Set<String> resources, Set<String> expectLiveInstances) {
+    this(zkClient, clusterName, resources, expectLiveInstances, false);
+  }
+
+  private StrictMatchExternalViewVerifier(String zkAddr, String clusterName, Set<String> resources,
+      Set<String> expectLiveInstances, boolean isDeactivatedNodeAware) {
     super(zkAddr, clusterName);
     _resources = resources;
     _expectLiveInstances = expectLiveInstances;
+    _isDeactivatedNodeAware = isDeactivatedNodeAware;
   }
 
-  public StrictMatchExternalViewVerifier(HelixZkClient zkClient, String clusterName,
-      Set<String> resources, Set<String> expectLiveInstances) {
+  private StrictMatchExternalViewVerifier(HelixZkClient zkClient, String clusterName,
+      Set<String> resources, Set<String> expectLiveInstances, boolean isDeactivatedNodeAware) {
     super(zkClient, clusterName);
... 9853 lines suppressed ...
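The remainder of the StrictMatchExternalViewVerifier diff, including its Builder changes, is suppressed above. Assuming the Builder gains a setter for the new _isDeactivatedNodeAware flag, migrating off the deprecated constructors would look roughly like the sketch below; the setter name is an assumption, not confirmed by the visible hunks:

    // Hypothetical migration sketch; setDeactivatedNodeAwareness is an assumed name.
    StrictMatchExternalViewVerifier verifier =
        new StrictMatchExternalViewVerifier.Builder("TestCluster") // placeholder cluster name
            .setZkAddr("localhost:2181") // placeholder ZK address
            .setDeactivatedNodeAwareness(true) // assumed setter for _isDeactivatedNodeAware
            .build();
    boolean matches = verifier.verifyByPolling();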

