helix-commits mailing list archives

From "Vinayak Borkar (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HELIX-80) Helix support for splitting partitions for finer repartitioning
Date Thu, 04 Apr 2013 15:21:17 GMT
Vinayak Borkar created HELIX-80:

             Summary: Helix support for splitting partitions for finer repartitioning
                 Key: HELIX-80
                 URL: https://issues.apache.org/jira/browse/HELIX-80
             Project: Apache Helix
          Issue Type: Improvement
          Components: helix-core
    Affects Versions: 0.6.0-incubating
         Environment: All
            Reporter: Vinayak Borkar
            Priority: Minor

Apache Helix expects all partitions of a resource to be created upfront and only moves those
partitions around based on instance availability. Currently, there is no automatic solution
for the case where a resource needs to be repartitioned into a larger number of finer partitions.
Here is an example of when a system might want to repartition a resource:

Imagine I start with a cluster of 5 machines. Originally, say, a resource is partitioned
into 20 partitions and Helix distributes 4 partitions to each machine. As time progresses,
more and more data is loaded into this resource, making query response times unbearable.
So we add more machines to the cluster. The partitions distribute evenly onto the machines
until the cluster size reaches 20. Now, when we grow the cluster to, say, 50 machines, no further
redistribution is possible unless we split the existing 20 partitions into at least 50 partitions.
Currently, the application has to use its own technique to repartition existing partitions. It
would be nice if Helix supported this concept as a first-class citizen.
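
The arithmetic in the example above can be checked with a small sketch. It does not use any
Helix API: the round-robin placement and the function names partitions_per_machine and
min_split_factor are assumptions made purely for illustration, and a real rebalancer may
place partitions differently.

```python
import math


def partitions_per_machine(num_partitions, num_machines):
    """Place partitions round-robin and return how many each machine holds."""
    counts = [0] * num_machines
    for p in range(num_partitions):
        counts[p % num_machines] += 1
    return counts


def min_split_factor(num_partitions, num_machines):
    """Smallest whole number of pieces each partition must be split into
    so that every machine can hold at least one partition."""
    return max(1, math.ceil(num_machines / num_partitions))


if __name__ == "__main__":
    for machines in (5, 20, 50):
        counts = partitions_per_machine(20, machines)
        # Report cluster size, heaviest machine load, and idle machine count.
        print(machines, max(counts), counts.count(0))
    print("split factor for 50 machines:", min_split_factor(20, 50))
```

With 20 partitions, any machine beyond the 20th receives nothing. If every partition is split
uniformly, the smallest whole split factor that covers 50 machines is 3, which gives 60
partitions; reaching exactly 50 would require splitting only some of the partitions.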

The converse of the repartitioning case is to merge partitions if too many partitions were
created in the first place, which also should be handled by Helix.

The complication in changing the partitioning arises when there are concurrent
inserts and the reorganization cannot be done by "stopping the world". Care
needs to be taken to make sure that concurrent inserts are neither lost nor applied twice while
a reorg is in progress.
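
One way an application might approach this today is sketched below. It is an assumption-laden
illustration, not a Helix feature or a proposed design: the class SplittablePartition, its
methods, and the hash-based routing are all hypothetical. The idea is to bulk-copy a snapshot
without blocking writers, record the inserts that arrive in the meantime, and replay them under
a brief per-partition lock before the child partitions are published. Because records are keyed,
replaying a write that the snapshot already captured overwrites it rather than duplicating it.

```python
import threading


class SplittablePartition:
    """Hypothetical in-memory partition that can be split while inserts continue."""

    def __init__(self):
        self._lock = threading.Lock()
        self._records = {}     # key -> value, live until the split is published
        self._children = None  # list of dicts once the split is published
        self._pending = None   # inserts that arrive while the bulk copy runs

    def insert(self, key, value):
        with self._lock:
            if self._children is not None:
                # Split already published: route straight to the owning child.
                self._children[hash(key) % len(self._children)][key] = value
                return
            self._records[key] = value
            if self._pending is not None:
                # Split in progress: remember the write so it can be replayed.
                self._pending[key] = value

    def split(self, fanout):
        # Phase 1: snapshot the data and start recording concurrent inserts.
        with self._lock:
            snapshot = dict(self._records)
            self._pending = {}
        # Phase 2: bulk copy outside the lock, so inserts are not blocked.
        children = [{} for _ in range(fanout)]
        for key, value in snapshot.items():
            children[hash(key) % fanout][key] = value
        # Phase 3: briefly lock, replay recorded inserts, publish the children.
        with self._lock:
            for key, value in self._pending.items():
                children[hash(key) % fanout][key] = value
            self._children = children
            self._pending = None
            self._records = {}
        return children
```

Only phases 1 and 3 hold the lock, and only for the partition being split, so the rest of the
resource keeps serving inserts. A distributed version would also need to coordinate the
switch-over across replicas, which this single-process sketch does not attempt.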

