directory-dev mailing list archives

From "Alex Karasulu (JIRA)" <>
Subject [jira] Closed: (DIRSERVER-733) More efficient handling of duplicate keys in indices
Date Tue, 12 Sep 2006 14:53:24 GMT

Alex Karasulu closed DIRSERVER-733.

    Resolution: Fixed
      Assignee: Alex Karasulu

This has been fixed already.  Just adding it to JIRA for tracking.

> More efficient handling of duplicate keys in indices
> ----------------------------------------------------
>                 Key: DIRSERVER-733
>                 URL:
>             Project: Directory ApacheDS
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: pre-1.0, 1.0-RC1, 1.0-RC2, 1.0-RC3, 1.1.0, 1.0
>            Reporter: Alex Karasulu
>         Assigned To: Alex Karasulu
>             Fix For: 1.1.0, 1.0
> Here's the description of the problem and the solution that I posted to the mailing list:
> Hi again,
> I've completed my optimization of indices under high partition capacities.  The new results
are listed side by side with the old results.  If you're not interested in the exact optimization
that took place, skip down to the results.
> The problem:
> ------------
> Up to now duplicate keys (keys with many values) were stored using a TreeSet within the
B+Trees of indices.  Indices usually map some attribute value to an ID into the master table
for an entry.  The ID is a sequentially unique value assigned to the entry.
> So if 100 people have the initials AA, there would be 100 values in the TreeSet for the
key 'AA' in the initials index, and that whole set would be serialized and deserialized to and
from disk.  At these low numbers there's very little impact.  However, certain indices were
seriously impacted, like the objectClass index or the hierarchy-based system index.  Imagine
having 1 million inetOrgPerson entries.  The objectClass index would then have 1 million values
in the TreeSet for the key 'inetOrgPerson'.  This seriously hurts performance from both
a space and a time perspective, and essentially negates the benefit of using B+Trees
in the first place with large numbers of entries.
> Solution:
> ---------
> I added a configurable threshold parameter: once a key accumulates more duplicate values
than the threshold, the server stops using an in-memory TreeSet and instead stores that key's
values in another B+Tree created within the same index database file.  The default value for
this threshold, which these tests were conducted with, is 1000.  So after a key has 1000
duplicate values, the server switches from the in-memory TreeSet to an on-disk B+Tree for
storing the values of that key alone.
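The threshold scheme described above can be sketched roughly as follows. This is a minimal illustration, not the actual ApacheDS/JDBM code: the class name `DupsContainer`, the `TreeMap` standing in for the on-disk per-key B+Tree, and the migration logic are all assumptions made for the example.

```java
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Sketch of threshold-based duplicate handling: entry IDs for one index key
 * live in an in-memory TreeSet until their count crosses the threshold, then
 * migrate to a sorted map standing in for an on-disk per-key B+Tree.
 */
public class DupsContainer {
    private final int threshold;
    private TreeSet<Long> memorySet = new TreeSet<>();  // serialized whole with the key
    private TreeMap<Long, Boolean> diskTree;            // stand-in for the per-key B+Tree

    public DupsContainer(int threshold) {
        this.threshold = threshold;
    }

    public void add(long entryId) {
        if (diskTree != null) {
            diskTree.put(entryId, Boolean.TRUE);
            return;
        }
        memorySet.add(entryId);
        if (memorySet.size() > threshold) {
            // Migrate so the whole value set no longer needs to be
            // (de)serialized on every access to this key.
            diskTree = new TreeMap<>();
            for (long id : memorySet) {
                diskTree.put(id, Boolean.TRUE);
            }
            memorySet = null;
        }
    }

    public boolean usesTree() {
        return diskTree != null;
    }

    public int size() {
        return diskTree != null ? diskTree.size() : memorySet.size();
    }

    public static void main(String[] args) {
        DupsContainer dups = new DupsContainer(1000);  // default threshold from the post
        for (long id = 0; id < 1500; id++) {
            dups.add(id);
        }
        System.out.println(dups.size() + " " + dups.usesTree());  // prints "1500 true"
    }
}
```

In the real server the per-key B+Tree is created inside the same index database file, so the switch changes only how that key's values are stored, not where.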

This message is automatically generated by JIRA.

