directory-dev mailing list archives

From Emmanuel Lécharny
Subject Re: Bulk Loader : some ideas...
Date Sun, 17 Aug 2014 21:24:19 GMT
On 17/08/14 22:05, Howard Chu wrote:
> Emmanuel Lécharny wrote:
>> On 17/08/14 17:07, Howard Chu wrote:
>>> If we encounter an entry later in the LDIF that corresponds to one of
>>> these missing DNs, the search in the RDN index will just return the
>>> entryID we already assigned to it. We then remove the DN from the
>>> missing DN list. The result is that the DB tables and entryIDs are
>>> generated in DN order even if the entries aren't ordered in the LDIF.
>> The problem with this approach is that you lose the EntryUUID stored in
>> the LDIF file (typically when you try to bulk load an extract done from
>> a replica: you want to keep this information).
> So create a stub entry with a provisional entryUUID, and overwrite the
> stub entry with the real entryUUID if you encounter the real entry
> later. Still far cheaper than multiple passes thru the LDIF file.

It works in your case, not ours. Again, we need to order the master
table by the entries' UUIDs, as we also need to create the RDN index at
the same time. We can't pull one entry after the other and push them
into the master table, creating empty entries when we have missing
parents: it just won't produce an ordered master table (the master
table is a Btree<UUID, Entry>). Obviously, if we had a side index for
UUIDs, pointing to the offsets of the entries in the file, that would be
a different story (but we would still have to order the UUID index
separately, as a whole).

This is the reason why we have two phases.
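
A minimal sketch of the two phases, using the side-index idea mentioned
above. It is an illustration only, not the actual bulk loader: the function
names (scan_offsets, read_entry, load_in_uuid_order) are hypothetical, the
LDIF parsing is simplified (no line folding, base64 values or comments), a
plain Python list stands in for the Btree<UUID, Entry>, and UUIDs are
ordered by Python's own uuid.UUID comparison, which may not match the
comparator the real master table uses.

```python
import io
import uuid


def scan_offsets(ldif):
    """Phase 1: one pass over the LDIF, collecting (entryUUID, byte offset)
    for every entry, then sorting the whole index by UUID."""
    index = []
    start = None        # offset of the first line of the current entry
    entry_uuid = None   # entryUUID found in the current entry, if any
    while True:
        pos = ldif.tell()
        line = ldif.readline()
        if line.strip() == b"":            # blank line or EOF closes an entry
            if start is not None and entry_uuid is not None:
                index.append((entry_uuid, start))
            start, entry_uuid = None, None
            if line == b"":                # EOF
                break
            continue
        if start is None:
            start = pos
        if line.lower().startswith(b"entryuuid:"):
            value = line.split(b":", 1)[1].strip().decode("ascii")
            entry_uuid = uuid.UUID(value)
    index.sort()                           # the index is ordered as a whole
    return index


def read_entry(ldif, offset):
    """Read the lines of one entry starting at the given byte offset."""
    ldif.seek(offset)
    lines = []
    for line in iter(ldif.readline, b""):
        if line.strip() == b"":
            break
        lines.append(line.rstrip(b"\r\n").decode("utf-8"))
    return lines


def load_in_uuid_order(ldif):
    """Phase 2: fetch entries in UUID order and append them to the master
    table (a list here, standing in for Btree<UUID, Entry>)."""
    master = []
    for entry_uuid, offset in scan_offsets(ldif):
        master.append((entry_uuid, read_entry(ldif, offset)))
    return master


# Entries deliberately out of UUID order, with a child before its parent.
SAMPLE = b"""dn: ou=people,dc=example,dc=com
entryUUID: 33333333-3333-3333-3333-333333333333

dn: dc=example,dc=com
entryUUID: 11111111-1111-1111-1111-111111111111

dn: cn=test,ou=people,dc=example,dc=com
entryUUID: 22222222-2222-2222-2222-222222222222
"""

master = load_in_uuid_order(io.BytesIO(SAMPLE))
```

The point of the sketch is that the master table only ever receives
appends in key order, and the real entryUUID from the LDIF is kept; the
price is that the UUID index has to be complete and sorted before the
first entry is written.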
