directory-dev mailing list archives

From Emmanuel Lécharny <elecha...@gmail.com>
Subject Re: Bulk Loader : some ideas...
Date Sun, 17 Aug 2014 22:24:32 GMT
On 17/08/14 23:44, Howard Chu wrote:
> Emmanuel Lécharny wrote:
>> On 17/08/14 22:05, Howard Chu wrote:
>>> Emmanuel Lécharny wrote:
>>>> On 17/08/14 17:07, Howard Chu wrote:
>>>>> If we encounter an entry later in the LDIF that corresponds to one of
>>>>> these missing DNs, the search in the RDN index will just return the
>>>>> entryID we already assigned to it. We then remove the DN from the
>>>>> missing DN list. The result is that the DB tables and entryIDs are
>>>>> generated in DN order even if the entries aren't ordered in the LDIF.
>>>>
>>>> The problem with this approach is that you lose the entryUUID stored in
>>>> the LDIF file (typically when you bulk load an extract taken from a
>>>> replica: you want to keep this information).
>>>
>>> So create a stub entry with a provisional entryUUID, and overwrite the
>>> stub entry with the real entryUUID if you encounter the real entry
>>> later. Still far cheaper than multiple passes thru the LDIF file.
>>
>> It works in your case, not ours. Again, we need to order the master
>> table using the entry's UUID, as we also need to create the RDN index at
>> the same time. We can't pull one entry after the other and push them
>> into the master table, creating empty entries when we have missing
>> parents; it just won't produce an ordered master table (the master
>> table is a Btree<UUID, Entry>).
>
> Then delete the stub entry and insert the new entry.

That simply doesn't work. If we have entry M with UUID u, and entry N
with UUID v, where N >> M, we may just as well have v > u as v < u, and
we won't know which until we have read N. N and M may have no
hierarchical relationship either (they are just in the same partition,
sharing the same root parent, but that's it). Here, we really have to
grab all the entryUUIDs, or create one for each entry read that lacks
one, and sort those entryUUIDs, to be able to create the master table.
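
To make the two phases concrete, here is a minimal sketch in Java. It is
not the actual bulk loader code: the names TwoPhaseLoadSketch, LdifEntry,
collectSortedUuids and buildMasterOrder are hypothetical, the LDIF is
modelled as an in-memory list rather than file offsets, and the ordering
is java.util.UUID's natural ordering, which may differ from the
comparator the real master table uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

class TwoPhaseLoadSketch {
    /** Hypothetical stand-in for a parsed LDIF entry; entryUuid may be null. */
    static final class LdifEntry {
        final String dn;
        final UUID entryUuid;

        LdifEntry(String dn, UUID entryUuid) {
            this.dn = dn;
            this.entryUuid = entryUuid;
        }
    }

    /**
     * Phase 1: read every entry once. Keep the entryUUID found in the LDIF,
     * or generate one when it is missing, and map it to the entry's position.
     * The TreeMap keeps the keys sorted (assumes UUIDs are unique).
     */
    static TreeMap<UUID, Integer> collectSortedUuids(List<LdifEntry> entries) {
        TreeMap<UUID, Integer> index = new TreeMap<>();
        for (int pos = 0; pos < entries.size(); pos++) {
            UUID uuid = entries.get(pos).entryUuid;
            if (uuid == null) {
                uuid = UUID.randomUUID();
            }
            index.put(uuid, pos);
        }
        return index;
    }

    /**
     * Phase 2: walk the sorted index and fetch each entry by position, so the
     * entries come out in UUID order whatever their order in the LDIF.
     */
    static List<String> buildMasterOrder(List<LdifEntry> entries,
                                         TreeMap<UUID, Integer> index) {
        List<String> dns = new ArrayList<>();
        for (Map.Entry<UUID, Integer> e : index.entrySet()) {
            dns.add(entries.get(e.getValue()).dn);
        }
        return dns;
    }
}
```

The point of the sketch is only that the position of u relative to v is
unknown until every entry has been seen, so the sort has to cover the
whole set before the master table is written.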
>
>> Obviously, if we had a side index for
>> UUID, pointing to offset to entries in the file, that would be a
>> different story (but we would still have to order the UUID index
>> separately, as a whole).
>>
>> This is the reason why we have two phases.
>
> This sounds broken to me; that means if you try to load an LDIF from
> some other software that also includes entryUUIDs, but which are not
> generated in the order that you use, your master table will be in the
> wrong order.

We can deal with any LDIF file, whatever the order of the entries in it.
The master table will end up with the entries ordered by their entryUUID
(either the ones we had in the original file, or the ones we have
generated). There is nothing broken here.

