jackrabbit-oak-issues mailing list archives

From "Chetan Mehrotra (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-4638) Mostly async unique index (for UUIDs for example)
Date Fri, 05 Aug 2016 12:45:20 GMT

    [ https://issues.apache.org/jira/browse/OAK-4638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15409400#comment-15409400 ]

Chetan Mehrotra commented on OAK-4638:

+1 for the approach. This can be done in the following way:

# Have a Lucene index definition for uuid
# Keep the existing property index definition for uuid
# On the indexing side, update the property index as usual. But for the count call (which checks for
uniqueness), also check the Lucene index
# For a normal query, create a cursor on each index and join them
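
The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: two in-memory maps stand in for the property index and the Lucene index, and the class and method names (HybridUniqueIndex, count, add, asyncIndex, query) are hypothetical and not part of Oak's API.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; the maps stand in for the real indexes and are not Oak's API. */
final class HybridUniqueIndex {
    // Recent entries, updated synchronously (stand-in for the property index).
    private final Map<String, String> propertyIndex = new HashMap<>();
    // Older entries, filled in later by the async indexer (stand-in for the Lucene index).
    private final Map<String, String> luceneIndex = new HashMap<>();

    /** Count call used for the uniqueness check: consults both indexes, ignoring the node itself. */
    int count(String uuid, String path) {
        int matches = 0;
        String recent = propertyIndex.get(uuid);
        if (recent != null && !recent.equals(path)) {
            matches++;
        }
        String older = luceneIndex.get(uuid);
        if (older != null && !older.equals(path)) {
            matches++;
        }
        return matches;
    }

    /** Indexing side: check uniqueness against both indexes, then update the property index as usual. */
    void add(String uuid, String path) {
        if (count(uuid, path) > 0) {
            throw new IllegalStateException("Uniqueness constraint violated for " + uuid);
        }
        propertyIndex.put(uuid, path);
    }

    /** Simulates the async indexer catching up with an entry. */
    void asyncIndex(String uuid, String path) {
        luceneIndex.put(uuid, path);
    }

    /** Normal query: read from both indexes and join the results, dropping duplicates. */
    Set<String> query(String uuid) {
        Set<String> paths = new LinkedHashSet<>();
        String recent = propertyIndex.get(uuid);
        if (recent != null) {
            paths.add(recent);
        }
        String older = luceneIndex.get(uuid);
        if (older != null) {
            paths.add(older);
        }
        return paths;
    }
}
```

The set in query() matters because an entry can be present in both indexes between the async index run and the pruning of the property index.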

A key aspect would be pruning old entries from the property index. A simple approach could be
a plain traversal of the index that removes the entries which are old.
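
A minimal sketch of such a pruning pass, assuming each property index entry records when it was added and that entries older than a cutoff have already been picked up by the async index. The PropertyIndexPruner class, the map layout, and the maxAgeMillis parameter are hypothetical and only illustrate the traversal:

```java
import java.util.Map;

/** Hypothetical sketch of the pruning traversal; not Oak's API. */
final class PropertyIndexPruner {
    private PropertyIndexPruner() {
    }

    /**
     * Traverses all entries (uuid -> time added, in milliseconds) and removes
     * those older than maxAgeMillis. Returns the number of removed entries.
     * Assumption: old entries are already covered by the async index.
     */
    static int prune(Map<String, Long> entries, long nowMillis, long maxAgeMillis) {
        int before = entries.size();
        entries.entrySet().removeIf(e -> nowMillis - e.getValue() > maxAgeMillis);
        return before - entries.size();
    }
}
```

With a 24 hour window, for example, the call would be prune(entries, System.currentTimeMillis(), 24L * 60 * 60 * 1000).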

> Mostly async unique index (for UUIDs for example)
> -------------------------------------------------
>                 Key: OAK-4638
>                 URL: https://issues.apache.org/jira/browse/OAK-4638
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>            Reporter: Thomas Mueller
> The UUID index takes a lot of space. For the UUID index, we should consider using mainly
an async index. This is possible because there are two types of UUIDs: those generated in
Oak, which are sure to be unique (no need to check), and those set in the application code,
for example by importing packages. For older nodes, an async index is sufficient, and a synchronous
index is only (temporarily) needed for imported nodes. For UUIDs, we could also change the
generation algorithm if needed.
> It might be possible to use a similar pattern for regular unique indexes as well: only
keep the added entries of the last 24 hours (for example) in a property index, and then move
entries to an async index which needs less space. That would slow down adding entries, as
two indexes need to be checked.

This message was sent by Atlassian JIRA
