jackrabbit-oak-issues mailing list archives

From "Chetan Mehrotra (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property
Date Wed, 29 Jul 2015 06:35:04 GMT

    [ https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645546#comment-14645546 ]

Chetan Mehrotra commented on OAK-3001:

<<Thread view looks confusing so commenting directly>>

Checking the current implementation of the query method, it appears that the method accounts
for 2 different types of properties:

     * The indexed property can either be a {@link Long} value, in which case numeric
     * comparison applies, or a {@link Boolean} value, in which case "false" is mapped
     * to "0" and "true" is mapped to "1".

So, depending on the value of indexedProperty, the condition becomes:
* _modified - MODIFIED >= startValue
* _deletedOnce - DELETEDONCE = 1
* _bin - HASBINARY = 1 (This one is done in RDB only and not in Mongo. So something to look
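
The value mapping quoted from the Javadoc above can be sketched as follows. This is only an illustration: {{IndexedValues}} and {{toLong}} are hypothetical names and are not part of the Oak code base.

```java
// Hypothetical helper, not part of Oak: illustrates the mapping described
// in the Javadoc (Long compared numerically, Boolean mapped to 0 or 1).
final class IndexedValues {

    private IndexedValues() {
    }

    /** Returns the long value used when comparing an indexed property. */
    static long toLong(Object value) {
        if (value instanceof Long) {
            return (Long) value;
        }
        if (value instanceof Boolean) {
            // "false" is mapped to 0 and "true" is mapped to 1
            return ((Boolean) value) ? 1L : 0L;
        }
        throw new IllegalArgumentException("Unsupported indexed value: " + value);
    }
}
```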

So, looking at the current usage, it might be better to change the semantics here ^1^ and introduce
a notion of {{Operator}}. For now, just limit it to >=, =, <= and let the caller provide
that value. Then the implementation logic does not special-case the treatment for Boolean and

^1^ - This assumes that for the current requirement we do not have to provide both an upper and
a lower bound. But then [~catholicon]'s patch in OAK-3070 might require specifying both. In that
case I would prefer to introduce a new method which takes both a lower and an upper bound
and leave the current one as is.
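
A minimal sketch of what the {{Operator}} notion suggested above could look like, limited to >=, = and <=. Everything here is an assumption for illustration: the constant names, the {{matches}} and {{symbol}} methods and the {{QueryConditions}} helper do not exist in Oak, and the actual signature of the query method is left open.

```java
// Hypothetical sketch, not the Oak API: the caller picks the operator
// instead of the store inferring it from the property type.
enum Operator {
    GREATER_OR_EQUAL(">=") {
        @Override
        boolean matches(long actual, long bound) {
            return actual >= bound;
        }
    },
    EQUAL("=") {
        @Override
        boolean matches(long actual, long bound) {
            return actual == bound;
        }
    },
    LESS_OR_EQUAL("<=") {
        @Override
        boolean matches(long actual, long bound) {
            return actual <= bound;
        }
    };

    private final String symbol;

    Operator(String symbol) {
        this.symbol = symbol;
    }

    /** Textual form of the operator, e.g. for building a SQL condition. */
    String symbol() {
        return symbol;
    }

    /** In-memory evaluation of "actual <operator> bound". */
    abstract boolean matches(long actual, long bound);
}

// Hypothetical helper showing how a store might render the condition.
final class QueryConditions {

    private QueryConditions() {
    }

    /** Builds a parameterized condition such as "MODIFIED >= ?". */
    static String condition(String column, Operator operator) {
        return column + " " + operator.symbol() + " ?";
    }
}
```

With such an enum, the three cases listed earlier would be expressed by the caller, for example {{GREATER_OR_EQUAL}} with {{startValue}} for _modified and {{EQUAL}} with 1 for _deletedOnce, rather than being inferred inside the store.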

> Simplify JournalGarbageCollector using a dedicated timestamp property
> ---------------------------------------------------------------------
>                 Key: OAK-3001
>                 URL: https://issues.apache.org/jira/browse/OAK-3001
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: core, mongomk
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>            Priority: Critical
>              Labels: scalability
>             Fix For: 1.2.4, 1.3.4
> This subtask is about spawning out a [comment|https://issues.apache.org/jira/browse/OAK-2829?focusedCommentId=14585733&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14585733]
from [~chetanm] re JournalGC:
> {quote}
> Further looking at JournalGarbageCollector ... it would be simpler if you record the
journal entry timestamp as an attribute in JournalEntry document and then you can delete all
the entries which are older than some time by a simple query. This would avoid fetching all
the entries to be deleted on the Oak side
> {quote}
> and a corresponding [reply|https://issues.apache.org/jira/browse/OAK-2829?focusedCommentId=14585870&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14585870]
from myself:
> {quote}
> Re querying by timestamp: that would indeed be simpler. With the current set of DocumentStore
API however, I believe this is not possible. But: [DocumentStore.query|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentStore.java#L127]
comes quite close: it would probably just require the opposite of that method too: 
> {code}
>     public <T extends Document> List<T> query(Collection<T> collection,
>                                               String fromKey,
>                                               String toKey,
>                                               String indexedProperty,
>                                               long endValue,
>                                               int limit) {
> {code}
> .. or what about generalizing this method to have both a {{startValue}} and an {{endValue}}
- with {{-1}} indicating when one of them is not used?
> {quote}

This message was sent by Atlassian JIRA
