jackrabbit-oak-issues mailing list archives

From Michael Dürig (JIRA) <j...@apache.org>
Subject [jira] [Commented] (OAK-1907) Better cost estimates for traversal, property, and ordered indexes
Date Tue, 02 Dec 2014 09:19:12 GMT

    [ https://issues.apache.org/jira/browse/OAK-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14231199#comment-14231199 ]

Michael Dürig commented on OAK-1907:
------------------------------------

[~tmueller] the changes from http://svn.apache.org/r1642683 break the test expectations of
{{ObservationRefreshTest#observation}}, causing it to fail. That test doesn't expect events
for:
{code}
/oak:index/counter/reindexCount
/oak:index/counter/reindex
{code}

I updated the test expectations at http://svn.apache.org/r1642825. However, such additional
events might also affect existing applications, so I'm not sure whether this is really
what we want.



> Better cost estimates for traversal, property, and ordered indexes
> ------------------------------------------------------------------
>
>                 Key: OAK-1907
>                 URL: https://issues.apache.org/jira/browse/OAK-1907
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: query
>    Affects Versions: 1.0, 1.0.1, 1.0.2
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>             Fix For: 1.2
>
>         Attachments: ApproxCount.java, OAK-1907.diff
>
>
> Currently, cost estimates of traversal, property index, and ordered index don't take
> the number of nodes into account, if there are more than about 100 nodes. This is problematic
> because in many cases, the wrong index is used (because of incorrect cost estimate).
> To get a better estimate, a very rough estimate on the number of child nodes below a
> given path is needed.
> One idea is: when adding a node, if Math.random() < 0.00001, add a hidden, randomly
> named property (for example called ":count-xyz" where xyz is a uuid, value 100'000) to the
> parents of that node, so that we know there are probably more than 100'000 nodes below a given
> path. When removing a node, with the same algorithm add a hidden property (":count-xyz", value
> -100'000). That should result in a slowdown of less than 0.01%, but should allow us much better
> cost estimates. Those properties could be consolidated asynchronously if needed.
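
The sampling idea in the description above can be sketched roughly as follows. This is
only an illustration under stated assumptions, not the attached ApproxCount.java and not
Oak's implementation: the class name ApproxCounterSketch and its methods are hypothetical,
and instead of writing hidden ":count-xyz" properties it keeps one already consolidated
sum per path in an in-memory map. The resolution parameter plays the role of the 100'000
value, and a sample is taken with probability 1/resolution.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch of the sampling counter; not Oak code. */
final class ApproxCounterSketch {

    private final int resolution;   // e.g. 100_000, as in the description
    private final Random random;    // injectable so the behaviour can be made deterministic
    // Assumption: one consolidated offset per path instead of many ":count-xyz" properties.
    private final Map<String, Long> offsets = new HashMap<>();

    ApproxCounterSketch(int resolution, Random random) {
        if (resolution < 1) {
            throw new IllegalArgumentException("resolution must be >= 1");
        }
        this.resolution = resolution;
        this.random = random;
    }

    /** Call when a node is added at the given absolute path. */
    void nodeAdded(String path) {
        sample(path, resolution);
    }

    /** Call when a node is removed at the given absolute path. */
    void nodeRemoved(String path) {
        sample(path, -resolution);
    }

    /** Very rough estimate of the number of nodes below the given path. */
    long estimateBelow(String path) {
        return Math.max(0L, offsets.getOrDefault(path, 0L));
    }

    // With probability 1/resolution, add delta to every ancestor of the path.
    private void sample(String path, long delta) {
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("absolute path expected: " + path);
        }
        if (random.nextInt(resolution) != 0) {
            return; // the common case: nothing is written
        }
        for (String p = parentOf(path); p != null; p = parentOf(p)) {
            offsets.merge(p, delta, Long::sum);
        }
    }

    // Returns the parent path, or null for the root.
    private static String parentOf(String path) {
        if (path.equals("/")) {
            return null;
        }
        int i = path.lastIndexOf('/');
        return i == 0 ? "/" : path.substring(0, i);
    }

    public static void main(String[] args) {
        ApproxCounterSketch counter = new ApproxCounterSketch(100_000, new Random());
        for (int i = 0; i < 1_000_000; i++) {
            counter.nodeAdded("/content/site/node" + i);
        }
        // The printed value varies from run to run; it is only a rough estimate.
        System.out.println("estimated nodes below /content: "
                + counter.estimateBelow("/content"));
    }
}
```

Because almost all calls return right after the random check, the extra write cost stays
small, which is the point of the proposal. With a resolution of 1 every change is counted,
so the sketch becomes an exact counter; that is convenient for checking the bookkeeping.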



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
