jackrabbit-oak-issues mailing list archives

From "Chetan Mehrotra (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-3219) Lucene IndexPlanner should also account for number of property constraints evaluated while giving cost estimation
Date Wed, 12 Aug 2015 09:42:45 GMT

    [ https://issues.apache.org/jira/browse/OAK-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693228#comment-14693228 ]

Chetan Mehrotra commented on OAK-3219:

[~catholicon] [~tmueller] Does the above description of the problem sound correct (especially
regarding how the property index does cost estimation)?

> Lucene IndexPlanner should also account for number of property constraints evaluated while giving cost estimation
> -----------------------------------------------------------------------------------------------------------------
>                 Key: OAK-3219
>                 URL: https://issues.apache.org/jira/browse/OAK-3219
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>            Priority: Minor
>             Fix For: 1.3.6
> Currently the cost returned by the Lucene index is a function of the number of indexed documents present in the index. If the number of indexed entries is high, this may reduce the chances of this index being selected when some property index also supports one of the property constraints.
> {noformat}
> /jcr:root/content/freestyle-cms/customers//element(*, cq:Page)[(jcr:content/@title = 'm' or jcr:like(jcr:content/@title, 'm%')) and jcr:content/@sling:resourceType = '/components/page/customer']
> {noformat}
> Consider the above query with the following index definitions:
> * A property index on resourceType
> * A Lucene index for cq:Page with properties {{jcr:content/title}} and {{jcr:content/sling:resourceType}} indexed, and with path restriction evaluation enabled
> Here is what each of the two indexes can help with:
> # Property index
> ## Path restriction
> ## Property restriction on  {{sling:resourceType}}
> # Lucene index
> ## NodeType restriction
> ## Property restriction on  {{sling:resourceType}}
> ## Property restriction on  {{title}}
> ## Path restriction
> The cost estimate currently works like this:
> * Property index - {{f(indexedValueEstimate, estimateOfNodesUnderGivenPath)}}
> ** indexedValueEstimate - For 'sling:resourceType=foo' it is the approximate count of nodes having 'foo' as that property's value
> ** estimateOfNodesUnderGivenPath - It is derived from an approximate estimate of the nodes present under the given path
> * Lucene index - {{f(totalIndexedEntries)}}
> As the Lucene cost is a function of the total index size only, it does not reflect how selective the index actually is for a given query. The following 2 changes can be made to improve it:
> * Given that the Lucene index can handle more constraints (4) than the property index (2), the cost estimate it returns should also reflect this. This can be done by setting costPerEntry to 1/(number of property restrictions evaluated)
> * Get the count for the queried property value - This is similar to what PropertyIndex does and assumes that Lucene can provide that information at O(1) cost. If multiple property restrictions are supported, this can be the minimum of all their counts
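
The two proposed changes can be sketched roughly as follows. This is only an illustration of the arithmetic described above, not the actual Oak IndexPlanner code: the class name, the method names, the final formula (costPerEntry multiplied by the entry estimate), the fallback to costPerEntry = 1 when no property restriction is evaluated, and the sample counts are all assumptions made for the example.

```java
// Hypothetical sketch of the proposed Lucene cost estimate; not Oak's real API.
public final class LuceneCostSketch {

    private LuceneCostSketch() {
    }

    // Proposed change 1: costPerEntry = 1 / (number of property restrictions evaluated).
    // Falling back to 1.0 when no property restriction is evaluated is an assumption.
    static double costPerEntry(int propertyRestrictionsEvaluated) {
        if (propertyRestrictionsEvaluated <= 0) {
            return 1.0;
        }
        return 1.0 / propertyRestrictionsEvaluated;
    }

    // Proposed change 2: use the count for the queried property value; with several
    // supported property restrictions, take the minimum. The result is capped at
    // totalIndexedEntries, which is also the fallback when no per-property count is known.
    static long estimatedEntryCount(long totalIndexedEntries, long... perPropertyCounts) {
        long estimate = totalIndexedEntries;
        for (long count : perPropertyCounts) {
            estimate = Math.min(estimate, count);
        }
        return estimate;
    }

    // Assumed combination of both changes: costPerEntry * estimated entry count.
    static double estimatedCost(long totalIndexedEntries, long... perPropertyCounts) {
        return costPerEntry(perPropertyCounts.length)
                * estimatedEntryCount(totalIndexedEntries, perPropertyCounts);
    }

    public static void main(String[] args) {
        // Made-up numbers: 1,000,000 indexed documents, 5,000 matches for
        // sling:resourceType and 200 matches for title.
        double cost = estimatedCost(1_000_000L, 5_000L, 200L);
        System.out.println(cost); // prints 100.0 (0.5 * 200)
    }
}
```

With these made-up numbers the Lucene estimate drops from a value driven by 1,000,000 total entries to 100, which is the kind of shift that would let it compete with a property index that only covers {{sling:resourceType}}.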

This message was sent by Atlassian JIRA
