jackrabbit-oak-issues mailing list archives

From "Thomas Mueller (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-622) Improve QueryIndex interface
Date Wed, 08 May 2013 11:35:15 GMT

    [ https://issues.apache.org/jira/browse/OAK-622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13651804#comment-13651804 ]

Thomas Mueller commented on OAK-622:

In a discussion with Alex Parvulescu and Tommaso Teofili, we decided not to change the interface
for now, but instead to improve the documentation of those methods that were not fully clear.
This was completed in revision 1480226.

We also discussed adding a marker interface, FulltextQueryIndex, which (if implemented) flags
that the given index may support more than just the minimal fulltext query syntax. If such an
index is used, the query engine is supposed to *not* verify the fulltext constraint(s)
for the given selector.
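
To make the marker-interface idea concrete, here is a minimal sketch. It assumes a heavily simplified stand-in for QueryIndex (the real Oak interface has more methods), and it assumes FulltextQueryIndex would extend QueryIndex, which the discussion did not settle. The helper engineMustVerify and both example index classes are hypothetical.

```java
/** Simplified stand-in for Oak's QueryIndex (assumption: the real interface has more methods). */
interface QueryIndex {
    String getIndexName();
}

/** Marker interface: declares no methods, it only flags extended fulltext support. */
interface FulltextQueryIndex extends QueryIndex {
}

/** Hypothetical index with no special fulltext handling. */
final class PropertyLikeIndex implements QueryIndex {
    @Override
    public String getIndexName() {
        return "property-like";
    }
}

/** Hypothetical index that evaluates fulltext conditions with its own, richer syntax. */
final class LuceneLikeIndex implements FulltextQueryIndex {
    @Override
    public String getIndexName() {
        return "lucene-like";
    }
}

final class FulltextCheck {
    private FulltextCheck() {
    }

    /** Hypothetical engine-side helper: true if the engine must re-check fulltext constraints. */
    static boolean engineMustVerify(QueryIndex index) {
        return !(index instanceof FulltextQueryIndex);
    }

    public static void main(String[] args) {
        for (QueryIndex index : new QueryIndex[] {new PropertyLikeIndex(), new LuceneLikeIndex()}) {
            System.out.println(index.getIndexName() + ": engine verifies fulltext = "
                    + engineMustVerify(index));
        }
    }
}
```

Because the marker carries no methods, existing index implementations would stay source-compatible; only indexes that opt in change how the engine treats their results.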

We need to support the "rep:excerpt()" feature of Jackrabbit 2.x. One idea is to add this
property to the filter (without an actual restriction) if the query contains this column. That
way the index can detect that "rep:excerpt()" is needed. The excerpt is then retrieved in the
regular way (Cursor.next() and then IndexRow.getValue("rep:excerpt")).
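
The excerpt idea could look roughly like the sketch below. All types here are simplified stand-ins, not the Oak API: SketchFilter only records which property names were mentioned, SketchIndexRow returns plain strings, and a java.util.Iterator plays the role of the Cursor. The method name addPropertyWithoutRestriction and the 20-character excerpt length are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Simplified stand-in for a filter: it only tracks which property names are mentioned. */
final class SketchFilter {
    private final Set<String> properties = new HashSet<>();

    /** Hypothetical method: mention a property without restricting its value. */
    void addPropertyWithoutRestriction(String name) {
        properties.add(name);
    }

    boolean mentions(String name) {
        return properties.contains(name);
    }
}

/** Simplified stand-in for an index row; values are plain strings here. */
final class SketchIndexRow {
    private final String path;
    private final Map<String, String> values;

    SketchIndexRow(String path, Map<String, String> values) {
        this.path = path;
        this.values = values;
    }

    String getPath() {
        return path;
    }

    /** Returns the value of the given column, or null if the index did not supply it. */
    String getValue(String columnName) {
        return values.get(columnName);
    }
}

final class ExcerptAwareIndex {
    static final String EXCERPT = "rep:excerpt";
    private static final int MAX_EXCERPT_LENGTH = 20; // arbitrary, for illustration only

    /** Returns an iterator (playing the role of a cursor) over all documents, given as path to text. */
    Iterator<SketchIndexRow> query(SketchFilter filter, Map<String, String> documents) {
        // The index computes excerpts only if the filter mentions the excerpt column.
        boolean wantExcerpt = filter.mentions(EXCERPT);
        List<SketchIndexRow> rows = new ArrayList<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            Map<String, String> values = new HashMap<>();
            if (wantExcerpt) {
                String text = doc.getValue();
                values.put(EXCERPT, text.substring(0, Math.min(MAX_EXCERPT_LENGTH, text.length())));
            }
            rows.add(new SketchIndexRow(doc.getKey(), values));
        }
        return rows.iterator();
    }

    public static void main(String[] args) {
        Map<String, String> documents = new LinkedHashMap<>();
        documents.put("/content/a", "Jackrabbit Oak is a content repository");

        SketchFilter filter = new SketchFilter();
        filter.addPropertyWithoutRestriction(EXCERPT);

        Iterator<SketchIndexRow> cursor = new ExcerptAwareIndex().query(filter, documents);
        while (cursor.hasNext()) {
            SketchIndexRow row = cursor.next();
            System.out.println(row.getPath() + " -> " + row.getValue(EXCERPT));
        }
    }
}
```

The point of the design is that no new method is needed on the index: the filter doubles as the signal, and the excerpt travels through the same row accessor as any other column.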

Later on, we may still want to change the query index interface, but for now it seems the
extensive changes originally proposed in the issue description are not yet needed.

> Improve QueryIndex interface
> ----------------------------
>                 Key: OAK-622
>                 URL: https://issues.apache.org/jira/browse/OAK-622
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: query
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>            Priority: Minor
> The current QueryIndex interface is quite simple, but doesn't address some of the required
> features and more advanced optimizations that are possible:
> - For fulltext queries, it doesn't address the case where the index implementation has
>   a different understanding of the fulltext condition than what is described in the JCR spec
>   (the basic features).
> - For queries with "order by", it would be good to know whether the index supports returning
>   the data in sorted order and, if so, how much slower that would be (if it is slower). So
>   an index might have multiple strategies with different costs.
> - It's quite easy to misunderstand what getCost is supposed to do exactly. The new API
>   should have a clearer solution here.
> - Even if the query doesn't have "order by", the index might return the data in a sorted
>   way, which might help improve query performance (using a merge join).
> - The cost is currently a single value; it might be better to estimate the number of
>   nodes, the cost to run a query, and the cost per node. That way we could optimize to quickly
>   return the first few nodes (versus optimizing for throughput).
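
The last point of the quoted description, splitting the single cost value into a fixed cost, a per-node cost, and an estimated node count, can be illustrated with a small sketch. The class CostEstimate, its field names, and the numbers used are all hypothetical; this is not the QueryIndex API, only one way the three estimates might be combined.

```java
/** Hypothetical three-part cost estimate for one index strategy. */
final class CostEstimate {
    final String name;
    final double costPerExecution;   // fixed cost to start the query
    final double costPerEntry;       // cost to read one node
    final long estimatedEntryCount;  // how many nodes the index expects to return

    CostEstimate(String name, double costPerExecution, double costPerEntry, long estimatedEntryCount) {
        this.name = name;
        this.costPerExecution = costPerExecution;
        this.costPerEntry = costPerEntry;
        this.estimatedEntryCount = estimatedEntryCount;
    }

    /** Estimated cost of reading at most entriesToRead nodes. */
    double costFor(long entriesToRead) {
        long entries = Math.min(entriesToRead, estimatedEntryCount);
        return costPerExecution + costPerEntry * entries;
    }

    /** Picks the cheaper of two estimates for the given number of nodes; ties go to a. */
    static CostEstimate cheaper(CostEstimate a, CostEstimate b, long entriesToRead) {
        return a.costFor(entriesToRead) <= b.costFor(entriesToRead) ? a : b;
    }

    public static void main(String[] args) {
        // Made-up numbers: one strategy is expensive to start but cheap per node,
        // the other starts quickly but pays more for every node.
        CostEstimate slowStart = new CostEstimate("slowStart", 100.0, 1.0, 10_000);
        CostEstimate fastStart = new CostEstimate("fastStart", 1.0, 5.0, 10_000);

        System.out.println("first 10 nodes: " + cheaper(slowStart, fastStart, 10).name);
        System.out.println("all nodes: " + cheaper(slowStart, fastStart, Long.MAX_VALUE).name);
    }
}
```

With a single cost value the engine would have to pick one strategy for every query; with the three-part estimate, the choice can depend on how many nodes the caller is expected to read.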

