jackrabbit-oak-issues mailing list archives

From "Alex Parvulescu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-1236) Query: optimize for sling's i18n support
Date Fri, 06 Dec 2013 15:09:36 GMT

    [ https://issues.apache.org/jira/browse/OAK-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841334#comment-13841334 ]

Alex Parvulescu commented on OAK-1236:

Funnily enough, I think the following two statements have the same effect:

bq. would it be faster, to just search for all language roots and then traverse the subtree instead of querying it?
bq. I wonder what would happen if there is no index on the mixin type sling:Message? Wouldn't that make the query fast?

I've tested this (and fixed OAK-1269 in the process) and it looks like it would solve this
issue: removing the node type index for sling:Message causes a traversal, which has minimal
impact compared to the original issue.

On a broader scope, I agree with Jukka that we should look into applying an optimization
similar to the Jackrabbit case: buffer the left-side results and push the intermediate values to
the right side of the join as a filter. This could be tracked in a dedicated issue.
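
To illustrate the idea, here is a minimal Python sketch. It is not Oak or Jackrabbit code: the dict-based `STORE`, the paths, and the function names `naive_join` and `pushdown_join` are all hypothetical stand-ins. It only shows why buffering the left-side hits and using them as a path restriction on the right side reads fewer rows than re-running the unrestricted right-side query for every left-side hit.

```python
# Hypothetical content: language root path -> sling:Message node paths below it.
STORE = {
    f"/libs/{comp}/i18n/{lang}": [f"/libs/{comp}/i18n/{lang}/msg{i}" for i in range(3)]
    for comp in ("a", "b")
    for lang in ("en", "de")
}

# Stand-in for the left side of the join: the language roots with jcr:language='en'.
EN_ROOTS = [root for root in STORE if root.endswith("/en")]


def naive_join(left_roots, store):
    """Re-run the unrestricted right side once per left-side hit."""
    rows_read = 0
    result = []
    for root in left_roots:
        # No restriction is passed down, so every message in every language is read.
        for subtree in store.values():
            for path in subtree:
                rows_read += 1
                # The join condition (descendant of the language root) is checked late.
                if path.startswith(root + "/"):
                    result.append((root, path))
    return result, rows_read


def pushdown_join(left_roots, store):
    """Buffer the left side, then use each hit as a path filter on the right side."""
    buffered = list(left_roots)  # buffer the left-side results
    rows_read = 0
    result = []
    for root in buffered:
        # The buffered root acts as a path restriction: only its subtree is read.
        for path in store.get(root, []):
            rows_read += 1
            result.append((root, path))
    return result, rows_read
```

With the toy data above (2 'en' roots, 12 messages in total), both functions return the same 6 joined rows, but `naive_join` reads 24 right-side rows while `pushdown_join` reads 6. How the restriction would actually be evaluated inside the query engine is an open design question for the dedicated issue.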

This issue is now a matter of index configuration, which is outside the indexing code, so I will mark
it as resolved soon if nobody objects.

> Query: optimize for sling's i18n support
> ----------------------------------------
>                 Key: OAK-1236
>                 URL: https://issues.apache.org/jira/browse/OAK-1236
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: query
>            Reporter: Alex Parvulescu
>            Assignee: Alex Parvulescu
> There are some performance issues with sling's internationalization support query [0].
> The query for a specific locale looks like the following
> {noformat}
> //element(*,mix:language)[@jcr:language='en']//element(*,sling:Message)[@sling:message]/(@sling:key|@sling:message)
> {noformat}
> This turns into a join and it looks like it cannot properly leverage the index on the left side to filter out content on the right side of the join.
> I'm going to use a standard CQ setup for the following analysis.
> The left side of the join is quite efficient with a property index
> {noformat}
> //element(*,mix:language)[@jcr:language='en']
> /libs/foundation/components/search/i18n/en
> /libs/foundation/components/mobilefooter/i18n/en
> /libs/commerce/components/search/i18n/en
> /libs/cq/searchpromote/components/pagination/i18n/en
> {noformat}
> This is a fast query, so far so good.
> Now the trouble begins when running the right side
> {noformat}
> //element(*,sling:Message)[@sling:message]/(@sling:key|@sling:message)
> {noformat}
> As far as I can see, the biggest issue here is that the second query doesn't leverage the left-side join info. This affects the overall query time in two ways:
>  - first, it doesn't know that we're only looking for 'en', so the query will traverse all the existing translations in all the languages (goes up to 91k rows). So it will fetch 91k rows each time, filtering for English at a later phase
>  - second, it appears to run the query for each left-side hit, in our case 4 times, making the first issue 4 times worse.
> [0] http://sling.apache.org/site/internationalization-support.html

