jackrabbit-oak-issues mailing list archives

From "Thomas Mueller (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-7109) rep:facet returns wrong results for complex queries
Date Fri, 05 Jan 2018 08:12:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312675#comment-16312675 ]

Thomas Mueller commented on OAK-7109:
-------------------------------------

I don't fully know how facets work. Could you help me a bit with this, please? The query
{noformat}
select [rep:facet(simple/tags)] from [nt:base] as a 
where contains(a.[*], 'ipsum') 
and (isdescendantnode(a,'/content1') or isdescendantnode(a,'/content2'))
{noformat}

converted to "regular SQL" would be this, right?
{noformat}
select [simple/tags], count(*)
from [nt:base] as a 
where contains(a.[*], 'ipsum') 
and (isdescendantnode(a,'/content1') or isdescendantnode(a,'/content2'))
group by [simple/tags]
{noformat}
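A minimal Python sketch of that reading, assuming facet counts are nothing more than "group by [simple/tags]" plus "count(*)" over the rows that satisfy the whole query. The in-memory rows, the whole-word stand-in for contains(), and the helper names are hypothetical illustrations, not Oak APIs; a multi-valued tags property contributes one count per distinct value per row.

```python
from collections import Counter

# Hypothetical rows: (path, full text, multi-valued simple/tags).
ROWS = [
    ("/content1/test/foo", "lorem ipsum", ["tag1", "tag2"]),
    ("/content2/test/bar", "lorem ipsum", ["tag1", "tag2"]),
    ("/content3/test/bar", "lorem ipsum", ["tag1", "tag2"]),
]

def is_descendant(path, ancestor):
    # Rough analogue of isdescendantnode(a, ancestor): strictly below ancestor.
    return path.startswith(ancestor.rstrip("/") + "/")

def matches(row):
    path, text, _ = row
    # contains(a.[*], 'ipsum') simplified to a whole-word lookup.
    return "ipsum" in text.split() and (
        is_descendant(path, "/content1") or is_descendant(path, "/content2"))

def facet_counts(rows, predicate):
    # group by [simple/tags] + count(*): each distinct tag value on a
    # matching row adds one to that value's count.
    counts = Counter()
    for row in rows:
        if predicate(row):
            counts.update(sorted(set(row[2])))
    return dict(sorted(counts.items()))

print(facet_counts(ROWS, matches))  # {'tag1': 2, 'tag2': 2}
```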

(I know "group by" and "count" are not currently supported by Oak.)
Or are there other aspects I missed? What do you mean by "scoring"?

If it's the same, then I guess we might want to support the "group by" and "count" features
in Oak, or add custom logic to combine the results of
{noformat}
select [rep:facet(...)] ... UNION select [rep:facet(...)] ...
{noformat}
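One caveat for such combining logic, sketched in Python under the assumption that each UNION branch yields a set of matching paths: adding up per-branch facet counts double-counts any node that matches more than one branch. Merging the branch results by path first, then counting, avoids that. The branches, data, and helper names here are hypothetical; the full-text constraint is omitted because every example node contains "ipsum".

```python
from collections import Counter

# Hypothetical node -> simple/tags mapping.
ROWS = {
    "/content1/test/foo": ["tag1", "tag2"],
    "/content2/test/bar": ["tag1", "tag2"],
    "/content3/test/bar": ["tag1", "tag2"],
}

def under(ancestor):
    # Predicate for "is a descendant of ancestor".
    prefix = ancestor.rstrip("/") + "/"
    return lambda path: path.startswith(prefix)

def run_branch(predicate):
    # One UNION branch: the set of matching paths.
    return {p for p in ROWS if predicate(p)}

def facets_for(paths):
    counts = Counter()
    for p in sorted(paths):
        counts.update(sorted(set(ROWS[p])))
    return dict(sorted(counts.items()))

def naive_sum(branches):
    # Adds per-branch facet counts: wrong when branches overlap.
    total = Counter()
    for b in branches:
        total.update(facets_for(run_branch(b)))
    return dict(sorted(total.items()))

def merged(branches):
    # Deduplicates by path (UNION semantics), then counts once.
    paths = set()
    for b in branches:
        paths |= run_branch(b)
    return facets_for(paths)

# /content1/test/foo matches both of the first two branches.
branches = [under("/content1"), under("/content1/test"), under("/content2")]
print(naive_sum(branches))  # {'tag1': 3, 'tag2': 3}
print(merged(branches))     # {'tag1': 2, 'tag2': 2}
```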

> passing all constraints to lucene

What if Lucene doesn't index all the constraints?
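To illustrate that question with the example content from the issue, here is a small Python sketch, assuming the index evaluates only the full-text part (as with ":fulltext:ipsum") and the query engine filters the path condition afterwards. Counting facets on the raw index hits gives 3/3, while counting on the rows the query actually returns gives 2/2. The index stand-in and helper names are hypothetical, not Oak or Lucene APIs.

```python
from collections import Counter

# Example content from the issue (full text simplified to whole words).
DOCS = [
    {"path": "/content1/test/foo", "text": "lorem ipsum", "tags": ["tag1", "tag2"]},
    {"path": "/content2/test/bar", "text": "lorem ipsum", "tags": ["tag1", "tag2"]},
    {"path": "/content3/test/bar", "text": "lorem ipsum", "tags": ["tag1", "tag2"]},
]

def index_fulltext(docs, term):
    # Hypothetical stand-in for the index: evaluates only the
    # full-text constraint, not the path restriction.
    return [d for d in docs if term in d["text"].split()]

def path_constraint(doc):
    # The constraint left for the query engine to check afterwards.
    return any(doc["path"].startswith(root + "/") for root in ("/content1", "/content2"))

def count_tags(docs):
    counts = Counter()
    for d in docs:
        counts.update(sorted(set(d["tags"])))
    return dict(sorted(counts.items()))

hits = index_fulltext(DOCS, "ipsum")            # what the index returns
rows = [d for d in hits if path_constraint(d)]  # what the query returns

print(count_tags(hits))  # {'tag1': 3, 'tag2': 3}  counted on raw index hits
print(count_tags(rows))  # {'tag1': 2, 'tag2': 2}  counted on the final rows
```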

> rep:facet returns wrong results for complex queries
> ---------------------------------------------------
>
>                 Key: OAK-7109
>                 URL: https://issues.apache.org/jira/browse/OAK-7109
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: lucene
>    Affects Versions: 1.6.7
>            Reporter: Dirk Rudolph
>              Labels: facet
>         Attachments: facetsInMultipleRoots.patch, restrictionPropagationTest.patch
>
>
> Complex queries in this case are queries that are passed to Lucene without all of the original constraints, for example queries with multiple path restrictions like:
> {code}
> select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 'ipsum') and (isdescendantnode(a,'/content1') or isdescendantnode(a,'/content2'))
> {code}
> In that particular case the index planner passes the query ":fulltext:ipsum" to Lucene, even though the index supports evaluating path constraints.
> Because the facets are counted on Lucene's raw result, the returned facet counts are incorrect. For example, given the following content:
> {code}
> /content1/test/foo
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content2/test/bar
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content3/test/bar
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> {code}
> The expected facet counts for the simple/tags dimension with the query above are
> - tag1: 2
> - tag2: 2
> because the result set contains 2 documents and all documents carry the same tags. The actual facet counts are
> - tag1: 3
> - tag2: 3
> because the path constraint is not handled by Lucene.
> To work around this, the only solution that came to my mind is building the [disjunctive normal form|https://en.wikipedia.org/wiki/Disjunctive_normal_form] of my complex query and executing a query for each of the disjunctive statements. As the disjunctive normal form can grow exponentially, it's only a theoretical solution, nothing for production.
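A Python sketch of why that workaround does not scale, assuming the condition is written as an AND of OR-clauses over atomic constraints: distributing AND over OR produces one conjunctive query per combination of alternatives, so n clauses with two alternatives each yield 2**n queries. The to_dnf helper is hypothetical and purely string-based; it does not parse JCR-SQL2. The resulting conjunctions may also match overlapping nodes, so their per-query facet counts cannot simply be added.

```python
from itertools import product

def to_dnf(clauses):
    # clauses: an AND of OR-clauses, each a list of atomic constraint strings.
    # Distributing AND over OR gives one conjunction per combination.
    return [" and ".join(combo) for combo in product(*clauses)]

# The query from the issue as AND of OR-clauses.
query = [
    ["contains(a.[*], 'ipsum')"],
    ["isdescendantnode(a,'/content1')", "isdescendantnode(a,'/content2')"],
]
for q in to_dnf(query):
    print(q)
# contains(a.[*], 'ipsum') and isdescendantnode(a,'/content1')
# contains(a.[*], 'ipsum') and isdescendantnode(a,'/content2')

# Ten OR-clauses of two alternatives each -> 2**10 separate queries.
print(len(to_dnf([["x", "y"]] * 10)))  # 1024
```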



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
