jackrabbit-oak-issues mailing list archives

From "Dirk Rudolph (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-7109) rep:facet returns wrong results for complex queries
Date Wed, 03 Jan 2018 09:48:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309376#comment-16309376 ]

Dirk Rudolph commented on OAK-7109:
-----------------------------------

Hi [~catholicon], somehow the mail agent doesn't accept my mails to oak-dev (I'm subscribed
and receive mail, but sending doesn't work ... anyway).

I checked the implementation of the optimisation and its result is not in DNF, as the
optimisation is not done on the negation normal form of the query (so expressions like
not(a or b) are not properly expanded to not(a) and not(b)). For example (based on
org.apache.jackrabbit.oak.query.SQL2OptimiseQueryTest#optimiseAndOrAnd()):

{code}
given ([a]=1 or [b]=2 or ([c]=3 and not([d]=4 or [e]=5))) and [x]=6 <=> ([a]=1 or [b]=2 or ([c]=3 and [d]<>4 and [e]<>5)) and [x]=6
expected ([a]=1 and [x]=6), ([b]=2 and [x]=6), ([c]=3 and [d]<>4 and [e]<>5 and [x]=6)
actual ((c = 3) and (not ((d = 4) or (e = 5)))) and (x = 6), (b = 2) and (x = 6), (a = 1) and (x = 6)
{code}
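
For illustration, the expected expansion can be reproduced with a small sketch: first push
negations down to the comparisons (negation normal form), then distribute "and" over "or".
This is not Oak's optimiser; the tuple representation and the names leaf, nnf, dnf and show
are made up for this example, and not([d]=4) is rewritten to [d]<>4 exactly as in the
example above.

```python
from itertools import product

# Hypothetical expression model: ("cmp", field, op, value), ("not", e),
# ("and", e1, e2, ...), ("or", e1, e2, ...)
NEGATED = {"=": "<>", "<>": "="}


def leaf(field, op, value):
    return ("cmp", field, op, value)


def nnf(expr, negate=False):
    """Push negations down to the comparisons (negation normal form)."""
    kind = expr[0]
    if kind == "cmp":
        _, field, op, value = expr
        return ("cmp", field, NEGATED[op] if negate else op, value)
    if kind == "not":
        return nnf(expr[1], not negate)
    if negate:
        # De Morgan: not(a or b) == not(a) and not(b), and vice versa
        kind = "and" if kind == "or" else "or"
    return (kind,) + tuple(nnf(child, negate) for child in expr[1:])


def dnf(expr):
    """Expand an NNF expression into a list of conjunctions (lists of comparisons)."""
    kind = expr[0]
    if kind == "cmp":
        return [[expr]]
    if kind == "or":
        return [conj for child in expr[1:] for conj in dnf(child)]
    # "and": combine every alternative of each child with every alternative of the others
    return [sum(combo, []) for combo in product(*(dnf(child) for child in expr[1:]))]


def show(conjunction):
    return " and ".join(f"[{field}]{op}{value}" for _, field, op, value in conjunction)


# ([a]=1 or [b]=2 or ([c]=3 and not([d]=4 or [e]=5))) and [x]=6
QUERY = ("and",
         ("or",
          leaf("a", "=", 1),
          leaf("b", "=", 2),
          ("and", leaf("c", "=", 3),
           ("not", ("or", leaf("d", "=", 4), leaf("e", "=", 5))))),
         leaf("x", "=", 6))

if __name__ == "__main__":
    for conjunction in dnf(nnf(QUERY)):
        print(show(conjunction))
```

Running dnf directly on QUERY without nnf would fail on the "not" node in this sketch, which
mirrors the point above: the expansion only works once the negations have been pushed down.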

And even assuming the alternatives were in DNF and facet counting across unions were
supported by merging the results of each of the queries given to lucene, the result would
still be wrong, as the disjunctive statements are not mutually exclusive (as they would be
with xor). So from my perspective there is no way to get proper facet counts in that case
from the consumer side, and only the options of

b) filtering the documents based on the filter 
c) passing all constraints to lucene

would work. 
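
A small sketch of why merging per-branch facet counts over-counts. The documents, field names
and branch predicates below are invented for illustration and only stand in for the
sub-queries of a union; a document matching two branches is counted twice.

```python
from collections import Counter

# Hypothetical documents with the fields used by the constraints and one facet value
DOCS = [
    {"id": 1, "a": 1, "b": 2, "x": 6, "tag": "tag1"},  # matches both branches
    {"id": 2, "a": 1, "b": 0, "x": 6, "tag": "tag1"},  # matches only [a]=1 and [x]=6
    {"id": 3, "a": 0, "b": 2, "x": 6, "tag": "tag2"},  # matches only [b]=2 and [x]=6
]

# The disjunctive statements ([a]=1 and [x]=6), ([b]=2 and [x]=6)
BRANCHES = [
    lambda d: d["a"] == 1 and d["x"] == 6,
    lambda d: d["b"] == 2 and d["x"] == 6,
]


def facet_counts(docs):
    return Counter(d["tag"] for d in docs)


def merged_branch_counts(docs, branches):
    """Count facets per branch and add the counts up (what a naive merge would do)."""
    total = Counter()
    for branch in branches:
        total += facet_counts(d for d in docs if branch(d))
    return total


def union_counts(docs, branches):
    """Count facets once per document that matches at least one branch."""
    return facet_counts(d for d in docs if any(branch(d) for branch in branches))


if __name__ == "__main__":
    print(merged_branch_counts(DOCS, BRANCHES))  # document 1 contributes twice
    print(union_counts(DOCS, BRANCHES))
```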

Regarding b): from what I can see in the code base, the nodes are not actually read; only
the permissions on their paths are checked in [FilteredSortedSetDocValuesFacetCounts.java#L91|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/util/FilteredSortedSetDocValuesFacetCounts.java#L91]
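
Option b) can be sketched with the content from the issue description: count facets only for
those raw hits that pass the remaining filter. This is a simplified model, not the code of
FilteredSortedSetDocValuesFacetCounts; RAW_HITS, is_descendant, path_filter and count_facets
are hypothetical names for this example.

```python
from collections import Counter

# Raw lucene hits for ":fulltext:ipsum" as in the issue description:
# path -> values of simple/tags
RAW_HITS = {
    "/content1/test/foo": ["tag1", "tag2"],
    "/content2/test/bar": ["tag1", "tag2"],
    "/content3/test/bar": ["tag1", "tag2"],
}


def is_descendant(path, ancestor):
    return path.startswith(ancestor.rstrip("/") + "/")


def path_filter(path):
    """The path constraint of the query that was not passed to lucene."""
    return is_descendant(path, "/content1") or is_descendant(path, "/content2")


def count_facets(hits, accept=lambda path: True):
    """Count facet values, skipping hits rejected by the accept predicate."""
    counts = Counter()
    for path, tags in hits.items():
        if accept(path):
            counts.update(tags)
    return counts


if __name__ == "__main__":
    print(count_facets(RAW_HITS))               # counts on the raw result
    print(count_facets(RAW_HITS, path_filter))  # counts after applying the filter
```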

I will check further why our specific query doesn't get passed entirely to lucene (or rather,
which constraints besides the path constraints are not taken into account). Anyway, as a user
of the JCR API I would expect a RepositoryException (or any other exception) when I try to
run a query with facet extraction that no index can provide, similar to the exception I get
when the field I extract facets on is not stored.


> rep:facet returns wrong results for complex queries
> ---------------------------------------------------
>
>                 Key: OAK-7109
>                 URL: https://issues.apache.org/jira/browse/OAK-7109
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: lucene
>    Affects Versions: 1.6.7
>            Reporter: Dirk Rudolph
>              Labels: facet
>         Attachments: facetsInMultipleRoots.patch
>
>
> Complex queries in this case are queries that are passed to lucene without all of their
> original constraints. For example, queries with multiple path restrictions like:
> {code}
> select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 'ipsum') and (isdescendantnode(a,'/content1') or isdescendantnode(a,'/content2'))
> {code}
> In that particular case the index planner gives ":fulltext:ipsum" to lucene even though
> the index supports evaluating path constraints.
> As counting the facets happens on the raw result of lucene, the returned facets are
> incorrect. For example, given the following content
> {code}
> /content1/test/foo
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content2/test/bar
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content3/test/bar
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> {code}
> the expected result for the dimensions of simple/tags and the query above is 
> - tag1: 2
> - tag2: 2
> as the result set contains 2 results and all documents carry the same tags. The actual
> result is
> - tag1: 3
> - tag2: 3
> as the path constraint is not handled by lucene.
> To work around that, the only solution that came to my mind is building the [disjunctive
> normal form|https://en.wikipedia.org/wiki/Disjunctive_normal_form] of my complex query and
> executing a query for each of the disjunctive statements. As this expands exponentially,
> it's only a theoretical solution, nothing for production.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
