jackrabbit-oak-issues mailing list archives

From "Vikas Saurabh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-7109) rep:facet returns wrong results for complex queries
Date Fri, 05 Jan 2018 08:22:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312687#comment-16312687 ]

Vikas Saurabh commented on OAK-7109:

bq. (I know the "group by" and "count" are not currently supported by Oak.) Or are there other aspects I missed?
Indeed, fundamentally that's what facets do: they cover a usually small set of properties (not 'all',
unlike group by) and, for each value, count how many of the documents matching the query have it.
Lucene's faceting support also does ranges, although we don't support that yet; e.g. I could facet
on "jcr:created" and the categories could turn out as "today", "within last week", etc. (I'm not
completely sure about the API... I'm just trying to illustrate that faceted categories can
potentially differ from the actually stored value.)
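To make the distinction concrete, here is a minimal Java sketch using plain collections, not the Oak or Lucene facet API. The `Doc` record, the tag values and the bucket boundaries are hypothetical, and "today" is simplified to "within the last 24 hours". It shows a value facet counted over the documents that matched a query, and a range facet whose category labels are derived from a stored `jcr:created`-style timestamp rather than being the stored value itself.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FacetSketch {
    // Hypothetical in-memory stand-in for an indexed document; not an Oak or Lucene type.
    record Doc(String path, List<String> tags, Instant created) {}

    // Value facet: for each tag, count how many of the matching documents carry it.
    static Map<String, Integer> tagFacets(List<Doc> matches) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Doc d : matches) {
            for (String tag : d.tags()) {
                counts.merge(tag, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Range facet: the category label is computed from the stored value (hypothetical buckets).
    static String createdBucket(Instant created, Instant now) {
        Duration age = Duration.between(created, now);
        if (age.compareTo(Duration.ofDays(1)) < 0) return "today";
        if (age.compareTo(Duration.ofDays(7)) < 0) return "within last week";
        return "older";
    }

    // Count matching documents per range bucket.
    static Map<String, Integer> createdFacets(List<Doc> matches, Instant now) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Doc d : matches) {
            counts.merge(createdBucket(d.created(), now), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2018-01-05T08:00:00Z");
        List<Doc> matches = List.of(
            new Doc("/a", List.of("tag1", "tag2"), now.minus(Duration.ofHours(2))),
            new Doc("/b", List.of("tag1"), now.minus(Duration.ofDays(3))),
            new Doc("/c", List.of("tag2"), now.minus(Duration.ofDays(30))));
        System.out.println(tagFacets(matches));          // {tag1=2, tag2=2}
        System.out.println(createdFacets(matches, now)); // {older=1, today=1, within last week=1}
    }
}
```

The point of the range facet is that "today" never appears in the content; it only exists as a category computed at facet time.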

bq. What do you mean with "scoring"?
The scoring part is an entirely different issue, unrelated to facets: e.g. we won't (can't??)
correctly order documents matching queries such as {{.... WHERE (CONTAINS(., 'text') AND foo1='bar')
OR (CONTAINS(., 'text') AND foo2='bar' AND foo3='bar')}} (foo=bar could be a different fulltext
clause too... the issue is that we can't quite merge scores coming out of separate lucene ...).
But let's ignore the scoring for this issue.

bq. What if Lucene doesn't index all the constraints?
I have a very pessimistic view here: we should fail such queries. I mean, it's better to fail
and allow for the right index def than to give incorrect results.
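A minimal sketch of that fail-fast policy, assuming the planner can tell which constraints it actually handed to the index. The `checkFacetQuery` method and the representation of constraints as plain string labels are hypothetical, not existing Oak API.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class FailFastSketch {
    // Hypothetical guard for faceted queries: if any query constraint is not evaluated
    // by the index, facet counts computed on the index result could be wrong, so the
    // query is rejected instead of returning those counts.
    static void checkFacetQuery(Set<String> queryConstraints, Set<String> evaluatedByIndex) {
        Set<String> missing = new LinkedHashSet<>(queryConstraints);
        missing.removeAll(evaluatedByIndex);
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException(
                "rep:facet query has constraints not evaluated by the index: " + missing
                    + " (adjust the index definition)");
        }
    }

    public static void main(String[] args) {
        // Constraints of the example query from the description, as opaque labels.
        Set<String> query = Set.of("fulltext:ipsum", "path:/content1 OR /content2");
        checkFacetQuery(query, query); // passes: every constraint is evaluated by the index
        try {
            // Only the fulltext part reaches Lucene, as in the description.
            checkFacetQuery(query, Set.of("fulltext:ipsum"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```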

> rep:facet returns wrong results for complex queries
> ---------------------------------------------------
>                 Key: OAK-7109
>                 URL: https://issues.apache.org/jira/browse/OAK-7109
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: lucene
>    Affects Versions: 1.6.7
>            Reporter: Dirk Rudolph
>              Labels: facet
>         Attachments: facetsInMultipleRoots.patch, restrictionPropagationTest.patch
> Complex queries in this case are queries that are passed to Lucene without all of the original
> constraints, for example queries with multiple path restrictions like:
> {code}
> select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 'ipsum') and
> (isdescendantnode(a,'/content1') or isdescendantnode(a,'/content2'))
> {code}
> In that particular case the index planner passes ":fulltext:ipsum" to Lucene even though
> the index supports evaluating path constraints.
> As the facets are counted on the raw result from Lucene, the returned facet counts are incorrect.
> For example, with the following content:
> {code}
> /content1/test/foo
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content2/test/bar
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content3/test/bar
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> {code}
> the expected result for the simple/tags dimension and the query above is:
> - tag1: 2
> - tag2: 2
> as the result set contains 2 results and all documents carry the same tags. The actual result is:
> - tag1: 3
> - tag2: 3
> as the path constraint is not handled by Lucene.
> To work around that, the only solution that came to my mind is building the [disjunctive
> normal form|https://en.wikipedia.org/wiki/Disjunctive_normal_form] of my complex query and
> executing a query for each of the disjunctive statements. As this expands exponentially,
> it's only a theoretical solution, nothing for production.
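The counting problem in the issue description can be reproduced outside Oak with a few lines of plain Java. This is a hypothetical sketch, not Oak code: the `Node` record, the `contains("ipsum")` stand-in for the fulltext match and the string-prefix stand-in for `isdescendantnode` are all simplifications. It counts `simple/tags` once on the rows a fulltext-only Lucene query would return and once on the rows left after the path restriction is applied.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class FacetPostFilterSketch {
    // Hypothetical stand-in for the three nodes of the example content.
    record Node(String path, String text, List<String> tags) {}

    static final List<Node> CONTENT = List.of(
        new Node("/content1/test/foo", "lorem ipsum", List.of("tag1", "tag2")),
        new Node("/content2/test/bar", "lorem ipsum", List.of("tag1", "tag2")),
        new Node("/content3/test/bar", "lorem ipsum", List.of("tag1", "tag2")));

    // Simplified isdescendantnode: the path lies strictly below the ancestor.
    static boolean isDescendant(Node n, String ancestor) {
        return n.path().startsWith(ancestor + "/");
    }

    // Facet counting: number of rows carrying each tag.
    static Map<String, Integer> countTags(List<Node> rows) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Node n : rows) {
            for (String tag : n.tags()) {
                counts.merge(tag, 1, Integer::sum);
            }
        }
        return counts;
    }

    // The only part of the query Lucene receives in this scenario (":fulltext:ipsum").
    static final Predicate<Node> FULLTEXT = n -> n.text().contains("ipsum");
    // The path restriction, applied only after Lucene has returned its rows.
    static final Predicate<Node> PATHS =
        n -> isDescendant(n, "/content1") || isDescendant(n, "/content2");

    static List<Node> filter(List<Node> rows, Predicate<Node> p) {
        return rows.stream().filter(p).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Node> rawLucene = filter(CONTENT, FULLTEXT);
        List<Node> finalRows = filter(rawLucene, PATHS);
        System.out.println("rows returned:        " + finalRows.size());
        System.out.println("facets on raw result: " + countTags(rawLucene));
        System.out.println("facets on final rows: " + countTags(finalRows));
    }
}
```

Running `main` reports 2 rows, `{tag1=3, tag2=3}` for the raw result and `{tag1=2, tag2=2}` for the final rows, i.e. the actual vs. expected numbers from the description.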
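The DNF rewrite mentioned in the workaround, and why it grows exponentially, can be sketched in Java as follows. This is an illustration under stated assumptions, not Oak's query engine: the `Expr`/`Leaf`/`And`/`Or` types are hypothetical, constraints are opaque strings, and negation is not handled. AND is distributed over OR until only a list of conjunctions remains.

```java
import java.util.ArrayList;
import java.util.List;

public class DnfSketch {
    // Minimal boolean expression tree over opaque constraint labels (no negation).
    interface Expr {}
    record Leaf(String constraint) implements Expr {}
    record And(Expr left, Expr right) implements Expr {}
    record Or(Expr left, Expr right) implements Expr {}

    // Returns the DNF as a list of conjunctions; each conjunction is a list of leaf constraints.
    static List<List<String>> toDnf(Expr e) {
        if (e instanceof Leaf leaf) {
            return List.of(List.of(leaf.constraint()));
        }
        if (e instanceof Or either) {
            // OR simply concatenates the disjuncts of both sides.
            List<List<String>> result = new ArrayList<>(toDnf(either.left()));
            result.addAll(toDnf(either.right()));
            return result;
        }
        if (e instanceof And both) {
            // Distribute AND over OR: combine every left conjunction with every right one.
            List<List<String>> left = toDnf(both.left());
            List<List<String>> right = toDnf(both.right());
            List<List<String>> result = new ArrayList<>();
            for (List<String> l : left) {
                for (List<String> r : right) {
                    List<String> conj = new ArrayList<>(l);
                    conj.addAll(r);
                    result.add(conj);
                }
            }
            return result;
        }
        throw new IllegalArgumentException("unknown expression: " + e);
    }

    public static void main(String[] args) {
        Expr query = new And(
            new Leaf("contains(a.[*], 'ipsum')"),
            new Or(new Leaf("isdescendantnode(a,'/content1')"),
                   new Leaf("isdescendantnode(a,'/content2')")));
        for (List<String> conj : toDnf(query)) {
            System.out.println(String.join(" and ", conj));
        }
        // Ten two-way ORs joined by AND expand to 2^10 conjunctions.
        Expr big = new Or(new Leaf("x0"), new Leaf("y0"));
        for (int i = 1; i < 10; i++) {
            big = new And(big, new Or(new Leaf("x" + i), new Leaf("y" + i)));
        }
        System.out.println(toDnf(big).size()); // 1024
    }
}
```

The example query expands to two conjunctions, one per path restriction. In general the conjunctions of a DNF can match overlapping sets of documents, so facet counts from per-disjunct queries cannot simply be summed.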

This message was sent by Atlassian JIRA
