jackrabbit-oak-issues mailing list archives

From "Vikas Saurabh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-8184) With statistical mode, facet count seems having higher error rate than expected
Date Tue, 02 Apr 2019 22:42:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-8184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808208#comment-16808208 ]

Vikas Saurabh commented on OAK-8184:
------------------------------------

bq. especially for small counts, which makes them obvious. Usually it is off by 1, but we are seeing bigger gaps like 20 or 30 as well.
An off-by-1 error is, imo, quite expected with "statistical" facets; it's an estimate after all.
Also, I agree that clicking the link shows "2" while the top-level facet showed "1", which looks
weird and amounts to an error of 50%. But I think the error looks exaggerated here because we round
down when prorating the insecure facet count by the ratio of accessible items in the sampled data
(i.e. even 1.99 would show up as 1). Despite the obvious weirdness in this case, I think we should
still round down.
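To make the rounding concrete, here is a minimal sketch of proration as described above, assuming the estimate is simply the insecure facet count multiplied by the fraction of sampled results the user can read, then truncated. The class `FacetProration`, the method `prorate`, and its parameters are hypothetical illustrations, not Oak's actual facet code.

```java
/**
 * Hypothetical sketch: prorate an insecure (non-ACL-checked) facet count by
 * the fraction of sampled results the current user can read, rounding down.
 */
final class FacetProration {

    private FacetProration() {}

    /**
     * @param insecureCount     facet count computed over all results, ignoring ACLs
     * @param accessibleSampled number of sampled results the user can read
     * @param sampleSize        number of results in the sample (must be > 0)
     * @return the estimated secure count, rounded down
     */
    static int prorate(int insecureCount, int accessibleSampled, int sampleSize) {
        if (insecureCount < 0 || sampleSize <= 0
                || accessibleSampled < 0 || accessibleSampled > sampleSize) {
            throw new IllegalArgumentException("invalid counts or sample");
        }
        // Integer division of non-negative values truncates, i.e. rounds down:
        // an exact estimate of 1.99 is reported as 1. The long cast avoids overflow.
        return (int) ((long) insecureCount * accessibleSampled / sampleSize);
    }
}
```

For example, `prorate(2, 995, 1000)` computes 2 * 995 / 1000 = 1.99 and returns 1, which matches the "2 vs 1" pattern reported in this issue. Rounding down guarantees the displayed count never exceeds the proportional estimate; rounding to nearest would show 2 here, but could overstate counts in other cases.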

That said, without OAK-8167 coming into play, I can't imagine a gap of 20-30 with "small counts".
Could you explain the scenario in which you observed such a large discrepancy?

Btw, there's another Oak issue (OAK-8138) that we're working on: it would do exact secure facet
counting if the result set size of the query, without accounting for ACLs, is less than the sample
size. My assumption for why this issue might help is that low facet counts should correlate with
small result sets (i.e. a facet count of 20 in a result set of, say, 20000 would likely be an edge case).
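The OAK-8138 behaviour described above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the OAK-8138 patch: the `Row` type, the `canRead` predicate standing in for an ACL check, and the use of a uniform random sample are all hypothetical. When the unfiltered result set is smaller than the sample size, every row is checked and the counts are exact; otherwise the insecure counts are prorated by the readable fraction of a random sample and rounded down.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.Predicate;

/** Hypothetical sketch: exact secure facet counts for small result sets, sampled estimate otherwise. */
final class SecureFacetCounter {

    /** One query result row (hypothetical): its path and the facet value it carries. */
    static final class Row {
        final String path;
        final String facetValue;

        Row(String path, String facetValue) {
            this.path = path;
            this.facetValue = facetValue;
        }
    }

    private SecureFacetCounter() {}

    static Map<String, Integer> count(List<Row> rows, Predicate<String> canRead,
                                      int sampleSize, Random random) {
        if (sampleSize <= 0) {
            throw new IllegalArgumentException("sampleSize must be > 0");
        }
        if (rows.size() < sampleSize) {
            // Unfiltered result set smaller than the sample: check every row, so counts are exact.
            Map<String, Integer> exact = new HashMap<>();
            for (Row row : rows) {
                if (canRead.test(row.path)) {
                    exact.merge(row.facetValue, 1, Integer::sum);
                }
            }
            return exact;
        }
        // Otherwise: count facet values ignoring ACLs ...
        Map<String, Integer> insecure = new HashMap<>();
        for (Row row : rows) {
            insecure.merge(row.facetValue, 1, Integer::sum);
        }
        // ... and estimate the readable fraction from a random sample of rows.
        List<Row> shuffled = new ArrayList<>(rows);
        Collections.shuffle(shuffled, random);
        int accessible = 0;
        for (Row row : shuffled.subList(0, sampleSize)) {
            if (canRead.test(row.path)) {
                accessible++;
            }
        }
        // Prorate each insecure count and round down; values that round to 0 are dropped.
        Map<String, Integer> estimated = new HashMap<>();
        for (Map.Entry<String, Integer> e : insecure.entrySet()) {
            int value = (int) ((long) e.getValue() * accessible / sampleSize);
            if (value > 0) {
                estimated.put(e.getKey(), value);
            }
        }
        return estimated;
    }
}
```

Checking every row is only affordable when the result set is small, which is why the exact path is gated on the sample size; per the assumption above, that is also where low facet counts, and therefore the most visible rounding errors, are expected.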

> With statistical mode, facet count seems having higher error rate than expected
> -------------------------------------------------------------------------------
>
>                 Key: OAK-8184
>                 URL: https://issues.apache.org/jira/browse/OAK-8184
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: query, search
>    Affects Versions: 1.6.16
>            Reporter: Kelvin Xu
>            Priority: Major
>         Attachments: image-2019-03-29-10-59-03-699.png, image-2019-03-29-10-59-17-163.png, image-2019-03-29-11-00-11-094.png, image-2019-03-29-11-00-16-305.png
>
>
> We identified facet count drifts here and there, especially for small counts, which makes them obvious. Usually it is off by 1, but we are seeing bigger gaps like 20 or 30 as well. Here’s one example; consider this query run by a non-admin user:
> {code:java}
> 1_group.propertyvalues.extractFacet=true
> 1_group.propertyvalues.property=jcr:content/metadata/msft:associatedCampaign
> 2_group.0_path=/content/dam/microsoft/rad/
> 2_group.p.or=true
> orderby=jcr:content/jcr:lastModified
> orderby.sort=desc
> p.facetStrategy=oak
> p.facets=true
> p.guessTotal=250
> p.limit=-1
> p.offset=0
> property=jcr:content/metadata/msft:lifecycleStatus
> property.10_value=microsoft:studios/lifecycleStatus/Created
> property.1_value=Created
> property.2_value=Under Review
> property.3_value=Rejected
> property.4_value=Approved
> property.5_value=Published
> property.6_value=microsoft:search-marketing/lifecycleStatus/Approved
> property.7_value=microsoft:search-marketing/lifecycleStatus/Created
> property.8_value=microsoft:studios/lifecycleStatus/Approved
> property.9_value=microsoft:studios/lifecycleStatus/UnderReview
> type=dam:Asset
> {code}
> This is what is returned; notice that one of the facets, `/content/dam/microsoft/rad/public-campaign`, has a count of 1.
> !image-2019-03-29-10-59-17-163.png!
> If we add this facet value as one of the query conditions, like this:
> {code:java}
> 5_group.1_propertyvalues.0_values=/content/dam/microsoft/rad/public-campaign
> 5_group.1_propertyvalues.extractFacet=true
> 5_group.1_propertyvalues.property=jcr:content/metadata/msft:associatedCampaign
> 2_group.0_path=/content/dam/microsoft/rad/
> 2_group.p.or=true
> orderby=jcr:content/jcr:lastModified
> orderby.sort=desc
> p.facetStrategy=oak
> p.facets=true
> p.guessTotal=250
> p.limit=-1
> p.offset=0
> property=jcr:content/metadata/msft:lifecycleStatus
> property.10_value=microsoft:studios/lifecycleStatus/Created
> property.1_value=Created
> property.2_value=Under Review
> property.3_value=Rejected
> property.4_value=Approved
> property.5_value=Published
> property.6_value=microsoft:search-marketing/lifecycleStatus/Approved
> property.7_value=microsoft:search-marketing/lifecycleStatus/Created
> property.8_value=microsoft:studios/lifecycleStatus/Approved
> property.9_value=microsoft:studios/lifecycleStatus/UnderReview
> type=dam:Asset
> {code}
> We got this; as you can see, the actual count is 2.
> !image-2019-03-29-11-00-16-305.png!
> Is this expected behavior? We are even seeing counts being off on large result sets… This makes the user experience pretty bad, and we thought the error rate would be much lower than that.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
