lucene-dev mailing list archives

From "Kensho Hirasawa (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-10528) Use docvalue for range faceting in JSON facet API
Date Thu, 20 Apr 2017 00:49:04 GMT

     [ https://issues.apache.org/jira/browse/SOLR-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kensho Hirasawa updated SOLR-10528:
-----------------------------------
    Attachment: SOLR-10528.patch

Attached the first patch, which targets branch_6x.
It implements only counting so far, so it is incomplete.

In our environment (800,000 docs on 4 nodes, 1.5 GB heap per node), latency decreased in all
three cases below.
{code}
# buckets in the whole range = 3 -> original: 22.2ms, patched: 20.5ms
# buckets in the whole range = 1000 -> original: 687ms, patched: 22.6ms
# buckets in the whole range = 1 million -> original: OutOfMemoryError, patched: 23.3ms
{code}


However, the patch still has many limitations:
- doesn't handle multiValued fields
- doesn't handle TrieDate fields
- doesn't handle subfacets
- doesn't handle substats
- doesn't handle include/others parameters
- doesn't handle mincount == 0

I am going to remove the above limitations one by one.

> Use docvalue for range faceting in JSON facet API
> -------------------------------------------------
>
>                 Key: SOLR-10528
>                 URL: https://issues.apache.org/jira/browse/SOLR-10528
>             Project: Solr
>          Issue Type: Improvement
>          Components: Facet Module
>    Affects Versions: 6.5
>            Reporter: Kensho Hirasawa
>            Priority: Minor
>         Attachments: SOLR-10528.patch
>
>
> Range faceting in the JSON Facet API has only one implementation. It allocates all buckets
> and then executes a range query for each of them. Memory usage and computational cost can
> therefore be very high when the range is wide and the gap is narrow.
> I think range faceting in the JSON Facet API should have an implementation that uses DocValues
> instead of the inverted index. By scanning DocValues, we can execute range facets much more
> efficiently, especially when the number of buckets is large.
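
The scanning idea in the description can be illustrated with a minimal sketch. This is not the code in SOLR-10528.patch and it uses no Lucene or Solr APIs: the class name RangeFacetSketch, the method countBuckets, and the plain long[] standing in for a per-document DocValues column are all hypothetical. It only shows how one pass over the values can place each value in its bucket arithmetically, so work and memory depend on the number of documents and non-empty buckets rather than on the total number of buckets.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not taken from the attached patch.
class RangeFacetSketch {

    /**
     * Counts values into buckets [start + i*gap, start + (i+1)*gap) in one pass.
     * The long[] is an assumed stand-in for a single-valued numeric DocValues column.
     * Only buckets that receive at least one value are stored.
     */
    static Map<Long, Integer> countBuckets(long[] values, long start, long end, long gap) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long v : values) {
            if (v < start || v >= end) {
                continue; // outside the faceted range
            }
            // Lower bound of the bucket containing v, computed without a per-bucket query.
            long bucketStart = start + ((v - start) / gap) * gap;
            counts.merge(bucketStart, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] prices = {5, 12, 17, 250, 999_999, -3};
        // A range of 1,000,000 with gap 10 has 100,000 buckets, but only four are materialized.
        System.out.println(countBuckets(prices, 0, 1_000_000, 10));
        // prints {0=1, 10=2, 250=1, 999990=1}
    }
}
```

Because the sketch stores only non-empty buckets, it never reports a bucket with a count of zero; a real implementation would need extra handling for that case.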



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


