lucene-dev mailing list archives

From "Adrien Grand (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-6077) Add a filter cache
Date Thu, 27 Nov 2014 09:08:12 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-6077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-6077:
---------------------------------
    Attachment: LUCENE-6077.patch

Updated patch:
 - fixed LRUFilterCache.getChildResources to not make a copy of the cache (since Accountables.namedAccountables
already takes care of taking a snapshot)
 - replaced the heuristic based on the segment source in the diagnostics with a heuristic based on
the segment size compared to the size of the top-level context. This should give better results
since merged segments are not necessarily large and flushed segments can be large if you have
large IndexWriter buffers.
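
To illustrate the size-based heuristic described above, here is a minimal sketch. It is not the code from the patch: the class name SegmentSizeHeuristic, the minSizeRatio parameter and the 0.03 value in the usage note are assumptions made for illustration only. The idea is simply to compare the number of documents in a segment with the number of documents in the top-level reader and to skip caching on segments that hold too small a share of the index.

```java
// Hypothetical helper, not part of Lucene or of the attached patch.
// It decides whether a segment is large enough, relative to the whole
// index, for a cached filter on that segment to be worth building.
final class SegmentSizeHeuristic {

    // Minimum fraction of the index a segment must hold (assumed parameter).
    private final float minSizeRatio;

    SegmentSizeHeuristic(float minSizeRatio) {
        if (minSizeRatio < 0f || minSizeRatio > 1f) {
            throw new IllegalArgumentException("minSizeRatio must be in [0, 1]");
        }
        this.minSizeRatio = minSizeRatio;
    }

    // segmentMaxDoc: number of documents in the segment being searched.
    // topLevelMaxDoc: number of documents in the top-level context.
    boolean shouldCache(int segmentMaxDoc, int topLevelMaxDoc) {
        if (topLevelMaxDoc <= 0) {
            return false; // empty index: nothing worth caching
        }
        float ratio = (float) segmentMaxDoc / topLevelMaxDoc;
        return ratio >= minSizeRatio;
    }
}
```

With an assumed ratio of 0.03, new SegmentSizeHeuristic(0.03f).shouldCache(50, 1000) accepts the segment, while shouldCache(10, 1000) rejects it, regardless of whether the segment came from a flush or from a merge.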

I think it's ready?

> Add a filter cache
> ------------------
>
>                 Key: LUCENE-6077
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6077
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: 5.0
>
>         Attachments: LUCENE-6077.patch, LUCENE-6077.patch, LUCENE-6077.patch
>
>
> Lucene already has filter caching abilities through CachingWrapperFilter, but CachingWrapperFilter
> requires you to know up-front which filters you want to cache.
> Caching filters is not trivial. If you cache too aggressively, you slow things down
> since you need to iterate over all documents that match the filter in order to load it into
> an in-memory cacheable DocIdSet. On the other hand, if you don't cache at all, you are potentially
> missing interesting speed-ups on frequently-used filters.
> It would be nice to have a generic filter cache that tracks
> usage of individual filters and decides whether or not to cache a filter on a given segment
> based on usage statistics and various heuristics, such as:
>  - the overhead of caching the filter (for instance, some filters produce DocIdSets that
> are already cacheable)
>  - the cost of building the DocIdSet (the getDocIdSet method is very expensive on some filters,
> such as MultiTermQueryWrapperFilter, which potentially needs to merge lots of postings lists)
>  - the segment we are searching on (flushed segments will likely be merged right away, so
> it's probably not worth building a cache on such segments)
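
The usage-tracking idea in the description above could look roughly like the following sketch. It is only an illustration under stated assumptions, not the design of the patch: the class UsageTrackingPolicy, its historySize and minFrequency parameters, and the choice of a fixed-size window of recent uses are all hypothetical. A filter becomes eligible for caching once it has been seen at least minFrequency times among the last historySize filter uses.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical policy, not part of Lucene: remembers the most recently
// used keys (e.g. filters) and reports which ones are used often enough
// to be worth caching.
final class UsageTrackingPolicy<K> {

    private final int historySize;   // how many recent uses to remember
    private final int minFrequency;  // uses required before caching
    private final ArrayDeque<K> history = new ArrayDeque<>();
    private final Map<K, Integer> counts = new HashMap<>();

    UsageTrackingPolicy(int historySize, int minFrequency) {
        if (historySize < 1 || minFrequency < 1) {
            throw new IllegalArgumentException("both parameters must be >= 1");
        }
        this.historySize = historySize;
        this.minFrequency = minFrequency;
    }

    // Record one use of the key, forgetting the oldest use if the window is full.
    void onUse(K key) {
        history.addLast(key);
        counts.merge(key, 1, Integer::sum);
        if (history.size() > historySize) {
            K evicted = history.removeFirst();
            // Decrement the count, dropping the entry when it reaches zero.
            counts.computeIfPresent(evicted, (k, c) -> c == 1 ? null : c - 1);
        }
    }

    // True if the key appears at least minFrequency times in the window.
    boolean shouldCache(K key) {
        return counts.getOrDefault(key, 0) >= minFrequency;
    }
}
```

A caller would invoke onUse each time a filter is executed and consult shouldCache before paying the cost of building a cacheable DocIdSet. Keys need consistent equals and hashCode for the counting to work.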



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

