lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-5052) bitset codec for off heap filters
Date Tue, 25 Mar 2014 20:00:18 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947075#comment-13947075 ]

Robert Muir commented on LUCENE-5052:
-------------------------------------

{quote}
Let me ask one off-topic question about switching to PulsingPF. I've heard that it's enabled
automatically for ID-like fields. Can you point to where exactly that is done?
{quote}

See LUCENE-4498

If there is only one document in the postings list for a term, we just store that document
ID instead of a pointer to a list ... of only one document.

The freq() for that one document is redundant as well: it is the totalTermFreq() for the term,
so no frequency data is recorded either. The term still has a pointer for positions/payloads/offsets
if you have those enabled, but in most cases with an ID-like field you do not.
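
To make the idea concrete, here is a minimal sketch of a term entry that inlines the single
document ID. TermEntrySketch, fromPostings and nextFreePointer are hypothetical names used only
for illustration; this is a simplified model, not the actual code from LUCENE-4498.

```java
/** Hypothetical term-dictionary entry; a simplified model, not Lucene's real classes. */
final class TermEntrySketch {
    final int docFreq;           // number of documents containing the term
    final long totalTermFreq;    // total occurrences of the term across those documents
    final int singletonDocId;    // the inlined doc ID when docFreq == 1, otherwise -1
    final long postingsPointer;  // offset of the postings list when docFreq > 1, otherwise -1

    private TermEntrySketch(int docFreq, long totalTermFreq, int singletonDocId, long postingsPointer) {
        this.docFreq = docFreq;
        this.totalTermFreq = totalTermFreq;
        this.singletonDocId = singletonDocId;
        this.postingsPointer = postingsPointer;
    }

    /** Builds an entry; nextFreePointer stands in for the offset a real writer would record. */
    static TermEntrySketch fromPostings(int[] docIds, int[] freqs, long nextFreePointer) {
        if (docIds.length == 0 || docIds.length != freqs.length) {
            throw new IllegalArgumentException("docIds and freqs must be non-empty and the same length");
        }
        long total = 0;
        for (int f : freqs) {
            total += f;
        }
        if (docIds.length == 1) {
            // Only one document: store its ID directly, no pointer and no freq data.
            return new TermEntrySketch(1, total, docIds[0], -1L);
        }
        return new TermEntrySketch(docIds.length, total, -1, nextFreePointer);
    }

    boolean isSingleton() {
        return docFreq == 1;
    }

    /** For a singleton term the per-document freq equals totalTermFreq, so it is derived, not stored. */
    int singletonFreq() {
        if (!isSingleton()) {
            throw new IllegalStateException("not a singleton term");
        }
        return (int) totalTermFreq;
    }
}
```

For an ID-like field, fromPostings(new int[] {42}, new int[] {1}, 1000L) yields an entry whose
singletonDocId is 42 and whose postingsPointer is unused.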

> bitset codec for off heap filters
> ---------------------------------
>
>                 Key: LUCENE-5052
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5052
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/codecs
>            Reporter: Mikhail Khludnev
>              Labels: features
>             Fix For: 5.0
>
>         Attachments: LUCENE-5052.patch, bitsetcodec.zip, bitsetcodec.zip
>
>
> Colleagues,
> When we filter, we don’t care about any of the scoring factors (norms, positions, tf), but
> filtering should be fast. The obvious way to handle this is to decode the postings list and
> cache it on the heap (CachingWrapperFilter, Solr’s DocSet). Both consuming heap and decoding
> are expensive.
> Let’s write a postings list as a bitset if df is greater than the segment's maxdocs/8
> (what about skip lists? and overall performance?).
> Besides the codec implementation, the trickiest part to me is designing the API for this.
> How can we let the app know that a term query doesn’t need to be cached on the heap, but can be
> held as an mmapped bitset?
> WDYT?
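
One plausible reading of the maxdocs/8 threshold in the description above is a size comparison:
a bitset over a segment costs one bit per document, while a postings list costs at least one
byte per entry if each doc ID delta takes a byte or more. That one-byte minimum is an assumption
made for this sketch (real postings formats may pack more tightly), and BitsetThresholdSketch and
its method names are hypothetical, not part of the attached patch.

```java
/** Hypothetical helper illustrating the df > maxdocs/8 rule; not taken from the patch. */
final class BitsetThresholdSketch {

    /** Bytes needed for a dense bitset with one bit per document in the segment. */
    static long bitsetBytes(int maxDoc) {
        return (maxDoc + 7L) / 8L;
    }

    /** Lower bound for a postings list, ASSUMING at least one byte per doc ID delta. */
    static long minPostingsBytes(int docFreq) {
        return docFreq;
    }

    /** The rule as stated in the issue: use a bitset when df is greater than maxDoc / 8. */
    static boolean useBitset(int docFreq, int maxDoc) {
        return docFreq > maxDoc / 8;
    }
}
```

Under that assumption, a term matching 200,000 of 1,000,000 documents needs at least 200,000
bytes as a postings list but only 125,000 bytes as a bitset, so useBitset returns true.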



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

